
Re: [Condor-users] condor w/o shared filesystem



On Thursday, July 21, 2011 at 8:06 AM, Rita wrote:
Sorry if I wasn't clear before.

I saw a post saying you can stream your output and error. If you have streaming enabled and the scheduler host reboots, the jobs restart from scratch. I would like to have streaming and have the scheduler die without my jobs restarting from scratch. Is this possible?
No. Not that I know of.

Streaming is accomplished through a connection between the startd and the shadow -- if you lose the shadow, you break the stream and Condor can no longer ensure your output is being captured, so it has to terminate the job. If it tried to tolerate the loss of the shadow in that case, you'd have big gaps in your output stream where the shadow was offline.

If you need to tolerate scheduler outages, the best configuration, in the absence of a shared file system, is to capture output locally on the execute machine and have Condor transfer it back to the scheduler when the job is complete.
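
As a rough illustration of that setup, a submit description file along these lines turns streaming off and asks Condor to transfer the output back when the job exits. This is only a sketch: the executable and file names are made up for the example, and your pool may need other settings.

```
# Sketch only: my_job.sh and the output/error/log file names are hypothetical.
universe                = vanilla
executable              = my_job.sh
output                  = my_job.$(Cluster).$(Process).out
error                   = my_job.$(Cluster).$(Process).err
log                     = my_job.$(Cluster).log

# Write stdout/stderr locally on the execute machine instead of
# streaming them to the shadow while the job runs.
stream_output           = False
stream_error            = False

# No shared file system: have Condor transfer files itself,
# and send output back only when the job exits.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

queue
```

With this arrangement you don't see output on the submit side until the job finishes, which is the trade-off for not depending on a live stream.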
Also, for -remote, -name, and -spool, what does 'name' mean? Is it a scheduler name?
Yes, -name <some machine in your pool running a schedd>. 
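
For example, assuming a schedd running on a machine called schedd.example.org and a submit file called job.sub (both names are invented for illustration, as is the cluster number), the commands might look something like this:

```
# schedd.example.org, job.sub, and cluster 42 are hypothetical.

# Submit to the named schedd, spooling the input files over to it.
condor_submit -name schedd.example.org -spool job.sub

# Check the queue on that same schedd.
condor_q -name schedd.example.org

# Once the job is done, pull the spooled output back from that schedd.
condor_transfer_data -name schedd.example.org 42
```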

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing