
Re: [Condor-users] condor w/o shared filesystem



On Monday, July 18, 2011 at 9:14 PM, Rita wrote:
One feature we would like to see is having condor jobs running completely independent of the scheduler -- related to file systems.

Currently, the job log needs to be shared between the scheduler and execution hosts.
This is not true. Only scheduler-side processes require access to the job log. Perhaps I'm misunderstanding your request?

Condor will try to be smart about using shared filesystems if FILESYSTEM_DOMAIN is the same between the scheduler and the execute nodes. You can set this to $(FULL_HOSTNAME) on every machine to let Condor know that none of your machines share a filesystem.
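For example, something along these lines (the submit-file lines are just the usual no-shared-filesystem setup, adjust for your site):

    # condor_config on every machine: tell Condor that no machines
    # share a filesystem with each other
    FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)

    # and in the submit file, have Condor move files itself
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT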
If the scheduler host dies for any reason, the execution of the jobs halts.
Condor is highly fault tolerant and the disappearance of a scheduler can be tolerated. The default behaviour can be changed if you don't like it. Specifically you want to look at MAX_CLAIM_ALIVES_MISSED (http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#17538) and ALIVE_INTERVAL (http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#param:AliveInterval) -- these control how long a condor_startd will keep a job running after the scheduler it came from stops being able to contact the startd.
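As a rough illustration, the defaults look something like this (the numbers below are the usual defaults; check the manual pages above for your version):

    # ALIVE_INTERVAL: how often (seconds) the schedd sends a
    # keep-alive for each claim it holds
    ALIVE_INTERVAL = 300

    # MAX_CLAIM_ALIVES_MISSED: how many keep-alives the startd will
    # tolerate missing before it gives up the claim and kills the job
    # (6 x 300s is roughly 30 minutes of schedd outage)
    MAX_CLAIM_ALIVES_MISSED = 6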

It would be nice to place all files in spool/ (this includes output, error, and the job itself, if possible). I am not sure if there is a better way.
You can have this happen if you submit with the -remote option -- Condor will spool everything in the spool directory and hold it there for you until you fetch the files with condor_transfer_data.
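Roughly like this (the schedd hostname and cluster id below are made up):

    # submit to a schedd; input files are spooled into its spool/ directory
    condor_submit -remote schedd.example.com job.sub

    # later, after the job completes, pull the output and error files
    # back out of spool/ for cluster 42
    condor_transfer_data -name schedd.example.com 42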

Regards,
- Ian

---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing