
Re: [Condor-users] Placing long jobs on hold



Ian Chesal wrote:
We would like to place jobs that run longer than 6 hours on hold. What
is the best way to do this?

AFAIK: Condor can't do this.

I really wish it could!

This is one of those LSF features that LSF-ites argue is invaluable whenever we have clashes of the Condor and LSF types here.

LSF allows you to suspend (locally on the machine) a process tree and then run another job in that spot; when that job completes, it unsuspends and resumes the other work. It works only on Linux, to my knowledge, and how well it works depends very much on the tool being suspended. Some tools release their licenses when suspended (that's good), some don't (that's bad and makes suspend nearly useless). Suspended jobs don't release their memory, IIRC. But it's better than nothing.

Where they argue this matters is with really expensive tools (think >$500,000 a license). You've got low-priority, long-running jobs using this expensive tool and now you need to do some really important, but quick, work. This can be done in LSF: you suspend the long-running jobs with an LSF command call and now LSF can run another job in the slot.

If I understand what you want, you can configure Condor to do something similar with suspension slots: http://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSuspendJobs

Otherwise, you might be able to use COD, though COD jobs have some restrictions.

-Greg