
Re: [Condor-users] Placing long jobs on hold



> We would like to place jobs that run longer than 6 hours on hold. What
> is the best way to do this?

AFAIK: Condor can't do this.

I really wish it could!

This is one of those LSF features that LSF-ites argue is invaluable whenever we have clashes of the Condor and LSF types here.

LSF allows you to suspend (locally on the machine) a process tree and then run another job in that spot and when that job completes, unsuspend and resume the other work. Works only on Linux to my knowledge and how well it works depends very much on the tool being suspended. Some tools release their licenses when suspended (that's good), some don't (that's bad and makes suspend nearly useless). Suspended jobs don't release their memory IIRC. But it's better than nothing.
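
As a rough illustration of that kind of local suspend and resume, here is a minimal Python sketch. It assumes the suspension is signal-based (SIGSTOP and SIGCONT sent to the job's process group); it is not LSF's code or API, the function names are hypothetical, and the /proc check is Linux-only.

```python
import os
import signal
import subprocess
import sys
import time


def suspend_group(pgid):
    """Stop every process in the group (assumed mechanism: SIGSTOP)."""
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid):
    """Let every process in the group continue where it left off."""
    os.killpg(pgid, signal.SIGCONT)


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The state field is the first token after the parenthesised command name.
        return f.read().rsplit(")", 1)[1].split()[0]


def demo():
    # Stand-in for a long-running job. start_new_session=True gives the child
    # its own process group, so the whole tree can be signalled at once.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )
    try:
        pgid = os.getpgid(child.pid)
        suspend_group(pgid)
        time.sleep(0.2)  # give the kernel a moment to update the state
        stopped = proc_state(child.pid)
        resume_group(pgid)
        time.sleep(0.2)
        resumed = proc_state(child.pid)
        return stopped, resumed
    finally:
        # Clean up the stand-in job regardless of what happened above.
        child.kill()
        child.wait()


if __name__ == "__main__":
    print(demo())
```

A process stopped this way keeps its memory and any open license connections, so whether a license is actually freed is up to the tool and its license server, not the signal.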

Where they argue this matters is with really expensive tools (think >$500,000 a license). You've got low priority, long running jobs using this expensive tool and now you need to do some really important, but quick, work. This can be done in LSF: you suspend the long running jobs with an LSF command call and now LSF can run another job in the slot.

You could approximate this feature with Condor's Virtual Machine support: if you run jobs in VM containers you should be able to preempt them, which causes Condor to suspend the VM before the job is removed and then resume it elsewhere without any loss of compute progress. How well that works, I don't know. And of course it adds the complexity of the VM container layer.

- Ian
