
Re: [Condor-users] Interrupt job on execute node



On 02/02/2012 10:11 AM, Tomáš Nechutný wrote:
Hello,

I'd like to set up Condor in my company for running tests consisting of
multiple applications (via DAGMan). I'd like to set up Condor execute
nodes on our employees' computers. Some jobs in the graph could possibly
take an hour to complete. If an employee returns from lunch early, he would
not be able to use his computer. Can the kbdd daemon interrupt a running job
so it can be started elsewhere?

Thank you.

http://research.cs.wisc.edu/condor/manual/v7.6/3_5Policy_Configuration.html

Set up a policy where WANT_SUSPEND evaluates to FALSE and PREEMPT evaluates to TRUE once the user is back at the machine. Base it on KeyboardIdle, which the KBDD will keep current, and you'll have jobs getting kicked off user machines.
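
As a rough sketch of such a policy (not a tested configuration), something along these lines could go in the local config file on each execute node. The 15-minute and 60-second thresholds are arbitrary values chosen for illustration, and it assumes the KBDD is running so that KeyboardIdle (seconds since the last keyboard or mouse activity) is being updated:

```
# Illustrative values only; tune them for your site.
MINUTE = 60

# Only start jobs once the console has been idle for a while.
START = KeyboardIdle > (15 * $(MINUTE))

# Never suspend a job in place; go straight to preemption instead.
WANT_SUSPEND = False

# Preempt the running job as soon as the user touches the keyboard or mouse.
PREEMPT = KeyboardIdle < 60
```

After changing the configuration, run condor_reconfig on the node so the startd picks up the new expressions.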

Be warned, this will produce what's called "badput": if a job is 59 minutes into its 60-minute run, the 59 minutes of work will be thrown away when the job restarts on another machine. To give a job a little more time (favoring the job over the machine user), look into the WANT_VACATE and MAXJOBRETIREMENTTIME options.
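
For illustration, the settings that trade user responsiveness for less badput might look like the following. The numbers are assumptions picked for the example, not recommendations, and the exact preemption behavior should be checked against the manual for your Condor version:

```
# Let a preempted job keep running until it has had up to 10 minutes
# of total runtime (value is in seconds, measured from job start).
MAXJOBRETIREMENTTIME = 10 * 60

# When the job is finally preempted, ask it to exit gracefully first...
WANT_VACATE = True

# ...and hard-kill it only if it has not exited within 5 minutes.
KILL = (CurrentTime - EnteredCurrentActivity) > (5 * 60)
```

With hour-long jobs, a MAXJOBRETIREMENTTIME of 3600 would let most of them finish, at the cost of the employee sharing the machine with the job until it does.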

The default UWCS policies in /etc/condor/condor_config are reasonable examples of how to do this.

Best,


matt