
Re: [Condor-users] Running long jobs



On Sat, Dec 03, 2005 at 07:01:43PM +0100, Daniel R Figueiredo wrote:
> 
> On Wed, 30 Nov 2005, Erik Paulson wrote:
> 
> Thanks for your message. It's now clear that I'll need support from the 
> Condor administrator. However, I looked through the report "Condor and The 
> Bologna Batch System" as you suggested, but it was not clear how to 
> configure Condor to run long jobs with preemption implemented via 
> suspension (as opposed to preemption via termination). In particular, I 
> would like to know the minimal set of configuration settings that 
> must be changed in order to achieve this. Recall that I would like 
> long jobs to be preempted via suspension (as opposed to terminated through 
> a signal) and later resume from where they stopped (as opposed to 
> restarting from the beginning). Any ideas on how to do this? I could then 
> suggest something concrete to our local Condor administrator.
> 

You need to create 2 VMs (virtual machines). There is no way to have one VM 
suspend a job, start another one, and resume the first one later - if a job 
has state on a machine, it must have a VM watching over it, and a VM can only
watch over one job at a time.

You can emulate your desired behaviour with 2 VMs - the second VM can be 
configured to suspend the job whenever it sees the state of the first VM 
switch to "Claimed". The BBS document should give you all of the details you 
need.
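
As a rough, untested sketch of the idea (not copied from the BBS document), 
the startd configuration on a single-CPU machine might look something like 
the fragment below. It assumes VM 1 runs the short jobs and VM 2 runs the 
long ones; that numbering, and the choice to disable PREEMPT and KILL, are my 
assumptions for illustration. The START expressions that steer short and long 
jobs to the right VM are left out, and your administrator should check each 
setting against the manual for your Condor version.

```
# Advertise two VMs on a machine with one physical CPU.
NUM_CPUS = 2

# Publish each VM's State in the other VM's ClassAd,
# so VM 2 can see it as vm1_State.
STARTD_VM_EXPRS = State

# Suspend rather than vacate when the policy asks a job to step aside.
WANT_SUSPEND = True

# Assumption: VM 2 holds the long job. Suspend it while VM 1 is Claimed,
# and let it continue once VM 1 is no longer Claimed.
SUSPEND  = (VirtualMachineID == 2) && (vm1_State =?= "Claimed")
CONTINUE = (VirtualMachineID == 2) && (vm1_State =!= "Claimed")

# Assumption: never evict or hard-kill a job under this policy.
PREEMPT = False
KILL    = False
```

Note that a suspended job stays in memory on the execute machine, so the 
machine needs enough RAM (or swap) for both jobs at once.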

-Erik