
Re: [Condor-users] Running long jobs



I don't think Daniel needs two VMs; he simply wants his one job to
suspend for some reason, then resume when the "reason" no longer
applies.

Looking at his original post, Daniel said:

"The problem is that after the job has been running for some hours (say
10 hours) Condor decides to evict the job from the machine."

He doesn't say why the job gets evicted, so we don't know what the
criteria for suspending it should be.  I'll assume keyboard activity.
In that case, "the minimal set of configuration fields that must be
changed in order to achieve [suspension instead of eviction]" is:

WANT_SUSPEND            = TRUE
PREEMPT                 = FALSE
PREEMPTION_REQUIREMENTS = FALSE
KILL                    = FALSE

ContinueIdleTime        = 5 * $(MINUTE)
SUSPEND                 = $(KeyboardBusy)
CONTINUE                = (KeyboardIdle > $(ContinueIdleTime))
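
Note that these lines lean on two macros, $(MINUTE) and $(KeyboardBusy),
which are not defined above. As far as I know the stock condor_config
defines them roughly as follows; treat this as an assumption and check
your local file before relying on it:

MINUTE                  = 60
KeyboardBusy            = (KeyboardIdle < $(MINUTE))

With those definitions, the job would suspend once the keyboard has been
touched within the last minute, and continue after the keyboard has been
idle for more than five minutes.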

Ralph Finch, P.E.
Dept. of Water Resources
Bay-Delta Office, Room 215-13
Sacramento, CA  95814
916-653-7552
rfinch@xxxxxxxxxxxx
 

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
> Sent: Saturday, December 03, 2005 11:39 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Running long jobs
> 
> On Sat, Dec 03, 2005 at 07:01:43PM +0100, Daniel R Figueiredo wrote:
> > 
> > On Wed, 30 Nov 2005, Erik Paulson wrote:
> > 
> > Thanks for your message. It's now clear that I'll need support from
> > the Condor administrator. However, I looked through the report
> > "Condor and The Bologna Batch System" as you suggested, but it was
> > not clear how to configure Condor to run long jobs with preemption
> > implemented via suspension (as opposed to preemption via
> > termination). In particular, I would like to know what is the
> > minimal set of configuration fields that must be changed in order to
> > achieve this? Recall that I would like for long jobs to be preempted
> > via suspension (as opposed to terminated through a signal) and later
> > resume from where they stopped (as opposed to restarting from the
> > beginning). Any ideas on how to do this? I could then suggest
> > something concrete to our local Condor administrator.
> > 
> 
> You need to create 2 VMs. There is no way to have one VM suspend a
> job, start another one, and resume the first one later - if a job has
> state on a machine, it must have a VM watching over it, and a VM can
> only watch over one job at a time.
>
> You can emulate your desired behaviour with 2 VMs - the second VM can
> be configured to suspend the job whenever it sees the state of the
> first VM switch to "Claimed". The BBS document should give you all of
> the details you need.
> 
> -Erik