
Re: [Condor-users] windows xp log off kills jobs



On Dec 28, 2007 3:52 PM, Finch, Ralph <rfinch@xxxxxxxxxxxx> wrote:
> > What are the values of SUSPEND and PREEMPT on these machines?
>
> WANT_SUSPEND            = TRUE
> PREEMPT                 = FALSE
> PREEMPTION_REQUIREMENTS = FALSE
> KILL                    = FALSE
> # suspend job on VM1 if keyboard is touched
> # and VM2 has a Condor job or high load;
> # but don't suspend if job suspension time exceeds limit
> SUSPEND  = (VirtualMachineID == 1) \
>                 && ($(KeyboardBusy) ) \
>                 && ( (vm2_Activity == "Busy") || (vm2_LoadAvg > $(HighLoad)) ) \
>                 && ( ((TotalJobSuspendTime =!= UNDEFINED) && (TotalJobSuspendTime <= $(MaxSuspendTime))) \
>                 || (TotalJobSuspendTime =?= UNDEFINED))
>
> > It is possible the standard 'kick a job off this machine if
> > the owner wants to use it' routines are kicking in.
> > You may wish to change that behaviour...
>
> We try to suspend jobs in our pool when interactive use is wanted with
> the above settings.  This has worked properly for a couple of years and
> works now; when keyboard activity happens the job on VM1 is suspended.
> Anyway, why would logging OFF a machine result in killing jobs even if
> we had SUSPEND and PREEMPT incorrect? :-(
>
> Ralph Finch
> 916-653-7552
>
>
>
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matt Hope
> > Sent: Friday, December 28, 2007 7:09 AM
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] windows xp log off kills jobs
> >
> > On Dec 27, 2007 10:00 PM, Finch, Ralph <rfinch@xxxxxxxxxxxx> wrote:
> > > condor -version
> > > $CondorVersion: 6.8.3 Jan  5 2007 $
> > > $CondorPlatform: INTEL-WINNT50 $
> > >
> > > I am submitting jobs from machine1 to a pool, all Windows XP.  If I
> > > then log in remotely to a machine running my jobs--say machine2--and
> > > then log off, the jobs on machine2 are killed and new jobs restart a
> > > few minutes later from the idle jobs in the pool.  Damn annoying, as
> > > you can guess.
> > >
> > > In this thread
> > >
> > > https://lists.cs.wisc.edu/archive/condor-users/2004-November/msg00076.shtml
> > >
> > > the poster had the same problem but seemed to think it was only Java
> > > jobs.  Mine are not Java; my executable is a Windows .bat file which
> > > then runs a compiled exe.  He had a kludgy solution for his Java jobs
> > > which I doubt would work with mine, plus it seems a serious deficiency
> > > and should have a better solution.  I believe I'm not the first
> > > person to hit this problem, so is there a good solution?
> >
> > What are the values of SUSPEND and PREEMPT on these machines?

Hmm... Are you using RunAsOwner? If so, does it also happen if you run a
job and then someone else logs on and then off?
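
For what it's worth, here is how I read the SUSPEND expression you
posted, as a rough Python sketch. The function and argument names are
mine (hypothetical), None stands in for UNDEFINED, and I am treating
$(KeyboardBusy) as a plain boolean since its definition wasn't posted.
It is only a model of the logic, not how the startd actually evaluates
ClassAds.

```python
from typing import Optional


def should_suspend(
    slot_id: int,
    keyboard_busy: bool,                 # stands in for $(KeyboardBusy); assumed boolean
    vm2_activity: str,
    vm2_load_avg: float,
    high_load: float,                    # stands in for $(HighLoad)
    total_suspend_time: Optional[float], # None models UNDEFINED
    max_suspend_time: float,             # stands in for $(MaxSuspendTime)
) -> bool:
    """Rough model of the quoted SUSPEND policy (hypothetical helper)."""
    # VM2 is running a Condor job or the load is high.
    neighbour_loaded = vm2_activity == "Busy" or vm2_load_avg > high_load
    # Either the job has never been suspended (attribute undefined)
    # or its accumulated suspension time is still within the limit.
    under_limit = (
        total_suspend_time is None
        or total_suspend_time <= max_suspend_time
    )
    return slot_id == 1 and keyboard_busy and neighbour_loaded and under_limit


# Keyboard touched on VM1 while VM2 is busy, job never suspended before.
print(should_suspend(1, True, "Busy", 0.1, 0.5, None, 600))
# Same situation, but the job has already been suspended past the limit.
print(should_suspend(1, True, "Busy", 0.1, 0.5, 700, 600))
```

If that reading is right, nothing in the expression looks at logon
sessions, and SUSPEND on its own would only ever pause the job on VM1,
so the kill on logoff probably comes from somewhere other than this
policy.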

Clutching at straws here...

Matt