Re: [Condor-users] Job was evicted
- Date: Thu, 17 Jan 2008 10:45:34 +0100
- From: Harald van Pee <pee@xxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Job was evicted
On Wednesday 16 January 2008 04:44 pm, Dan Bradley wrote:
> Have you set PREEMPTION_REQUIREMENTS=False?
I did that yesterday, and so far everything works as expected. I will
report back if the problem occurs again.
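
A minimal sketch of what this change can look like, combined with the policy settings quoted further down. It assumes the negotiator and the execute nodes read these macros from their usual configuration files; the RANK line is an added assumption, not something discussed in this thread.

```
# Central manager (read by the negotiator):
# do not preempt running jobs on behalf of users with better priority.
PREEMPTION_REQUIREMENTS = False

# Execute nodes (read by the startd):
# never preempt a running job because of machine policy.
PREEMPT = False

# Assumption, not from this thread: a constant machine RANK means no
# job is preferred over the running one, so rank-based preemption
# cannot trigger either.
RANK = 0
```

The daemons have to re-read their configuration (for example with condor_reconfig) before the new values take effect.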
> More info on disabling preemption:
Thanks for the link. Until now Condor was just working, and that was it. But since
this year we usually have about 10000 jobs per day (running time 2 min up to
2 h) on 200 nodes, so it seems I have to understand Condor better.
I think in the long term we will also have to suspend long-running jobs.
By the way, is it possible to just suspend a job with Condor and continue it
later, without restarting and without checkpointing, in the vanilla universe
under Linux? Or is it planned to implement checkpointing for dynamically
linked programs as well?
> Harald van Pee wrote:
> >Hi all,
> >I have a strange problem with Condor 6.8.5 on Linux.
> >In general everything works fine, all nodes are present and heavily loaded.
> >On all nodes I have configured:
> >START = TRUE
> >PREEMPT = FALSE
> >SUSPEND = False
> >KILL = False
> >WANT_SUSPEND = False
> >WANT_VACATE = False
> >CLAIM_WORKLIFE = 120
> >The intention is that a running job will never be evicted!
> >(This is because most of the jobs have to run in the vanilla universe.)
> >But when a job finishes, another user gets a chance to run his job.
> >In general this seems to work, but today we had a nice-user with a
> > very high userprio (501851872).
> >All of his running jobs were interrupted, then started again and
> > interrupted again ...
> >At first I did not believe that his jobs were interrupted by Condor, but
> >indeed after I lowered the userprio everything ran smoothly.
> >Now the question is: have I configured anything wrong, or are the
> > statements above not enough to get what I want?
> >Must I also set
> >VACATE = False?
> >Or is this a Condor problem?
> >Any suggestions?