
Re: [Condor-users] Job was evicted



On Wednesday 16 January 2008 04:44 pm, Dan Bradley wrote:
> Have you set PREEMPTION_REQUIREMENTS=False?

I did that yesterday, and so far everything works as expected. I will 
report back if the problem occurs again.
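
For anyone following along, a minimal sketch of that setting. It assumes the 
line goes into the configuration read by the negotiator on the central manager; 
the exact file location is an assumption and is not stated in this thread.

```
# Assumption: placed in the config file the negotiator reads on the
# central manager. Tells the negotiator never to preempt a running job
# because another user has a better user priority.
PREEMPTION_REQUIREMENTS = False
```

After a change like this the daemons have to re-read their configuration 
(for example with condor_reconfig) before it takes effect.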
>
> More info on disabling preemption:
>
> http://www.cs.wisc.edu/condor/manual/v6.8/3_5Startd_Policy.html#SECTION004510500000000000000

Thanks for the link. Until now Condor was just working and that was it. But since 
this year we usually have about 10000 jobs per day (running time 2 min up to 
2 h) on 200 nodes, so it seems I have to understand Condor better.

I think in the long term we will also have to suspend long-running jobs.

By the way, is it possible to just suspend a job with Condor and continue it 
later, without restarting and without checkpointing, in the vanilla universe 
under Linux?

Or is it planned to implement checkpointing for dynamically linked programs 
under Linux as well?

Harald
 

>
> --Dan
>
> Harald van Pee wrote:
> >Hi all,
> >
> >I have a strange problem with Condor 6.8.5 on Linux.
> >In general everything works fine, all nodes are present and heavily loaded.
> >On all nodes I have configured:
> >START = TRUE
> >PREEMPT = FALSE
> >SUSPEND	       = False
> >KILL	       = False
> >WANT_SUSPEND   = False
> >WANT_VACATE    = False
> >CLAIM_WORKLIFE = 120
> >
> >The intent is that a running job will never be evicted!
> >(This is because most of the jobs have to run in the vanilla universe.)
> >But when a job finishes, another user gets a chance to run his job.
> >
> >In general this seems to work, but today we had a nice-user with a
> > very high userprio (501851872).
> >
> >All of his running jobs were interrupted, then started again and
> > interrupted again ...
> >At first I did not believe that his jobs were interrupted by Condor, but
> >indeed, after I lowered the userprio everything ran smoothly.
> >
> >Now the question is: have I configured anything wrong, or are the
> > statements above not enough to get what I want?
> >Must I also set
> >VACATE = False?
> >
> >Or is this a condor problem?
> >Any suggestions?
> >
> >Regards
> >Harald