Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Stop Vanilla jobs from eviction/restart

Date: Thu, 20 Jun 2013 14:03:52 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Stop Vanilla jobs from eviction/restart

On 6/20/2013 1:41 PM, Prem Kumar wrote:

> hi Todd, thank you for your response.
>
> i matched all of those settings in the link that you shared, and to my
> surprise they are exactly the same what it needs to be to disable
> preemption.

Did you remember to do a condor_reconfig -all (from a trusted machine, aka your central manager) after making the config file edits? condor_config_val -dump just reads from the config file, so if the file has been edited more recently than the last reconfig, what it prints may not be what the running daemons are actually using.

Also, do you have the same config file setup on all nodes, or do you have a different config file on your CM vs. your execute nodes?


Could the job restarts have been from before you made the config changes?

Could the job restarts be a result of something outside of HTCondor's control, such as a reboot of an execute node or a restart of the HTCondor service?

Could the job restarts be a result of the jobs going on hold (for some error reason, like an NFS server temporarily being down) and then being released?


What version of HTCondor are you running?

If you are running v7.8 or earlier and you never want to interrupt a running job, make certain that in your central manager's condor_config you have:

   PREEMPTION_REQUIREMENTS = False
and in the condor_config on all of your execute nodes you have:
   PREEMPT=FALSE
   KILL=FALSE
   RANK=0
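
As a rough way to double-check the settings above on each machine, here is a minimal Python sketch. It is not part of HTCondor: the helper names are made up for illustration, and it assumes you have saved the config (or the output of condor_config_val -dump) as plain "NAME = value" lines. It compares names and values case-insensitively and only looks at the text you give it, so it says nothing about whether the daemons have been reconfigured.

```python
# Hypothetical helper, not an HTCondor tool: checks config text for the
# "never preempt" settings suggested for v7.8 and earlier.
EXPECTED_CM = {"PREEMPTION_REQUIREMENTS": "FALSE"}
EXPECTED_EXECUTE = {"PREEMPT": "FALSE", "KILL": "FALSE", "RANK": "0"}


def parse_config(text):
    """Parse simple NAME = value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Upper-case the name so lookups are case-insensitive.
        settings[name.strip().upper()] = value.strip()
    return settings


def mismatches(text, expected):
    """Return {name: actual_value_or_None} for settings that differ."""
    settings = parse_config(text)
    problems = {}
    for name, want in expected.items():
        got = settings.get(name)
        if got is None or got.upper() != want:
            problems[name] = got
    return problems


execute_node = """
PREEMPT=FALSE
KILL = False
RANK = 0
"""
print(mismatches(execute_node, EXPECTED_EXECUTE))  # prints {}
```

An empty result means the text contains the expected values; a missing setting shows up with a value of None.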

If you are running HTCondor v8.0+ and you never want to interrupt a running job, life can be simpler - I would suggest making certain the condor_config on all of your execute nodes has something like

  MAXJOBRETIREMENTTIME = 172800

which specifies how many seconds a job can run uninterrupted (172800 is 2 days; set it to whatever suits you).
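
If you want a different limit, the value is just the duration in seconds. A small Python sketch of the arithmetic (the function name is made up for illustration):

```python
from datetime import timedelta


def retirement_seconds(days=0, hours=0):
    """Convert a duration to whole seconds for MAXJOBRETIREMENTTIME."""
    return int(timedelta(days=days, hours=hours).total_seconds())


# Two days, matching the value suggested above.
print(f"MAXJOBRETIREMENTTIME = {retirement_seconds(days=2)}")
# prints: MAXJOBRETIREMENTTIME = 172800
```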

In the current developer series, we are adding to the startd classad information about how many times a job was interrupted by HTCondor - this will make it easier to confirm that the system is indeed doing what you think you are telling it :).


regards
Todd
