Re: [Condor-users] Forcing a job classad from config file?

The MaxJobRetirementTime setting in the config file controls how long the startd (i.e., the execution machine) will let a job keep running while its claim is being retired. That setting is therefore part of the machine policy, and it goes in the machine ClassAd, not the job ClassAd.
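
As a minimal sketch of the machine-side setting, the fragment below would go in the configuration of the execute machine. The value 3600 (seconds) is taken from the original question and is only illustrative.

```
# condor_config on the execute machine (startd policy).
# Allow a running job up to 3600 seconds to finish once its
# claim is being retired; this lands in the machine ClassAd.
MaxJobRetirementTime = 3600
```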

There is a more obscure case where you may want to set MaxJobRetirementTime in the job ClassAd. Doing this allows you to specify a _shorter_ retirement time than the one granted by the machine policy. By default, standard universe and nice-user jobs have their MaxJobRetirementTime=0, so they don't wait around in retirement. In all other cases, the default is to not define MaxJobRetirementTime in the job ClassAd, so the job will use the maximum amount of retirement time granted by the machine.
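
A rough sketch of that job-side case, assuming the usual +Attribute syntax of a submit description file for placing a custom attribute in the job ClassAd. The executable name and the value 600 are hypothetical, chosen only for illustration.

```
# Submit description file (illustrative sketch).
universe   = vanilla
# Hypothetical executable name.
executable = my_job
# Request at most 600 seconds of retirement time. This can only
# shorten, not extend, what the machine policy grants.
+MaxJobRetirementTime = 600
queue
```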

So from your post, I assume that you want MaxJobRetirementTime to be non-zero for either standard universe or nice-user jobs. In all other cases it should already be working. Is this correct?

I have verified that using SUBMIT_EXPRS to set a default MaxJobRetirementTime in the job ClassAd does not work for standard universe and nice-user jobs, because the value is overwritten with 0. Another problem is that you can't set the machine and job attributes independently, since they both have the same name. I'll think about this and try to provide a solution.

Until a fix becomes available, one workaround that may or may not be useful to you is to use condor_submit -a MaxJobRetirementTime=X.

--Dan Bradley

Doak Bane wrote:

What I want is to force all job ClassAds to take on, by default, the value of MaxJobRetirementTime defined in a config file. Just defining a value in the config file does not pass any value to the job ClassAds. Jobs simply get preempted, with no chance to retire, and restart later. I also tried this:
MaxJobRetirementTime = 3600
SUBMIT_EXPRS = MaxJobRetirementTime

With or without the SUBMIT_EXPRS line, all job ClassAds still show:
   MaxJobRetirementTime = 0

Of course, if MaxJobRetirementTime is explicitly defined in the submit command file, then things work correctly and jobs retire as expected.

Is there a way to make this work besides trickery with wrappers or changing all submit files?

Doak Bane
Condor-users mailing list