[Condor-users] Job Attributes and Job Policy Expressions



Hi All
 
Is anyone aware of any documentation of job attributes, particularly
which attributes are available at which times? For example, JobStartDate
obviously won't appear until a job has transitioned from idle to running.
 
It is possible to use "condor_q -l" to see a job's attributes but I was hoping
for a listing of ALL possible attributes and when they are "available".
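
In the meantime, one way to see which attributes appear at which stage is to
save the "condor_q -l" output for an idle job and a running job and compare
the attribute names. A minimal sketch of that comparison follows. It assumes
the long listing is a series of "Name = Value" lines with a blank line
between jobs; the helper name parse_long_listing and the sample values are
made up for illustration.

```python
def parse_long_listing(text):
    """Split 'condor_q -l' style text into one dict (attribute -> raw value) per job."""
    jobs, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Assumption: a blank line separates one job's attributes from the next.
            if current:
                jobs.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:  # skip anything that is not a "Name = Value" line
            current[name] = value
    if current:
        jobs.append(current)
    return jobs


# Made-up sample: the first job is idle (JobStatus 1), the second is running (JobStatus 2).
SAMPLE = """\
ClusterId = 1
ProcId = 0
JobStatus = 1
QDate = 1000

ClusterId = 1
ProcId = 1
JobStatus = 2
QDate = 1000
JobStartDate = 1060
JobCurrentStartDate = 1060
"""

idle, running = parse_long_listing(SAMPLE)
# Attributes present on the running job but missing from the idle one.
only_when_running = sorted(set(running) - set(idle))
print(only_when_running)
```

This only shows what is present on the jobs you happen to sample, so it is no
substitute for a complete listing.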
 
The reason is that I have been fiddling with some job policy expressions
to "overcome" some issues we occasionally have when submitting jobs:
some jobs exit too early and some seem to run forever. If we
manually resubmit the "too early" jobs, they mostly run OK.
Manually putting the "run forever" jobs on hold and then manually releasing
them also causes them to mostly run OK. This can be a laborious
process with 10,000+ submitted jobs, so we were looking for a way to make
this happen automatically using on_exit_remove, periodic_hold, etc.
 
I now have something that seems to work for us, but getting there was a
trial-and-error process: some of the existing docs/examples don't seem to work
(an attribute they use doesn't exist, i.e. is not defined), and even some of the
attributes seen with "condor_q -l" give "undefined" errors.
 
For example, the docs give an expression like:
 
== False) && (ExitSignal != 0)) || (ServerStartTime - JobStartdate < 3600 )
 
As far as I can tell there is no ServerStartTime. There is a ServerTime,
but even a reference to that is reported as undefined, although I can see it
with condor_q -l.
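
To illustrate what I think is happening: in the ClassAd language, a reference
to an attribute that is not in the job ad evaluates to UNDEFINED, and
arithmetic or a comparison on UNDEFINED is UNDEFINED as well, so the whole
test never becomes true or false. The sketch below models just that
propagation; it is not Condor code. UNDEFINED is represented by None, and the
helper names and sample values are made up.

```python
UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value


def sub(a, b):
    """Subtraction that propagates UNDEFINED instead of raising an error."""
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a - b


def less_than(a, b):
    """Comparison that propagates UNDEFINED instead of returning False."""
    return UNDEFINED if a is UNDEFINED or b is UNDEFINED else a < b


def elapsed_less_than(ad, now_attr, start_attr, limit):
    """Model of (now_attr - start_attr < limit) evaluated against a job ad."""
    # dict.get returns None (our UNDEFINED) for attributes that are not in the ad
    return less_than(sub(ad.get(now_attr), ad.get(start_attr)), limit)


# Made-up ads: the idle job has no JobStartDate yet.
idle_ad = {"ServerTime": 5000}
running_ad = {"ServerTime": 5000, "JobStartDate": 2000}

print(elapsed_less_than(idle_ad, "ServerTime", "JobStartDate", 3600))          # None
print(elapsed_less_than(running_ad, "ServerTime", "JobStartDate", 3600))       # True
print(elapsed_less_than(running_ad, "ServerStartTime", "JobStartDate", 3600))  # None
```

The last call shows that a name which is not in the ad at all behaves the
same way as one that has not been set yet.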
 
BTW, this is with the Windows version of Condor, 7.2.4.
 
Our trial-and-error solution gave us the following, which seems to work
OK for our purposes. This particular test setup is for jobs that should run
for 20 minutes. A run that is shorter or longer than this by 5 minutes means
something dodgy has happened, so we want to try re-running the job.
 

MINUTE = 60

- JobCurrentStartDate) > (15 * $(MINUTE))

periodic_hold = (CurrentTime - JobCurrentStartDate) > (30 * $(MINUTE))

periodic_release = (CurrentTime - EnteredCurrentStatus) > (5 * $(MINUTE))
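
To sanity-check the arithmetic, the periodic_hold and periodic_release
expressions above can be stepped through by hand. The sketch below is only a
model of those two comparisons, not of Condor itself: the function names
mirror the submit commands, times are plain seconds, and the timeline values
are made up.

```python
MINUTE = 60


def periodic_hold(current_time, job_current_start_date):
    # (CurrentTime - JobCurrentStartDate) > (30 * $(MINUTE))
    return (current_time - job_current_start_date) > 30 * MINUTE


def periodic_release(current_time, entered_current_status):
    # (CurrentTime - EnteredCurrentStatus) > (5 * $(MINUTE))
    return (current_time - entered_current_status) > 5 * MINUTE


start = 0                 # made-up time at which the job starts running
held_at = 31 * MINUTE     # made-up time at which the job is put on hold

print(periodic_hold(20 * MINUTE, start))        # False: still inside the 30-minute limit
print(periodic_hold(31 * MINUTE, start))        # True: running for more than 30 minutes
print(periodic_release(36 * MINUTE, held_at))   # False: exactly 5 minutes on hold, not more
print(periodic_release(37 * MINUTE, held_at))   # True: on hold for more than 5 minutes
```

Both comparisons are strict, so a job sitting exactly on a threshold is left
alone until the next evaluation.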

 

Thanks for any help

Cheers

Greg