
Re: [HTCondor-users] looking for a couple of knobs related condor_q and condor_history



On 7/11/2017 8:57 AM, Jose Caballero wrote:
Hi,

(1) does anyone know if there is a config variable to set for how long
a REMOVED job stays in the output of condor_q before moving it to
condor_history?


By default, a removed job stays in the output of condor_q (with job status "X") only until HTCondor can confirm that the job has been killed on the execute machine. Usually this takes just a couple of seconds, but it could take a couple of minutes if there are problems communicating with the execute node or if the schedd is very busy (e.g., if you just removed thousands of running jobs, the schedd will spread out contacting the execute nodes over a couple of minutes instead of attempting thousands at once).

Or are you saying you actually prefer the job to stay in condor_q for a specified amount of time? If so, "there's a knob for that". You can achieve this via the submit parameter "leave_in_queue" documented on the condor_submit man page.
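
As a rough illustration of that knob, the Python sketch below only builds the text of a submit description whose leave_in_queue expression keeps a removed job (JobStatus == 3) in the queue for a fixed time after it entered that state. The helper name, the executable, and the one-hour window are made up for the example, and the expression itself is an untested assumption built from the JobStatus and EnteredCurrentStatus job attributes and the ClassAd time() function; please check it against the condor_submit man page before relying on it.

```python
def submit_description(executable: str, keep_seconds: int) -> str:
    """Return submit-file text that keeps a removed job in the queue.

    Hypothetical helper for illustration only. The expression assumes the
    job attributes JobStatus (3 means removed) and EnteredCurrentStatus,
    plus the ClassAd time() function.
    """
    if keep_seconds <= 0:
        raise ValueError("keep_seconds must be positive")
    expr = (
        "(JobStatus == 3) && "
        f"((time() - EnteredCurrentStatus) < {keep_seconds})"
    )
    lines = [
        f"executable = {executable}",
        f"leave_in_queue = {expr}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values: keep a removed job visible for one hour.
    print(submit_description("/bin/sleep", 3600), end="")
```

The resulting text could be saved to a file and handed to condor_submit. Once the expression evaluates to False, the job should leave the queue as usual.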

(2) how do you control for how long jobs are in condor_history? I
guess the question translates to the size of some file/DB, right?


Yes, as Bob mentioned, you can adjust this via the condor_config knobs ENABLE_HISTORY_ROTATION, MAX_HISTORY_LOG (max size of each file), and MAX_HISTORY_ROTATIONS (max number of file rotations); see the Manual index for details on these.
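
To get a feel for how those two size knobs bound the history on disk, here is a small back-of-the-envelope sketch in Python. It assumes rotation is enabled and that the schedd keeps the file currently being written plus MAX_HISTORY_ROTATIONS rotated backups; the function name and the example values are only for illustration, so substitute whatever your own configuration uses.

```python
def history_disk_bound(max_history_log: int, max_history_rotations: int) -> int:
    """Approximate upper bound, in bytes, on history file storage.

    Assumption: one active history file plus max_history_rotations
    rotated backups, each at most max_history_log bytes.
    """
    if max_history_log < 0 or max_history_rotations < 0:
        raise ValueError("both values must be non-negative")
    return max_history_log * (max_history_rotations + 1)


if __name__ == "__main__":
    # Illustrative values: 20 MiB per file, 2 rotations.
    print(history_disk_bound(20 * 1024 * 1024, 2))  # 62914560
```

How long a job stays visible in condor_history then depends on how quickly your pool fills that much space with job ads.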

You can also tell HTCondor to place a copy of each historical job ClassAd into a specified directory. This can be useful if you want some other service to process every completed job ad, such as a script that uploads the history files into a NoSQL database. From section 3.5 of the Manual:

PER_JOB_HISTORY_DIR
If set to a directory writable by the HTCondor user, when a job leaves the condor_schedd's queue, a copy of the job's ClassAd will be written in that directory. The files are named history, with the job's cluster and process number appended. For example, job 35.2 will result in a file named history.35.2. HTCondor does not rotate or delete the files, so without an external entity to clean the directory, it can grow very large. This option defaults to being unset. When not set, no files are written.
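
For what it's worth, an external service along those lines might start from something like the Python sketch below. It relies only on the file naming described above (history.<cluster>.<proc>); the function names are hypothetical, and what you do with each file (upload it, then delete it so the directory does not grow without bound) is left to you.

```python
import re
from pathlib import Path
from typing import Iterator, Optional, Tuple

# Matches names such as "history.35.2" (cluster 35, process 2).
_NAME = re.compile(r"^history\.(\d+)\.(\d+)$")


def parse_history_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (cluster, proc) for a per-job history file name, else None."""
    match = _NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def iter_job_ads(directory: Path) -> Iterator[Tuple[int, int, Path]]:
    """Yield (cluster, proc, path) for each per-job file, ordered by job id."""
    found = []
    for path in directory.iterdir():
        ids = parse_history_name(path.name)
        if ids is not None and path.is_file():
            found.append((ids[0], ids[1], path))
    yield from sorted(found, key=lambda item: (item[0], item[1]))


if __name__ == "__main__":
    # Hypothetical directory; use whatever PER_JOB_HISTORY_DIR points to.
    for cluster, proc, path in iter_job_ads(Path("/var/lib/condor/per_job_history")):
        print(f"{cluster}.{proc}: {path}")
```

Sorting by the numeric ids rather than by file name keeps job 35.10 after 35.2, which a plain string sort would not.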

Hope the above helps,
Todd