Re: [HTCondor-users] looking for a couple of knobs related condor_q and condor_history
- Date: Tue, 11 Jul 2017 15:13:09 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] looking for a couple of knobs related condor_q and condor_history
On 7/11/2017 8:57 AM, Jose Caballero wrote:
> Hi,
> (1) does anyone know if there is a config variable to set for how long
> a REMOVED job stays in the output of condor_q before moving it to
> condor_history?
By default, a removed job will only stay in the output of condor_q (with
job status "X") until HTCondor can confirm that the job has been killed
on the execute machine. Usually this takes just a couple of seconds, but it
could take a couple of minutes if there are problems communicating with the
execute node, or if the schedd is very busy (e.g., if you just removed
thousands of running jobs, the schedd will spread out contacting the
execute nodes over a couple of minutes instead of attempting thousands
of contacts at once).
Or are you saying you actually prefer the job to stay in condor_q for a
specified amount of time? If so, "there's a knob for that". You can
achieve this via the submit parameter "leave_in_queue" documented on the
condor_submit man page.
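As a rough illustration of the leave_in_queue approach, here is a small Python sketch that renders such a submit-file line. The helper name leave_in_queue_line and the one-hour value are made up for the example, and the expression itself is an untested assumption: it relies on JobStatus == 3 meaning "removed" and on the EnteredCurrentStatus job attribute and the ClassAd time() function behaving as described in the Manual. Check the condor_submit man page for your version before relying on it.

```python
def leave_in_queue_line(hold_seconds):
    """Render a submit-file line that asks HTCondor to keep a removed job
    (JobStatus == 3) in the queue until hold_seconds after it entered
    that status. The expression is an assumption, not a tested recipe."""
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    expr = (
        "(JobStatus == 3) && "
        f"((time() - EnteredCurrentStatus) < {int(hold_seconds)})"
    )
    return f"leave_in_queue = {expr}"


# Hypothetical usage: keep removed jobs visible in condor_q for one hour.
print(leave_in_queue_line(3600))
# leave_in_queue = (JobStatus == 3) && ((time() - EnteredCurrentStatus) < 3600)
```

The printed line would go into the submit description file alongside the usual executable and queue statements.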
> (2) how do you control for how long jobs are in condor_history? I
> guess the question translates to the size of some file/DB, right?
Yes, as Bob mentioned, you can adjust this via the condor_config knobs
ENABLE_HISTORY_ROTATION, MAX_HISTORY_LOG (max size of each file), and
MAX_HISTORY_ROTATIONS (max number of file rotations); see the Manual
index for details on these.
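Since retention is governed by file size rather than by time, a back-of-the-envelope calculation can translate these knobs into "how long". The Python sketch below assumes that the history consists of one live file plus MAX_HISTORY_ROTATIONS backups, each up to roughly MAX_HISTORY_LOG bytes. The function names, the 20 MB / 2 rotation settings, the 6000-byte average job ad, and the 1000 jobs per day are all illustrative assumptions, not measured values.

```python
def max_history_bytes(max_history_log, max_history_rotations):
    """Approximate upper bound on disk used by the history files.

    Assumes one live history file plus `max_history_rotations` backups,
    each allowed to grow to about `max_history_log` bytes.
    """
    return max_history_log * (max_history_rotations + 1)


def history_retention_days(total_bytes, avg_ad_bytes, jobs_per_day):
    """Rough number of days of jobs that fit in total_bytes of history."""
    return total_bytes / (avg_ad_bytes * jobs_per_day)


# Hypothetical settings: 20 MB per file, 2 rotated backups.
total = max_history_bytes(20 * 1000 * 1000, 2)
print(total)  # 60000000

# Hypothetical workload: 6000-byte job ads, 1000 jobs leaving the queue per day.
print(history_retention_days(total, 6000, 1000))  # 10.0
```

To keep jobs visible in condor_history for longer, raise either knob until the estimate covers the period you care about.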
You can also tell HTCondor to place a copy of each historical job
ClassAd into a specified directory. This can be useful if you want to
have some other service process every completed job ad, such as a script
that uploads the history files into a NoSQL database or some such. From
section 3.5 of the Manual:
PER_JOB_HISTORY_DIR
If set to a directory writable by the HTCondor user, when a job
leaves the condor_schedd's queue, a copy of the job's ClassAd will be
written in that directory. The files are named history, with the job's
cluster and process number appended. For example, job 35.2 will result
in a file named history.35.2. HTCondor does not rotate or delete the
files, so without an external entity to clean the directory, it can grow
very large. This option defaults to being unset. When not set, no files
are written.
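Because HTCondor never cleans this directory, the "external entity" has to be something you provide. The following Python sketch shows one possible shape for it: it recognizes files named history.<cluster>.<proc> and lists or deletes those older than a given age. The function names, the age threshold, and the dry-run default are choices made for this example; it does not call any HTCondor API, and a real service would presumably process or upload each ad before deleting it.

```python
import re
import time
from pathlib import Path

# Per-job files are named "history.<cluster>.<proc>", e.g. "history.35.2".
_NAME_RE = re.compile(r"^history\.(\d+)\.(\d+)$")


def parse_history_name(name):
    """Return (cluster, proc) for a per-job history file name, else None."""
    m = _NAME_RE.match(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def prune_per_job_history(directory, max_age_days, now=None, dry_run=True):
    """Return per-job history files older than max_age_days.

    With dry_run=False the stale files are also deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or parse_history_name(path.name) is None:
            continue  # skip anything that is not a per-job history file
        if path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale
```

Run from cron with dry_run=True first to see what would be removed, for example prune_per_job_history("/path/to/per_job_history", 30), where the path is whatever you set PER_JOB_HISTORY_DIR to.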
Hope the above helps,
Todd