Re: [HTCondor-users] return code of jupyter notebook jobs
- Date: Fri, 27 Mar 2020 11:05:13 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] return code of jupyter notebook jobs
On 3/27/2020 5:25 AM, Beyer, Christoph wrote:
As we have been running Jupyter notebooks in condor slots in production for a while now, we need to get a bit of monitoring around this.
One of the bigger obstacles to coming up with something decent is that JupyterHub uses condor_rm to end the notebook once it is no longer needed. This results in a condor_history entry with JobStatus == 3, which is considered a failed job (which in this case it is not). The other possibility is that the notebook job runs into the time limit and is removed by the periodic_remove expression, which is presumably a bit more flexible to tweak.
I would like condor_rm to have an option that influences the job state subsequently recorded in the history.
I think your idea, whereby condor_rm can influence the job state subsequently recorded in the history, is on target. Please note that
condor_rm takes a "-reason <string>" argument, which lets you set the RemoveReason job attribute at the time of
removal. The RemoveReason attribute is also kept in the history. The Python API likewise supports setting a removal reason
at the time of job removal.
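
As a rough sketch of how monitoring could use this: the hub passes a fixed marker string as the removal reason, and the monitoring side looks for that marker in RemoveReason whenever it sees JobStatus == 3. The marker text, the helper names, and the category labels below are made up for illustration, and the substring match is an assumption about how the reason string ends up in the history. Job ads are represented as plain dicts here; with the Python bindings the removal itself would be something along the lines of schedd.act(htcondor.JobAction.Remove, job_spec, reason=...).

```python
# Hypothetical marker the hub would pass as the removal reason.
HUB_MARKER = "jupyterhub-session-ended"

# JobStatus value of a removed job in the history.
REMOVED = 3


def condor_rm_command(job_id, reason=HUB_MARKER):
    """Build the condor_rm argument list that sets RemoveReason on removal."""
    return ["condor_rm", "-reason", reason, job_id]


def classify(ad):
    """Classify a history record (a dict-like job ad) for monitoring.

    Assumes the reason given at removal time can be found as a
    substring of the RemoveReason attribute.
    """
    if ad.get("JobStatus") != REMOVED:
        return "not-removed"
    reason = str(ad.get("RemoveReason", ""))
    if HUB_MARKER in reason:
        return "normal-shutdown"
    # Anything else (time limit, manual removal, ...) is left for a closer look.
    return "other-removal"


if __name__ == "__main__":
    print(condor_rm_command("1234.0"))
    print(classify({"JobStatus": 3, "RemoveReason": HUB_MARKER}))
```

The history records fed into classify() could come from condor_history or from the Python bindings; only the JobStatus and RemoveReason attributes are needed.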
Does this help?