Re: [HTCondor-users] Jobs stay in running state after PID exits

> On Oct 17, 2016, at 10:54 AM, Jon Bernard <jonbernard@xxxxxxxxx> wrote:
> A user noticed that a few of his jobs stayed in the running state long after they should have finished. The logs showed that the PID of the job (as reported by condor_ssh_to_job) exited at 4:40 AM, but the job log shows the job running until 12:22 PM, when it was evicted by the user.
> This happened on a handful of jobs out of tens of thousands.

The starter log shows that the user ran condor_ssh_to_job at 03:28:42, before the job exited. A second condor_ssh_to_job was started at 12:21:49 (the one you quote in the email). The first ssh session remained active until the job was removed by the user.

If an ssh_to_job session is still active when a job exits, HTCondor lets that session continue. During that time, the job remains in the running state in the job queue, and the user is charged for the machine usage in the accounting records.

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project