
Re: [HTCondor-users] Jobs stay in running state after PID exits



I wouldn't expect multiplexing to occur, since HTCondor sets up a separate custom sshd for each ssh_to_job request. We should look into preventing this, as it causes surprising and undesired results.

 - Jaime

On Oct 18, 2016, at 4:55 PM, Jon Bernard <jonbernard@xxxxxxxxx> wrote:

Thanks - I should have read the man page first.

However, it looks like the user had actually logged out from the session for this job, but was logged in to the same node for another job. The sessions were multiplexed using ssh's ControlMaster, and this seems to have been the issue. When he disabled it, the problem stopped.
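For anyone wanting to check whether their own ssh client configuration turns on multiplexing, here is a minimal sketch in Python. It assumes an OpenSSH client recent enough to support `ssh -G`, which prints the effective configuration for a host. The function names and the host argument are hypothetical, and whether condor_ssh_to_job picks up the same client configuration is only inferred from the behavior described above.

```python
import subprocess


def multiplexing_enabled(ssh_g_output: str) -> bool:
    """Return True if `ssh -G <host>` output shows ControlMaster turned on."""
    for line in ssh_g_output.splitlines():
        key, _, value = line.strip().partition(" ")
        if key.lower() == "controlmaster":
            # Any value other than no/false (yes, auto, ask, autoask) enables sharing.
            return value.strip().lower() not in ("no", "false")
    # Option not listed: treat as disabled, which is the OpenSSH default.
    return False


def effective_config(host: str) -> str:
    """Ask the local ssh client for its effective settings for `host` (hypothetical helper)."""
    result = subprocess.run(
        ["ssh", "-G", host], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `multiplexing_enabled(effective_config("some-execute-node"))` with a real host name in place of the placeholder would report whether connection sharing applies to that host.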


On Mon, Oct 17, 2016 at 5:03 PM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> On Oct 17, 2016, at 10:54 AM, Jon Bernard <jonbernard@xxxxxxxxx> wrote:
>
> A user noticed that a few of his jobs stayed in the running state long after they should have finished. The logs showed that the PID of the job (as reported by condor_ssh_to_job) exited at 4:40 AM, but the job log shows the job running until 12:22 PM, when it was evicted by the user.
>
> This happened on a handful of jobs out of tens of thousands.

The starter log shows that the user ran condor_ssh_to_job at 03:28:42, before the job exited. A second condor_ssh_to_job was started at 12:21:49 (the one you quote in the email). The first ssh session remained active until the job was removed by the user.

If an ssh_to_job session is active when a job exits, HTCondor allows the ssh_to_job session to continue. During that time, the job remains in the running state in the job queue, and the user is charged with the machine usage in the accounting records.

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
