
Re: [HTCondor-users] condor_ssh_to_job broken with 8.8 on CentOS 7



Dear Greg,

On 26.02.19 at 18:31, Oliver Freyermuth wrote:
> So probably, this only fails for interactive jobs, since the sleep is reaped before we attach?
> I can't test with a batch job right now since I am already in the middle of the downgrade (and we still lack a proper test setup), but I'll try.

Indeed, that's the case. Replacing "-a" with "-m -p -u -U" via a wrapper (thanks Christoph!) makes attaching to running non-interactive batch jobs via condor_ssh_to_job almost work.
However, attaching to the user namespace fails:
nsenter: reassociate to namespace 'ns/user' failed: Invalid argument
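
For reference, the argument rewrite done by such a wrapper can be sketched roughly as follows. This is only a minimal illustration, not the wrapper actually in use here: the path REAL_NSENTER and the function names are assumptions, and it only handles "-a" given as a separate argument.

```python
#!/usr/bin/env python3
"""Sketch of an nsenter wrapper that expands "-a" for older nsenter builds."""
import os
import sys

# Hypothetical location of the original nsenter binary after renaming it.
REAL_NSENTER = "/usr/bin/nsenter.real"

# Explicit namespace flags used instead of "-a":
# mount, PID, UTS and user namespace.
REPLACEMENT = ["-m", "-p", "-u", "-U"]


def rewrite_args(args):
    """Return a copy of args with every standalone "-a" expanded."""
    rewritten = []
    for arg in args:
        if arg == "-a":
            rewritten.extend(REPLACEMENT)
        else:
            # Everything else (e.g. "-t <pid>") is passed through untouched.
            rewritten.append(arg)
    return rewritten


def main():
    """Replace this process with the real nsenter, using the rewritten arguments."""
    os.execv(REAL_NSENTER, [REAL_NSENTER] + rewrite_args(sys.argv[1:]))
```

To use this as a drop-in replacement, the script would be installed under the name nsenter and call main() at the end. Note that "-a" combined with other short options in one argument is not handled.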

SELinux is not the cause, and I am unsure what is going wrong here. Maybe it's caused by the old RHEL 7 kernel, or by nsenter being too old.

Since our users are already firing more and more jobs into the queue, ignoring the announced maintenance, I will proceed with the rollback for now,
but I think this has uncovered at least a few issues:
- With interactive jobs, "sleep" is killed before nsenter has finished.
- The "-a" argument to nsenter is not available on CentOS 7.
- Attaching to the user namespace fails on CentOS 7, for reasons I do not understand yet.

Since I can also reproduce the last issue without involving HTCondor (just calling nsenter manually on Singularity with the same parameters),
it should still be debuggable after the downgrade.
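
One thing that might be worth checking when debugging this: according to the setns(2) man page, EINVAL is returned (among other cases) when the caller tries to join a user namespace it is already a member of. Whether that is what happens here is unverified. A small sketch to compare the namespaces of two processes via /proc could look like this (function names are made up for illustration; reading the links of another user's process usually requires root):

```python
import os


def namespace_id(pid, ns="user", proc_root="/proc"):
    """Return the namespace link target for a process, e.g. "user:[...]"."""
    return os.readlink(os.path.join(proc_root, str(pid), "ns", ns))


def same_namespace(pid_a, pid_b, ns="user", proc_root="/proc"):
    """True if both processes are in the same namespace of type ns."""
    return namespace_id(pid_a, ns, proc_root) == namespace_id(pid_b, ns, proc_root)
```

Calling same_namespace(os.getpid(), target_pid) with the PID of the process inside the container would show whether the caller and the target already share the user namespace, in which case leaving out "-U" would be the thing to try.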

Cheers and many thanks for the quick responses!
	Oliver



