
Re: [HTCondor-users] HTCondor 8.8.2 Released



Dear Greg,

I just noticed https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7018 - many thanks for that,
it is an interesting idea for working around the timeout issue!

So it seems to me that only the following issue remains when people use the STARTER_JOB_ENVIRONMENT trick I described earlier (see below) to get the correct environment:
   -bash: cannot set terminal process group (-1): Inappropriate ioctl for device
   -bash: no job control in this shell

Do you have an idea how to fix this remaining issue?
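For context on that message: bash prints "cannot set terminal process group (-1)" when it cannot use its terminal for job control, and the -1 is consistent with tcgetpgrp() having failed. One plausible cause here (an assumption, not confirmed in this thread) is that the pseudo-terminal the shell receives is not the controlling terminal of its session; tcgetpgrp() then fails with ENOTTY, the errno whose text is "Inappropriate ioctl for device". The Python sketch below, using a hypothetical helper name, only diagnoses that condition on a file descriptor; it is not a fix for condor_ssh_to_job.

```python
import errno
import os


def job_control_possible(fd: int) -> bool:
    """Hypothetical helper: can a shell on `fd` query the terminal's
    foreground process group, as bash must do to enable job control?"""
    if not os.isatty(fd):
        # Not a terminal at all (e.g. a pipe): no job control possible.
        return False
    try:
        os.tcgetpgrp(fd)
    except OSError as exc:
        # ENOTTY ("Inappropriate ioctl for device"): fd is a terminal, but not
        # the controlling terminal of the calling process's session.
        if exc.errno == errno.ENOTTY:
            return False
        raise
    return True
```

Calling job_control_possible(0) from inside the condor_ssh_to_job session would show whether the shell's stdin is usable for job control there.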

Cheers, many thanks and all the best,
	Oliver


On 12.04.19 at 00:55, Oliver Freyermuth wrote:
Dear HTCondor team,

thanks a lot for the release!

I have tested it just now, and as expected, some issues / regressions compared to 8.6 remain with condor_ssh_to_job and Singularity, namely these two:

- Interactive jobs do not work.
   Reason: /usr/libexec/condor/condor_ssh_to_job_shell_setup kills the initial sleep process, thus cancelling the job.
           If that is patched out, the job is instead killed once the sleep finishes.
           This is caused by nsenter and sshd now running outside the container (which is very good!).
           To solve the issue, some kind of "inhibit" process would now be needed inside the container
           to keep it alive for the lifetime of nsenter / sshd.

- The (shell) environment one ends up with when using condor_ssh_to_job is very different from what the job ends up in.

   An SL6 container containing some generic commands in /etc/profile will explode with:
   -------------------------------------------------------------------------------
   -sh: cannot set terminal process group (-1): Inappropriate ioctl for device
   -sh: no job control in this shell
   -sh: cat: command not found
   -sh: uname: command not found
   -sh: [: =: unary operator expected
   -sh: ls: command not found
   -sh: uname: command not found
   -sh: uname: command not found
   -sh: grep: command not found
   -sh: grep: command not found
   -sh: grep: command not found
   -sh-4.1$
   -------------------------------------------------------------------------------
   After applying:
   STARTER_JOB_ENVIRONMENT="SHELL=/bin/bash PATH=/usr/sue/bin:/usr/local/bin:/bin:/usr/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin"
   as a workaround to better match the defaults of "singularity shell" and the environment used by "singularity exec" (i.e. in a job),
   things look better, but some issues remain:
   -------------------------------------------------------------------------------
   -bash: cannot set terminal process group (-1): Inappropriate ioctl for device
   -bash: no job control in this shell
   -------------------------------------------------------------------------------
   Also, pressing Ctrl+C exits the interactive session immediately (i.e. signal handling is strange).
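The "inhibit" process suggested in the first point above could, as a minimal sketch, be a small loop running inside the container that stays alive exactly as long as the session process does. The Python below is illustrative only: the helper names are hypothetical, it assumes the keeper shares a PID namespace with the watched process, and how the keeper would learn that PID is left open. It is not how HTCondor implements or fixes this.

```python
import os
import time


def pid_alive(pid: int) -> bool:
    """Probe whether `pid` exists; signal 0 checks without delivering a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def hold_while_alive(pid: int, poll_interval: float = 1.0) -> None:
    """Hypothetical "inhibit" loop: block while `pid` exists.

    Run inside the container, this keeps the container from exiting while
    the watched session process is still running. A zombie still counts
    as alive until its parent reaps it.
    """
    while pid_alive(pid):
        time.sleep(poll_interval)
```

Polling with signal 0 keeps the sketch portable across Unix systems; on Linux, a pidfd (os.pidfd_open, Python 3.9+) would avoid both the polling and PID-reuse races.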

So things have certainly improved a lot, but it's still not enough to let us move away from 8.6, even though I really hate to have sshd running inside the container
and love the new improved approach.

Many thanks for the continuing improvements!

Cheers,
	Oliver

On 12.04.19 at 00:26, Tim Theisen wrote:
The HTCondor team is pleased to announce the release of HTCondor 8.8.2.
A stable series release contains significant bug fixes.

Highlights of this release are:
- Fixed problems with condor_ssh_to_job and Singularity jobs
- Fixed a problem that could cause condor_annex to crash
- Fixed a problem where the job queue would very rarely be corrupted
- condor_userprio can report concurrency limits again
- Fixed the GPU discovery and monitoring code to map GPUs in the same way
- Made the CHIRP_DELAYED_UPDATE_PREFIX configuration knob work again
- Fixed restarting HTCondor from the Service Control Manager on Windows
- Fixed a problem where local universe jobs could not use condor_submit
- Restored a deprecated Python interface that is used to read the event log
- Fixed a problem where condor_shadow reuse could confuse DAGMan

More details about the fixes can be found in the Version History:
http://htcondor.org/manual/v8.8.2/StableReleaseSeries88.html

Downloads Page:
http://www.cs.wisc.edu/htcondor/downloads/

Thank you for your interest in HTCondor!

- The HTCondor Team


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




--
Oliver Freyermuth
Universität Bonn
Physikalisches Institut, Raum 1.047
Nußallee 12
53115 Bonn
--
Tel.: +49 228 73 2367
Fax:  +49 228 73 7869
--
