
Re: [HTCondor-users] HTCondor 8.8.2 Released

Dear HTCondor team,

thanks a lot for the release!

I have just tested it and, as expected, two issues / regressions compared to 8.6 remain when using condor_ssh_to_job with Singularity:

- Interactive jobs do not work.
  Reason: /usr/libexec/condor/condor_ssh_to_job_shell_setup kills the initial sleep process, thus cancelling the job.
          With that kill patched out, the job is instead killed once the sleep finishes.
          This is caused by nsenter and sshd now running outside the container (which is very good!).
          To solve the issue, some kind of "inhibit" process in the container
          would now be needed to keep it alive during the lifetime of nsenter / sshd.

- The (shell) environment one ends up with when using condor_ssh_to_job differs considerably from the environment the job itself runs in.

  An SL6 container with some generic commands in /etc/profile fails with:
  -sh: cannot set terminal process group (-1): Inappropriate ioctl for device
  -sh: no job control in this shell
  -sh: cat: command not found
  -sh: uname: command not found
  -sh: [: =: unary operator expected
  -sh: ls: command not found
  -sh: uname: command not found
  -sh: uname: command not found
  -sh: grep: command not found
  -sh: grep: command not found
  -sh: grep: command not found
  After applying:
  STARTER_JOB_ENVIRONMENT="SHELL=/bin/bash PATH=/usr/sue/bin:/usr/local/bin:/bin:/usr/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin"
  as a workaround to better match what "singularity shell" defaults to and what is used as environment when using "singularity exec" (i.e. in a job),
  things look better, but some issues remain:
  -bash: cannot set terminal process group (-1): Inappropriate ioctl for device
  -bash: no job control in this shell
  Also, pressing Ctrl+C exits the interactive session immediately (i.e. signal handling is strange). 
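
  To pin down which variables differ between the two environments, one could capture the output of "env" once inside a job and once inside a condor_ssh_to_job session and compare the two. The following is only a minimal sketch: the function names parse_env and diff_env are made up for illustration, and it assumes one KEY=VALUE pair per line (multi-line values would be mis-parsed).

```python
def parse_env(text):
    """Parse `env` output (one KEY=VALUE per line) into a dict.

    Assumption: values do not contain newlines.
    """
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def diff_env(job_env, ssh_env):
    """Return {name: (value_in_job, value_in_ssh)} for every variable
    whose value differs; None marks a variable missing on that side."""
    names = sorted(set(job_env) | set(ssh_env))
    return {
        name: (job_env.get(name), ssh_env.get(name))
        for name in names
        if job_env.get(name) != ssh_env.get(name)
    }
```

  For example, diff_env(parse_env(open("job.env").read()), parse_env(open("ssh.env").read())) would list the differing variables, where job.env and ssh.env are hypothetical files holding the two captures.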

So things have certainly improved a lot, but it's still not enough to let us move away from 8.6, even though I really hate to have sshd running inside the container
and love the new improved approach. 
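
To illustrate what I mean by an "inhibit" process in the first point: something that simply idles inside the container until it is told to go away. The sketch below is purely illustrative and not how HTCondor does or would implement it; the name wait_for_release is made up, and how the starter would launch and signal such a process is left open.

```python
import signal
import threading


def wait_for_release(stop_event, signals=(signal.SIGTERM, signal.SIGINT),
                     timeout=None):
    """Block until one of `signals` arrives or `timeout` seconds pass.

    Returns True if released by a signal (or the event was already set),
    False if the timeout expired first. Must be called from the main
    thread, since Python only allows installing handlers there.
    """
    def _handler(signum, frame):
        # Only flag the event; the blocked wait() below then returns.
        stop_event.set()

    for sig in signals:
        signal.signal(sig, _handler)
    return stop_event.wait(timeout)
```

Run as the container's placeholder process, wait_for_release(threading.Event()) would keep the container alive until a SIGTERM or SIGINT is delivered, independent of when the initial sleep ends.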

Many thanks for the continuing improvements!


On 12.04.19 at 00:26, Tim Theisen wrote:
> The HTCondor team is pleased to announce the release of HTCondor 8.8.2.
> A stable series release contains significant bug fixes.
> Highlights of this release are:
> - Fixed problems with condor_ssh_to_job and Singularity jobs
> - Fixed a problem that could cause condor_annex to crash
> - Fixed a problem where the job queue would very rarely be corrupted
> - condor_userprio can report concurrency limits again
> - Fixed the GPU discovery and monitoring code to map GPUs in the same way
> - Made the CHIRP_DELAYED_UPDATE_PREFIX configuration knob work again
> - Fixed restarting HTCondor from the Service Control Manager on Windows
> - Fixed a problem where local universe jobs could not use condor_submit
> - Restored a deprecated Python interface that is used to read the event log
> - Fixed a problem where condor_shadow reuse could confuse DAGMan
> More details about the fixes can be found in the Version History:
> http://htcondor.org/manual/v8.8.2/StableReleaseSeries88.html
> Downloads Page:
> http://www.cs.wisc.edu/htcondor/downloads/
> Thank you for your interest in HTCondor!
> - The HTCondor Team
