
[HTCondor-users] Bug: interactive jobs + custom job attributes + singularity



Dear HTCondor experts,

we are observing unexpected behaviour in the following situation
(inspired by
http://research.cs.wisc.edu/htcondor/manual/v8.6/3_17Singularity_Support.html):

1. All jobs run in singularity containers (SINGULARITY_JOB = true)

2. Users can choose the desired OS using a custom job attribute
"+DesiredOS". The relevant part of our HTCondor configuration is:

-----------------------------------------------------------------------
DEFAULT_CENTOS7_IMAGE = /cvmfs/example.com/singularity/CentOS7/default

DEFAULT_SL6_IMAGE = /cvmfs/example.com/singularity/SL6/default

DEFAULT_UBUNTU1604_IMAGE = /cvmfs/example.com/singularity/Ubuntu1604/default

CHOSEN_IMAGE = ifThenElse(TARGET.DesiredOS is "Ubuntu1604", \
    "$(DEFAULT_UBUNTU1604_IMAGE)", ifThenElse(TARGET.DesiredOS is "CentOS7", \
    "$(DEFAULT_CENTOS7_IMAGE)", "$(DEFAULT_SL6_IMAGE)"))

SINGULARITY_IMAGE_EXPR = $(CHOSEN_IMAGE)
-----------------------------------------------------------------------

3. Users can start interactive jobs and should obtain the desired
runtime environment using

    condor_submit -i consel.jdl

where the contents of consel.jdl are

-----------------------------------------------------------------------
Universe   = vanilla
+DesiredOS = "Ubuntu1604"
Queue
-----------------------------------------------------------------------

Unfortunately this does not work. The users always end up in the default
container OS (SL6 in the above example) as if "DesiredOS" was not defined.

With non-interactive jobs the above configuration works as expected.
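
To illustrate why the symptom looks like a missing attribute, here is a
minimal Python sketch of how we read the CHOSEN_IMAGE expression. It is
only a model, not HTCondor code: the function name chosen_image and the
dict standing in for the job ad are made up for illustration, and it
assumes that the ClassAd "is" operator yields false when "DesiredOS" is
undefined.

```python
# Hypothetical model of the CHOSEN_IMAGE expression from the configuration.
# The image paths are copied from the config; everything else is illustrative.
IMAGES = {
    "Ubuntu1604": "/cvmfs/example.com/singularity/Ubuntu1604/default",
    "CentOS7": "/cvmfs/example.com/singularity/CentOS7/default",
}
DEFAULT_SL6_IMAGE = "/cvmfs/example.com/singularity/SL6/default"


def chosen_image(job_ad):
    """Return the image path selected for a job ad (modelled as a dict)."""
    # A missing key models an undefined DesiredOS attribute.
    desired = job_ad.get("DesiredOS")
    # Any value other than the two known names falls through to SL6,
    # like the innermost else branch of the nested ifThenElse.
    return IMAGES.get(desired, DEFAULT_SL6_IMAGE)


if __name__ == "__main__":
    print(chosen_image({"DesiredOS": "Ubuntu1604"}))  # attribute visible
    print(chosen_image({}))                           # attribute not visible
```

If the ad against which the expression is evaluated for the sshd
container does not carry "DesiredOS", this model returns the SL6 image,
which matches what we see.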

Checking the process tree on the execute node, the situation looks like
this:

-----------------------------------------------------------------------
[...]
condor    1676  0.0  0.0  98568  7680 ?        Ss   Feb25   0:07
/usr/sbin/condor_master -f
root      2640  0.1  0.0  28376  8100 ?        S    Feb25   6:16  \_
condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R
1000000 -S 6
condor    2658  0.0  0.0  78628  6888 ?        Ss   Feb25   0:07  \_
condor_shared_port -f -p 9618
condor    2921  0.1  0.0  84240 10892 ?        Ss   Feb25   6:48  \_
condor_startd -f
condor   45979  0.3  0.0  88388  7916 ?        Ss   18:15   0:00      \_
condor_starter -f -a slot1_1 submit.example.com
user1    46001  0.0  0.0  19944   796 ?        SNs  18:15   0:00
 \_ /usr/libexec/singularity/bin/action-suid /bin/sleep 180
user1    46008  0.0  0.0   4360   356 ?        SN   18:15   0:00
 |   \_ /bin/sleep 180
user1    46022  0.0  0.0  19944   800 ?        SNs  18:15   0:00
 \_ /usr/libexec/singularity/bin/action-suid /usr/sbin/sshd -i -e -f
/pool/condor
user1    46029  0.0  0.0  70936  2636 ?        SN   18:15   0:00
     \_ sshd: user1 [priv]
user1    46031  0.0  0.0  70936  1212 ?        SN   18:15   0:00
         \_ sshd: user1@pts/0
user1    46032  0.5  0.0  15124  3360 pts/0    SNs+ 18:15   0:00
             \_ -/bin/bash
[...]
-----------------------------------------------------------------------

Obviously there are two different containers running: one running
"sleep" and the other one executing sshd. Checking the file descriptors
of the corresponding processes yields the following output:

-----------------------------------------------------------------------
# ls -l /proc/46001/fd
[...]
lr-x------. 1 root  root         64  1. Mär 18:15 5 ->
/cvmfs/example.com/singularity/Ubuntu1604/default
[...]
# ls -l /proc/46022/fd
[...]
lr-x------. 1 root  root         64  1. Mär 18:16 5 ->
/cvmfs/example.com/singularity/SL6/default
[...]
-----------------------------------------------------------------------
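
The same check can be scripted. The following is a minimal sketch,
assuming a Linux /proc layout and that the image shows up as the target
of one of the process's file descriptor symlinks, as in the listing
above. The function name open_images and its parameters are our own
invention; proc_root exists only so the function can be tried against a
test directory.

```python
import os


def open_images(pid, prefix="/cvmfs/", proc_root="/proc"):
    """List fd symlink targets of a process that start with `prefix`."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = set()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed in the meantime, or is not a symlink.
            continue
        if target.startswith(prefix):
            targets.add(target)
    return sorted(targets)


if __name__ == "__main__":
    # PIDs of the two action-suid processes from the process tree above.
    for pid in (46001, 46022):
        print(pid, open_images(pid))
```

Reading another user's /proc/<pid>/fd normally requires root, which is
how the listing above was produced.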

This output shows two surprising things:

1. *Two* containers are started.
2. The two containers use *different* images, which indicates that the
container running sshd ignores the custom job attribute "DesiredOS".

Is there a way to make interactive jobs work with a user-selected
Singularity image?

Cheers, Peter

P. S.: Is there a reason why the following command does not work (it
would be very convenient):

$ condor_submit -i '+DesiredOS = "Ubuntu1604"'
condor_submit: invalid attribute name '+DesiredOS' for attrib=value
assigment
