
[HTCondor-users] Can't run a Singularity container via an HTCondor job



Hi all,

I can't run an HTCondor job under Singularity on the execute host.

The job is submitted, but the error log on the submit host says:
ERROR  : Home directory is not owned by calling user: /
ABORT  : Retval = 255
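
The error suggests that Singularity compared the owner of the home directory it resolved (here `/`) with the calling user. A minimal sketch of that kind of check from the shell follows, assuming Singularity 2.x takes the home directory from the passwd database and compares its owner with the caller's UID. `home_owned_by` and `check_user_home` are hypothetical helper names, and `stat -c` assumes GNU coreutils as shipped with CentOS 7.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if DIR is owned by numeric UID.
# Assumes GNU coreutils `stat`; BSD stat would need `-f %u` instead.
home_owned_by() {
    local dir="$1" uid="$2" owner
    owner=$(stat -c %u -- "$dir" 2>/dev/null) || return 1   # missing dir counts as failure
    [ "$owner" = "$uid" ]
}

# Hypothetical helper: look up USER (default: current user) via getent and
# report whether the passwd home directory is owned by that user's UID.
# Returns 0 if it is, 1 on a mismatch, 2 if the user cannot be resolved.
check_user_home() {
    local user="${1:-$(id -un)}" entry uid home
    entry=$(getent passwd "$user") || { echo "no passwd entry for $user"; return 2; }
    uid=$(echo "$entry" | cut -d: -f3)
    home=$(echo "$entry" | cut -d: -f6)
    if home_owned_by "$home" "$uid"; then
        echo "OK: $home is owned by $user (uid $uid)"
    else
        echo "MISMATCH: $home is not owned by $user (uid $uid)"
        return 1
    fi
}
```

Running `check_user_home` with no argument checks the current user; passing an account name checks that account instead. The logs in this post do not show which account the job runs as on the execute node, so if it differs from the one used for the manual tests, that is the account worth checking.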

My configuration data:

execute node: singularity --version 2.6.0-HEAD.579c415, CentOS 7, Condor V8.6.12
submit node: singularity --version 2.6.0-HEAD.579c415, CentOS 6, Condor V8.6.12

On the execute host, singularity commands run manually as a regular (non-root) user work as expected:

$ singularity run /tmp/hello-world.simg
RaawwWWWWWRRRR!!

$ singularity exec /tmp/hello-world.simg cat /etc/os-release
NAME="Ubuntu"
VERSION="14.04.5 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.5 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

So Singularity itself works on the execute node.
But I can't run the simple 'hello-world' container via an HTCondor job.

Here are my startd configuration parameters for Singularity (the "User Request" variant as shown by Brian Bockelman):

SINGULARITY = /usr/local/bin/singularity
SINGULARITY_JOB = !isUndefined(TARGET.SingularityImage)
SINGULARITY_IMAGE_EXPR = TARGET.SingularityImage
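
Because a job may run under a different account than the one used for the manual tests above, a small pre-flight check can confirm that the binary named by `SINGULARITY` and the image named in `+SingularityImage` are usable by that account. This is a minimal sketch; `preflight` is a hypothetical helper whose defaults simply reuse the paths from this post.

```shell
#!/bin/bash
# Hypothetical pre-flight check: run it as the account the job will run as.
# Defaults mirror the SINGULARITY path and image path used in this post.
preflight() {
    local bin="${1:-/usr/local/bin/singularity}"
    local image="${2:-/tmp/hello-world.simg}"
    local ok=0
    if [ -x "$bin" ]; then
        echo "OK: $bin is executable"
    else
        echo "FAIL: $bin is missing or not executable"; ok=1
    fi
    if [ -r "$image" ]; then
        echo "OK: $image is readable"
    else
        echo "FAIL: $image is missing or not readable"; ok=1
    fi
    return "$ok"   # non-zero if either check failed
}
```

`preflight` with no arguments checks the two paths above; `preflight /path/to/singularity /path/to/image.simg` checks others.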

And the submit file:
------------------------
Universe = vanilla
executable = singularity_hello.sh
requirements = (Machine == "execute node")

+SingularityImage = "/tmp/hello-world.simg"

should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT

output = out
error = err
log = log

queue
------------------------------

Executable script:
-----------------------------

#!/bin/bash
date
echo "I'm process id $$ on" `hostname`
echo "This is sent to standard error" 1>&2
echo "Running as binary $0" "$@"

cat /etc/os-release

-----------------------------------

With this configuration I expected to get the output of the "singularity run /tmp/hello-world.simg" command, as if it had been typed in a shell on the execute host, or at least the output of the "cat /etc/os-release" command running inside the container.

But the executable does not even start: the 'out' file named in the submit file is empty, with no output from the 'echo' commands.

Here is also an excerpt from StarterLog.slot1 on the execute host:
...
09/27/18 13:53:53 (pid:19713) Job 109.0 set to execute immediately
09/27/18 13:53:53 (pid:19713) Starting a VANILLA universe job with ID: 109.0
09/27/18 13:53:53 (pid:19713) IWD: /var/lib/condor/execute/dir_19713
09/27/18 13:53:53 (pid:19713) Output file: /var/lib/condor/execute/dir_19713/_condor_stdout
09/27/18 13:53:53 (pid:19713) Error file: /var/lib/condor/execute/dir_19713/_condor_stderr
09/27/18 13:53:53 (pid:19713) Renice expr "0" evaluated to 0
09/27/18 13:53:53 (pid:19713) About to exec /var/lib/condor/execute/dir_19713/condor_exec.exe
09/27/18 13:53:53 (pid:19713) Running job via singularity.
09/27/18 13:53:54 (pid:19713) Create_Process succeeded, pid=19728
09/27/18 13:53:54 (pid:19713) Process exited, pid=19728, status=255
09/27/18 13:53:54 (pid:19713) Got SIGQUIT. Performing fast shutdown.
09/27/18 13:53:54 (pid:19713) ShutdownFast all jobs.
09/27/18 13:53:54 (pid:19713) **** condor_starter (condor_STARTER) pid 19713 EXITING WITH STATUS 0
-------------
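
To pull the job's exit status out of a starter log quickly, a small sed/awk filter is enough. This sketch assumes the "Process exited, pid=..., status=..." line format shown in the excerpt above; `exit_statuses` and `failed_exits` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: print "pid status" for every "Process exited" line,
# reading the files given as arguments (or stdin if none are given).
exit_statuses() {
    sed -n 's/.*Process exited, pid=\([0-9]*\), status=\([0-9]*\).*/\1 \2/p' "$@"
}

# Keep only non-zero exits, e.g. the status=255 above, which matches the
# "ABORT  : Retval = 255" line in the job's error log.
failed_exits() {
    exit_statuses "$@" | awk '$2 != 0'
}
```

For example, `failed_exits "$(condor_config_val LOG)/StarterLog.slot1"` scans the slot 1 starter log in the configured log directory.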

In short, Singularity is installed and runs fine on the execute node.
HTCondor does try to run the job under Singularity, but there is some permission/ownership issue that I don't understand.
Could someone point me in the right direction?

Evgeny.