
Re: [HTCondor-users] running executable that exists only in the container



Hi All,

I figured out the issue: with Singularity version 3.11, one needs `runc` installed to allow instances to run in OCI mode. But this isn't listed as an optional dependency when installing singularity-ce via EPEL.

I'm not sure how this OCI problem should surface as an error. Should the results of the "Singularity tests" be visible to job submitters? Should I just raise this as an issue on the HTCondor GitHub/GitLab repo?

Cheers,
Matt

Matthew T. West
DevOps & HPC SysAdmin
University of Exeter, Research IT
http://www.exeter.ac.uk/research/researchcomputing/support/researchit
57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom

On 02/10/2023 17:06, Matthew T West via HTCondor-users wrote:


Hi Greg,

One step forward. Got rid of that particular shadow exception.

040 (008.000.000) 2023-10-02 17:01:17 Started transferring input files
    Transferring to host:
<127.0.0.1:9618?addrs=127.0.0.1-9618&alias=minicondor&noUDP&sock=slot1_1449_b394_6627>
...
040 (008.000.000) 2023-10-02 17:01:17 Finished transferring input files
...
001 (008.000.000) 2023-10-02 17:01:18 Job executing on host:
<127.0.0.1:9618?addrs=127.0.0.1-9618&alias=minicondor&noUDP&sock=startd_989_f341>
    SlotName: slot1@minicondor
    CondorScratchDir = "/var/lib/condor/execute/dir_219876"
    Cpus = 1
    Disk = 12381279
    Memory = 1918
...
007 (008.000.000) 2023-10-02 17:01:18 Shadow exception!
    Error from slot1@minicondor: Singularity test failed:No
Singularity tests for contained app:
    0  -  Run Bytes Sent By Job
    83791872  -  Run Bytes Received By Job

What sort of Singularity tests should be running along with the
executable itself?

Matt


On 02/10/2023 16:53, Greg Thain via HTCondor-users wrote:


On 10/2/23 08:37, Matthew T West via HTCondor-users wrote:
Hi Greg,

First, thanks for noting this flag! I figured there had to be
something like it.


The submit file is below:

universe            = container
container_image     = ./lolcow_latest.sif
transfer_executable = false
executable          = cowsay
arguments           = moo
output              = job.out
error               = job.err
log                 = job.log
request_cpus        = 1
request_memory      = 1024M
request_disk        = 1024M


Try forcing file transfer on, by setting

should_transfer_files = yes

when_to_transfer_output = on_exit


and see if that works.
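
For reference, here is a rough sketch of the submit file quoted above with the two transfer settings added. It assumes nothing else changes, and it spells the second setting with its full command name, when_to_transfer_output. Whether it clears the shadow exception will still depend on how Singularity is set up on the execute node.

```
# Container-universe job whose executable exists only inside the image
universe                = container
container_image         = ./lolcow_latest.sif
transfer_executable     = false
executable              = cowsay
arguments               = moo

# Force HTCondor file transfer on, as suggested above
should_transfer_files   = yes
when_to_transfer_output = on_exit

output                  = job.out
error                   = job.err
log                     = job.log
request_cpus            = 1
request_memory          = 1024M
request_disk            = 1024M

# A queue statement is assumed here; the quoted submit file did not show one
queue
```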


-greg

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users


The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
