
Re: [HTCondor-users] Problems with singularity version 3.8.1



I think the problem with singularity 3.8.1 is now fixed in the master branch:

https://opensciencegrid.atlassian.net/browse/HTCONDOR-698
https://github.com/htcondor/htcondor/commit/c5883b706a04e06ac54d3bf40ec42d1c7eb36d87

(grr, today I found and fixed the same bug...)

Petr

On 9/8/21 10:04 PM, Josh Willis wrote:

First, apologies. I apparently wasn't actually subscribed to this list until yesterday, so I can't really "reply" to the thread I would like to. But the subject line is the thread I'm trying to reply to.

Greg, we have either more data for Matthias's bug report, or a similar but subtly different problem. Please let us know if you want more information.

Starting about a week ago we had a user seeing her jobs held with messages such as this:

007 (42052443.000.000) 2021-09-01 10:30:09 Shadow exception!
Error from slot1_4@xxxxxxxxxxxxxxxxxxxxxxxx: Singularity test failed:INFO:    Could not find any nv files on this host!
0  -  Run Bytes Sent By Job
2084  -  Run Bytes Received By Job

If I directly test the same singularity image from the command line, I see:

[joshua.willis@ldas-osg ~]$ singularity test --nv /home/rebecca.ewing/observing/4/dev/builds/gstlal_dev-082721 ; echo $?
INFO:    Could not find any nv files on this host!
INFO:    No test script found in container, exiting
No test found in container, executing /bin/sh -c true
0

That is, an additional warning line, but the error code of the test is actually still zero.
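
To illustrate the distinction between "exit code zero" and "no output", here is a minimal Python sketch. It is not how HTCondor runs its check; the stand-in command, the helper name run_and_report, and the assumption that the INFO line goes to stderr are all mine, chosen so the example runs without singularity installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, stdout, stderr) as separate values."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Hypothetical stand-in for `singularity test --nv <image>`:
# it writes one INFO line to stderr (an assumption) and exits with status 0.
stand_in = [
    sys.executable,
    "-c",
    "import sys; "
    "print('INFO:    Could not find any nv files on this host!', file=sys.stderr)",
]

code, out, err = run_and_report(stand_in)

# Two possible pass criteria for the same run:
passed_by_exit_code = (code == 0)                       # only the status matters
passed_by_silence = (code == 0 and err.strip() == "")   # any stderr text counts as failure

print("exit code:", code)
print("stderr:", err.strip())
print("pass (exit code only):", passed_by_exit_code)
print("pass (exit code and empty stderr):", passed_by_silence)
```

A check that looks only at the exit status would pass this run, while one that also requires empty output would fail it, which is the kind of mismatch we seem to be seeing.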

If I omit the "--nv" I don't get the message about not finding nv files (unsurprisingly).

We think that last point might be relevant because James can, with his standard test jobs, reproduce the error at CIT when submitting from either HTCondor 9.0.4 or 9.0.5, and singularity 3.8.1. However those same jobs succeed when they come into CIT from OSG, even though the version of singularity is the same. So we suspect that maybe "singularity test" is not always invoked with "--nv", but perhaps it's something else.

If you can confirm that this is the same problem Matthias saw, then we will happily await the patch for testing. Otherwise we wanted to alert you that there may be a different but similar problem.

Cheers,

Josh

--
Josh Willis
jlwillis@xxxxxxxxxxx

Computational Scientist
Caltech/LIGO

