After more testing, this is what I have found so far about this strange behavior:
Summary of tests:
Docker Universe is available.
Docker images that are accessed with a single word, such as those found on hub.docker.com, are running OK.
The .sub files that used to work and no longer do all use images with a forward slash in their name, as found on hub.docker.com.
All of these remain in HOLD, whereas before yesterday they worked with the same, unchanged .sub files.
They all have the same matching problem that I mentioned in the first message:
One exception was gromacs/gromacs, but this might be part of a more "official" naming convention.
I noted that for the Singularity jobs the image had to be labeled as:
"docker://ubuntu"
or
"docker://docker.io/ubuntu"
This makes me wonder whether there is a definition or environment variable specific to the Docker Hub address that needs to be updated, added, or triggered, and that goes beyond the "official" images that mostly fit in one word.
That is what makes the most sense at the moment...
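To make the naming question concrete, here is a minimal Python sketch of how a short image name is conventionally expanded to a fully qualified one by the Docker client (default registry docker.io, default namespace library for single-word names). The function name normalize_image is hypothetical, and whether HTCondor's matching depends on this expansion at all is only an assumption, not something I have confirmed.

```python
def normalize_image(name: str, registry: str = "docker.io") -> str:
    """Expand a short Docker image name to a fully qualified form.

    Hypothetical helper for illustration only; it is not part of HTCondor.
    """
    # Drop a Singularity-style "docker://" prefix if present.
    prefix = "docker://"
    if name.startswith(prefix):
        name = name[len(prefix):]

    first, sep, rest = name.partition("/")
    # A first component containing "." or ":" (or equal to "localhost")
    # is treated as a registry host, so the name is already qualified.
    if sep and ("." in first or ":" in first or first == "localhost"):
        host, remainder = first, rest
    else:
        host, remainder = registry, name

    # Single-word images on the default registry live under "library/".
    if host == registry and "/" not in remainder:
        remainder = "library/" + remainder
    return f"{host}/{remainder}"


if __name__ == "__main__":
    for image in ("ubuntu", "docker://docker.io/ubuntu", "gromacs/gromacs"):
        print(image, "->", normalize_image(image))
```

Under this convention a one-word image and an image with a forward slash both resolve to docker.io, so trying the fully qualified name in the .sub file might be one way to check whether the short form is what fails.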
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of JEAN-YVES SGRO via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, November 2, 2021 1:51 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: JEAN-YVES SGRO <jsgro@xxxxxxxx>
Subject: [HTCondor-users] After complete reboot same jobs that worked yesterday now stay IDLE

Greetings,
Yesterday there was a building-wide power outage and the HTCondor cluster was eventually rebooted.
Now the same jobs (same .sub files) that worked yesterday no longer work and stay IDLE.
I used the command
condor_q -better-analyze 363334.0
(where 363334.0 is the job number) to try to understand, but I can't figure out where the problem really is.
I can see the following:
I don't understand the machine's "own requirements". I also tried the extended command:
condor_q -better-analyze 363334.0 -reverse -machine slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
that provides a summary:
I find these 2 statements conflicting in their meaning...
The output for both commands is very long and rather cryptic.
These jobs use "Universe = Docker", and I tested simpler .sub files that ran OK. Hence the Docker Universe is available.
The 2 .sub files I sent this morning to test are the same as yesterday's.
What could have changed as a result of the reboot? Is there any way to find this information?
Thanks
Jean-Yves