Without additional information, it's hard to say what was
happening. One execute node being down shouldn't cause jobs
to idle in the queue -- they would just match to one of the
other execute nodes (if they fit the job's requirements).
Can you post the job log from one of the stuck jobs somewhere?
Perhaps that will give us more information.
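In the meantime, a minimal sketch of how one might ask the scheduler why an idle job is not matching, assuming `condor_q` is available on the submit host. The `-better-analyze` option is a standard `condor_q` option; the job ID `1234.0` and the helper names are hypothetical placeholders, not taken from this thread.

```python
import shutil
import subprocess
from typing import List


def build_analyze_command(job_id: str) -> List[str]:
    """Build the condor_q call that reports why a job is not matching."""
    return ["condor_q", "-better-analyze", job_id]


def analyze_job(job_id: str) -> str:
    """Run the analysis if condor_q is on PATH; otherwise return a notice."""
    if shutil.which("condor_q") is None:
        return "condor_q not found on PATH; run this on the submit host."
    result = subprocess.run(
        build_analyze_command(job_id),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; just return the output
    )
    return result.stdout or result.stderr


if __name__ == "__main__":
    # 1234.0 is a hypothetical cluster.proc ID; substitute one of the idle jobs.
    print(analyze_job("1234.0"))
```

The output of that analysis, together with the job log, would help show whether the jobs' requirements were matching any of the healthy execute nodes.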
On 4/7/21 5:00 AM, Xavier OUVRARD wrote:
Since yesterday I have had 6 jobs sitting idle on a scheduler. One
computation node was faulty, and I kept seeing attempts to connect to ...
in the SchedLog. It seems this was blocking all the remaining
jobs, which stayed in condor_q. Rebooting the faulty node (not the
scheduler) allowed all the remaining idle jobs to run
again without any additional intervention.
Is this normal behaviour?
The condor version is 8.8.13 on all machines.