
Re: [HTCondor-users] Jobs in idle state on a submitter blocks other jobs



Hi Mat,
Thanks for your answer. The SchedLog from that period has since been erased (Tuesday). The next time I observe the phenomenon, I will send it. It has already happened twice.
Cheers,
Xavier

On 08/04/2021 22:23, Mátyás Selmeci wrote:
Hi Xavier,

Without additional information, it's hard to say what was
happening. One execute node being down shouldn't cause jobs
to idle in the queue -- they would just match to one of the
other execute nodes (if they fit the job's requirements).

Can you post the job log from one of the stuck jobs somewhere?
Perhaps that will give us more information.

Thanks,
-Mat

On 4/7/21 5:00 AM, Xavier OUVRARD wrote:
Dear all,

Since yesterday, six jobs had been sitting idle on a scheduler. One
compute node was faulty, and the SchedLog kept showing "attempt to
connect to ..." messages; it seems that this was blocking all the
remaining jobs in condor_q. Rebooting the faulty node (not the
scheduler) allowed all the remaining idle jobs to run again without
any further intervention.

Is this normal behaviour?

The condor version is 8.8.13 on all machines.

Best regards,

Xavier
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


--
Xavier Ouvrard-Brunet
(RP Cluster administrator – HSE-RP-CS) @ CERN
Office 892/2A-12, Prévessin-Moëns site
CERN, Esplanade des Particules, 1
CH-1211 Geneva 23
Mobile: +41 75 411 12 01
Tel: +41 22 766 38 92
Personal research page:
www.infos-informatique.net