
Re: [HTCondor-users] condor_q hangs



condor_q can hang for a while when it queries the schedd and the schedd does not respond. I think the default timeout for this query is 20 seconds, but it can be configured to be longer or shorter.

The number of cores in the pool doesn't matter. The number of jobs in the schedd might, but it is more likely that the schedd itself is having a problem.
We have seen situations where a non-responding DNS server can lock up the schedd for a minute or two.

condor_q can also sometimes hang while trying to contact a DNS server in order to figure out how to contact the schedd, but this generally only happens when you run condor_q on a different machine than the schedd.
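
As a rough way to check whether name resolution is the slow step, you could time a lookup from the machine where condor_q hangs. This is only a sketch using the Python standard library, not anything HTCondor provides: the function name time_dns_lookup and the host name in the usage comment are placeholders I made up, and 9618 is assumed to be the port your pool uses.

```python
import socket
import time


def time_dns_lookup(hostname, port=9618):
    """Return (seconds_elapsed, resolved_ok) for a single name lookup.

    The port only shapes the getaddrinfo() result; no connection is made.
    """
    start = time.monotonic()
    try:
        socket.getaddrinfo(hostname, port)
        resolved_ok = True
    except socket.gaierror:
        # The resolver answered, but the name could not be resolved.
        resolved_ok = False
    return time.monotonic() - start, resolved_ok


# Example use (placeholder host name, substitute your schedd machine):
#   elapsed, ok = time_dns_lookup("schedd.example.org")
#   print(f"lookup took {elapsed:.2f}s, resolved={ok}")
```

If the lookup itself takes many seconds, the delay is probably in DNS rather than in the schedd.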

You can look in the SchedLog on the schedd machine to see whether the schedd is receiving the command. It may also be clear from the SchedLog whether the schedd itself is hanging for periods of time. On the schedd you can configure

SCHEDD_DEBUG = $(SCHEDD_DEBUG) D_COMMAND

to ensure that you will always get a message in the SchedLog when it receives a command from condor_q.
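
To line up a hang with what the SchedLog shows, it may help to record when each condor_q call started and how long it took. The following is a minimal sketch under my own assumptions: timed_run is a hypothetical helper, the 60 second cutoff is arbitrary, and it simply wraps whatever command you give it.

```python
import subprocess
import time
from datetime import datetime


def timed_run(cmd, timeout_s=60):
    """Run cmd and return (start_timestamp, seconds_elapsed, finished).

    finished is False if the command was still running after timeout_s
    seconds and had to be killed.
    """
    started = datetime.now().isoformat(timespec="seconds")
    t0 = time.monotonic()
    try:
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
        finished = True
    except subprocess.TimeoutExpired:
        finished = False
    return started, time.monotonic() - t0, finished


# Example use on the submit machine:
#   started, elapsed, finished = timed_run(["condor_q"])
#   print(started, f"{elapsed:.1f}s", "ok" if finished else "timed out")
```

The start timestamp can then be compared against the SchedLog entries around the same time to see whether the schedd logged the command at all.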

-tj

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Zhuo Zhang
Sent: Thursday, August 24, 2017 11:33 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] condor_q hangs

Hi,

We have a small HTCondor pool with 13 nodes (1 master and 12 working 
nodes) and each node has 24 cores. Cron jobs are set up on master node 
and each cron job is a script which launches several DAGMan jobs 
depending on different scenarios. But very often we see that there is no 
response from running condor_q when there are several hundreds of 
HTCondor jobs (each job requests one CPU) in the queue.

My question is what are the possible causes of condor_q hanging?

Thank you in advance,

Zhuo

