
Re: [HTCondor-users] Diagnosing the Queue



Hi Eric,

Removing 31k jobs comes at a price, I guess. We sometimes see something similar: when a lot of 'single' jobs change state, e.g. from idle to held or removed, the scheduler becomes more or less unresponsive to other tasks. You might want to put the scheduler db on an SSD, which makes these operations a lot faster, or split the scheduler load across two machines.

Scripted 'condor_q' requests can be a nuisance too by the way ;)
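
One way to keep scripted queries from piling onto a busy scheduler is to cache the last answer and only re-query after a minimum interval. The following is a minimal sketch, not an HTCondor feature: the class name `CachedQuery`, the `min_interval_s` parameter, and the helper `run_condor_q` are hypothetical names introduced for illustration, and the helper assumes a plain `condor_q` binary is on the PATH.

```python
import subprocess
import time


class CachedQuery:
    """Return a cached result unless at least min_interval_s has elapsed."""

    def __init__(self, fetch, min_interval_s=60.0, clock=time.monotonic):
        self._fetch = fetch                  # callable that does the real query
        self._min_interval_s = min_interval_s
        self._clock = clock                  # injectable so it can be tested
        self._last_time = None
        self._last_result = None

    def get(self):
        now = self._clock()
        # Only hit the scheduler on the first call or once the interval is over.
        if self._last_time is None or now - self._last_time >= self._min_interval_s:
            self._last_result = self._fetch()
            self._last_time = now
        return self._last_result


def run_condor_q():
    # Assumption: a plain `condor_q` call is what the script needs.
    result = subprocess.run(["condor_q"], capture_output=True, text=True, check=False)
    return result.stdout


# Nothing is executed until .get() is called.
queue_view = CachedQuery(run_condor_q, min_interval_s=120.0)
```

A monitoring loop would then call `queue_view.get()` as often as it likes, while the scheduler is contacted at most once every two minutes.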

Best
Christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Eric Martin" <emmartin@xxxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, May 28, 2019 19:04:59
Subject: [HTCondor-users] Diagnosing the Queue

Running Condor 8.2.8 here, and am experiencing a lack of responsiveness when submitting jobs (this is mostly unusual), or running “condor_q” or “condor_submit -debug”.  “condor_q” does return after several minutes in some cases; in others it throws an error:

-- failed to fetch ads from: <our_scheduler_node_IP_address:51430>  :  <fqdn_of_same_scheduler_node>

 

This issue presumably started over the weekend when someone submitted a larger set of jobs (order of magnitude = 10x) than “usual.”  When “condor_q” does finish, at the end the summary shows the following:

31823 jobs; 0 completed, 31786 removed, 19 idle, 12 running, 24 held, 0 suspended
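
To see at a glance which state dominates the queue, the summary line can be split into per-state counts. This is only a sketch that assumes the summary keeps the "<number> <label>" layout shown above; the function name `parse_condor_q_summary` is made up for illustration and is not part of HTCondor.

```python
import re


def parse_condor_q_summary(line):
    """Parse a condor_q summary line into a dict of integer counts."""
    counts = {}
    # Each field looks like "<number> <label>", separated by ';' or ','.
    for number, label in re.findall(r"(\d+)\s+([a-z]+)", line):
        counts[label] = int(number)
    return counts


summary = ("31823 jobs; 0 completed, 31786 removed, 19 idle, "
           "12 running, 24 held, 0 suspended")
counts = parse_condor_q_summary(summary)

# Share of the queue that is in the 'removed' state.
removed_share = counts["removed"] / counts["jobs"]
print(f"removed: {counts['removed']} of {counts['jobs']} ({removed_share:.1%})")
```

For the line above, almost the entire queue consists of jobs in the removed state, with only a few dozen idle, running, or held.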

 

I’m posting to see if anyone has insight into how to diagnose why the jobs aren’t running.  I believe the amount (>33k jobs submitted over three days) isn’t unprecedented.  Obviously I’m not a Condor subject-matter expert here, but am trying to grow into something close, by hook or by crook.

 

Thanks for any and all insights!

 

Eric

 


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/