
Re: [HTCondor-users] Half of the slots are remaining 'owner' on all machines

Could the load from the two Condor jobs running on each machine be pushing the system load average up, so that the other two slots see a high load and refuse to accept any jobs?


On 05/16/2013 12:07 PM, Rich Pieri wrote:
John (TJ) Knoeller wrote:
If you are running HTCondor 7.9.5 or later, you can run

condor_q -better-analyze -reverse -machine <slotname> <jobid>
And if not, run condor_status and check the system loads. If idle nodes
have loads of 1.0 or higher then check to see what processes are running
on those nodes and eating CPU. I recently had to go through my pool to
disable the avahi-daemon process. The Avahi daemon has a tendency to go
stupid and lock up a CPU core. Killing the daemons freed up about half
the nodes in my pool.
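
For anyone who wants to script the load check described above, here is a rough sketch, not a tested tool. It assumes condor_status supports the -af (autoformat) option and that the slot ads carry the Name, State and LoadAvg attributes. The function names and the 1.0 threshold (taken from the rule of thumb above) are only illustrative.

```python
import subprocess


def parse_status(text):
    """Parse lines of 'Name State LoadAvg' into (name, state, load) tuples.

    Lines that do not have exactly three fields, or whose load is not a
    number (e.g. 'undefined'), are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        name, state, load = parts
        try:
            rows.append((name, state, float(load)))
        except ValueError:
            continue
    return rows


def busy_owner_slots(rows, threshold=1.0):
    """Return names of slots in the Owner state with load >= threshold."""
    return [name for name, state, load in rows
            if state == "Owner" and load >= threshold]


def query_pool():
    """Run condor_status and parse its output (assumes it is on PATH)."""
    result = subprocess.run(
        ["condor_status", "-af", "Name", "State", "LoadAvg"],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout)
```

Calling busy_owner_slots(query_pool()) on a submit or central manager host should list the Owner slots worth logging into to see which process is eating CPU.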