
RE: [Condor-users] schedd problems?



> Hi,
> I've got a strange problem (aren't they all?), and could use 
> guidance on how to figure out what's wrong.  I have a submit 
> machine that can no longer tell what jobs are in its own 
> queue.  I upgraded condor to 6.7.3 (from 6.6.7) on Feb 10; 
> yesterday (Feb 23), it was noticed that condor_q would return:
> 
> -- Failed to fetch ads from: <129.89.201.232:38456> : 
> hydra.phys.uwm.edu
> 
> SchedLog doesn't seem to show anything interesting...
> 
> How can I debug what's failing?

Hi Paul,

We've seen similar messages when a single schedd instance has LOTS of
ports open in the 6.7.3 builds. Can you check the number of open network
connections on the machine? Is the schedd currently preempting a lot of
startd machines in your cluster?

- Ian
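
One rough way to check the number of open network connections, as suggested above, is to tally TCP connections by state. The sketch below is not a Condor tool. It assumes a Linux-style `netstat -tan` whose connection rows start with `tcp` or `tcp6` and carry the state in the sixth column. The function names `count_tcp_states` and `live_tcp_states` are made up for illustration.

```python
import collections
import subprocess


def count_tcp_states(netstat_output):
    """Tally TCP connections per state from `netstat -tan` style text.

    Assumes Linux-style rows: proto, recv-q, send-q, local address,
    foreign address, state. Header lines are skipped because they do
    not start with "tcp".
    """
    counts = collections.Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


def live_tcp_states():
    """Run `netstat -tan` on this machine and tally its output.

    Assumes a `netstat` binary is installed and on PATH.
    """
    result = subprocess.run(
        ["netstat", "-tan"], capture_output=True, text=True, check=True
    )
    return count_tcp_states(result.stdout)
```

Calling `live_tcp_states().most_common()` on the submit machine lists the states from most to least frequent. An unusually large count of ESTABLISHED or TIME_WAIT entries would be consistent with the many-open-ports situation described in the reply, though it does not by itself show that the schedd owns those connections.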