RE: [Condor-users] schedd problems?
- Date: Thu, 24 Feb 2005 11:34:31 -0500
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: RE: [Condor-users] schedd problems?
> I've got a strange problem (aren't they all?), and could use
> guidance on how to figure out what's wrong. I have a submit
> machine that can no longer tell what jobs are in its own
> queue. I upgraded condor to 6.7.3 (from 6.6.7) on Feb 10;
> yesterday (Feb 23), it was noticed that condor_q would return:
> -- Failed to fetch ads from: <220.127.116.11:38456> :
> SchedLog doesn't seem to show anything interesting...
> How can I debug what's failing?
We've seen similar messages in the 6.7.3 builds when a single schedd
instance has LOTS of ports open. Can you check the number of open network
connections on the machine? Is the schedd currently preempting a lot of
startd machines in your cluster?
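If it helps, here is one way to get that connection count on a Linux submit machine. This is a minimal sketch that assumes the standard Linux /proc layout. It tallies TCP sockets by state from /proc/net/tcp and /proc/net/tcp6 and, optionally, counts the socket descriptors held by one process, such as the condor_schedd PID you pass on the command line. The helper names count_tcp_states and count_socket_fds are hypothetical and are not part of Condor. A very large ESTABLISHED or TIME_WAIT count, or a very high socket count on the schedd process, would fit the pattern described above.

```python
import os
import sys
from collections import Counter

# Hex state codes used in the 'st' column of Linux /proc/net/tcp{,6}.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(text):
    """Count sockets per TCP state in /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()  # fields: sl, local, remote, st, ...
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def count_socket_fds(pid):
    """Count open file descriptors of `pid` that are sockets (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    n = 0
    for fd in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, fd)).startswith("socket:"):
                n += 1
        except OSError:
            pass  # descriptor closed between listdir() and readlink()
    return n


if __name__ == "__main__":
    totals = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                totals += count_tcp_states(f.read())
    for state, n in totals.most_common():
        print(f"{state:12} {n}")
    if len(sys.argv) > 1:  # optional: PID of condor_schedd
        print(f"sockets held by pid {sys.argv[1]}: {count_socket_fds(sys.argv[1])}")
```

Run it as `python3 conncount.py <schedd-pid>`. It reads only /proc, so running `netstat -tan` or `ls -l /proc/<pid>/fd` by hand gives the same information if you would rather not run a script.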