Re: [Condor-users] timeout reading buffer
- Date: Thu, 2 Mar 2006 14:44:57 -0500
- From: Preston Smith <psmith@xxxxxxxxxx>
- Subject: Re: [Condor-users] timeout reading buffer
On Mar 1, 2006, at 1:12 PM, Maxim Kovgan wrote:
* Are you using host based firewalls ?
* Can you look at /var/log/messages too ?
Nothing syslogged besides gridftp connections.
* Are you using a good equipment (routers/switches) ?
Yea. All my condor gear is directly connected into a Cisco 6509
Cluster nodes are all on cisco 4948 leaf switches with 10 Gbit links
back to said core switch.
* What is the topology of your network ?
I suspect the problem is either with OS or network, anyway, not
This schedd has been humming along busily for weeks, right up until
it got to about 3000 jobs queued up.
The problem goes away when I hold half or so of the jobs in this
Now, with a large chunk of the queue held, condor's negotiated and
hundreds of jobs like it should. I've got the queue drained by now,
though, just by holding a big chunk and periodically releasing
600-700 jobs.
So while I never really solved the problem, I've worked around it.
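
For anyone wanting to script the same workaround, here is a minimal sketch of releasing held jobs in batches. It is an illustration, not what was actually run here: the helper names, the 600-job batch size, and the 30-minute pause are all assumptions, and it treats every held job (JobStatus == 5) as one that should eventually be released, which may not be true if jobs are held for other reasons.

```python
import subprocess
import time
from typing import Iterator, List, Sequence


def batches(job_ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most `size` job IDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(job_ids), size):
        yield list(job_ids[start:start + size])


def release_command(job_ids: Sequence[str]) -> List[str]:
    """Build a condor_release invocation for the given cluster.proc IDs."""
    return ["condor_release", *job_ids]


def held_job_ids() -> List[str]:
    """Ask condor_q for held jobs (JobStatus == 5) as cluster.proc strings."""
    out = subprocess.run(
        ["condor_q", "-constraint", "JobStatus == 5",
         "-format", "%d.", "ClusterId", "-format", "%d\n", "ProcId"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def release_in_batches(job_ids, size=600, pause_seconds=1800,
                       run=subprocess.run, sleep=time.sleep):
    """Release held jobs one batch at a time, pausing between batches.

    `size` and `pause_seconds` are hypothetical defaults; tune them so the
    schedd never has more idle jobs than it can comfortably handle.
    """
    released = 0
    chunks = list(batches(job_ids, size))
    for index, chunk in enumerate(chunks):
        run(release_command(chunk), check=True)
        released += len(chunk)
        # No need to wait after the final batch.
        if index < len(chunks) - 1:
            sleep(pause_seconds)
    return released
```

Calling release_in_batches(held_job_ids()) on the submit machine would walk through the held jobs a batch at a time. The run and sleep parameters exist only so the loop can be exercised without a live pool.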
Preston Smith <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University