Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] timeout reading buffer

Date: Thu, 2 Mar 2006 14:44:57 -0500
From: Preston Smith <psmith@xxxxxxxxxx>
Subject: Re: [Condor-users] timeout reading buffer


On Mar 1, 2006, at 1:12 PM, Maxim Kovgan wrote:

Hi, Preston.

Qs:
* Are you using host-based firewalls?

No.

* Can you look at /var/log/messages too?

Nothing syslogged besides GridFTP connections.

* Are you using good equipment (routers/switches)?

Yeah. All my Condor gear is directly connected to a Cisco 6509 core switch.
Cluster nodes are all on Cisco 4948 leaf switches with 10 Gbit links
back to said core switch.

* What is the topology of your network?

See above.

I suspect the problem is either with the OS or the network; in any case, not Condor-related.

This schedd had been humming along busily for weeks, right up until it got to
about 3000 jobs queued up.

The problem goes away when I hold half or so of the jobs in this schedd. Now, with
a large chunk of the queue held, Condor has negotiated and started hundreds of jobs
as it should. I've got the queue drained by now, though, just by holding a big chunk
and periodically releasing 600-700 jobs at a time.

 So while I never really solved the problem, I've worked around it.
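The workaround above (hold a large share of the queue, then release it in batches of roughly 600-700 jobs) can be scripted. What follows is a minimal sketch in Python, assuming the held jobs are identified by `cluster.proc` IDs and that `condor_release` is on the PATH. The helper names `plan_release_commands` and `run_plan`, the 650-job batch size and the 10-minute pause are all hypothetical, illustrative choices, not values or tools from this thread. By default it only prints the planned batches.

```python
import subprocess
import time
from typing import List


def plan_release_commands(job_ids: List[str], batch_size: int = 650) -> List[List[str]]:
    """Split held job IDs ("cluster.proc") into condor_release commands of at most batch_size jobs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        ["condor_release", *job_ids[i:i + batch_size]]
        for i in range(0, len(job_ids), batch_size)
    ]


def run_plan(commands: List[List[str]], pause_s: float = 600.0, execute: bool = False) -> None:
    """Print each batch; when execute=True, also run it and pause between batches."""
    for n, cmd in enumerate(commands):
        if execute and n > 0:
            time.sleep(pause_s)  # give the schedd time to work through the previous batch
        print(f"batch {n + 1}: {cmd[0]} {cmd[1]} .. {cmd[-1]} ({len(cmd) - 1} jobs)")
        if execute:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on nonzero exit


if __name__ == "__main__":
    # Hypothetical cluster 1234 with 1500 held jobs.
    held = [f"1234.{p}" for p in range(1500)]
    plan = plan_release_commands(held, batch_size=650)
    run_plan(plan)  # dry run: prints three batches of 650, 650 and 200 jobs
```

Keeping the planning step separate from execution makes it easy to check the batches before anything touches the live queue; the pause length would need tuning against how quickly the schedd actually negotiates each batch.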

-Preston

--
Preston Smith  <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University

Follow-Ups:
- Re: [Condor-users] timeout reading buffer
  - From: Dan Bradley

References:
- Re: [Condor-users] timeout reading buffer
  - From: Jaime Frey
- Re: [Condor-users] timeout reading buffer
  - From: Preston Smith
- Re: [Condor-users] timeout reading buffer
  - From: Maxim Kovgan
