
Re: [Condor-users] timeout reading buffer



Preston,

Any word on the schedd scaling issues?

I just realized that I described the meaning of SCHEDD_TIMEOUT_MULTIPLIER backwards from how it actually works. This setting increases the timeouts used by the schedd itself when communicating with other daemons. In general, <SUBSYS>_TIMEOUT_MULTIPLIER increases the network timeouts used by that particular subsystem of Condor.

Therefore, if you are seeing timeouts in the shadow logs, you should try setting SHADOW_TIMEOUT_MULTIPLIER to some integer value greater than 1.
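
For example, a minimal condor_config change on the submit machine would look something like this (the value 2 here is just a starting point, not a tuned recommendation):

  SHADOW_TIMEOUT_MULTIPLIER = 2

followed by a condor_reconfig so the running daemons pick up the new value.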

Also, if your negotiator logs show evidence that the schedd is not requesting claims in time for the next negotiation cycle, you may want to increase NEGOTIATOR_CYCLE_DELAY. The log message that indicates this sort of problem looks like this:

3/6 10:14:15 Resource vm3@xxxxxxxxxxxx@<nnn.nnn.nnn.nnn:34558> was not claimed by user@xxxxxxxxxxx - removing match
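
For example, something like this in the condor_config on the central manager (60 is illustrative; tune it to how long your schedd actually takes to claim its matches):

  NEGOTIATOR_CYCLE_DELAY = 60

then run condor_reconfig on the central manager so the negotiator picks it up.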

--Dan

Preston Smith wrote:

Dan,

I'd read about NEGOTIATOR_TIMEOUT and turned it up to 60, but it wasn't enough. Is there a formula, so to speak, for picking a good value for it on a busy schedd?
I don't want to set it too high.

I didn't know about SCHEDD_TIMEOUT_MULTIPLIER, though; I'll try that too.

Thanks,
-Preston

On Mar 2, 2006, at 3:21 PM, Dan Bradley wrote:

Preston,

I haven't looked at all of your reports in detail, but I'm guessing you may need to adjust some of the following timeouts if the schedd is not responding quickly enough to queries:

NEGOTIATOR_TIMEOUT
Sets the timeout that the negotiator uses on its network connections to the condor_schedd and condor_startd daemons. It is defined in seconds and defaults to 30.

SCHEDD_TIMEOUT_MULTIPLIER
Set this to some integer (e.g. 2 or 10) to increase the timeouts that are used when communicating with the schedd.
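
For instance, a starting point in condor_config might look like this (values illustrative; you'd want to tune them against your own logs):

  NEGOTIATOR_TIMEOUT = 60
  SCHEDD_TIMEOUT_MULTIPLIER = 2

then run condor_reconfig so the daemons re-read the configuration.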


--Dan

On Mar 2, 2006, at 1:44 PM, Preston Smith wrote:

On Mar 1, 2006, at 1:12 PM, Maxim Kovgan wrote:

Hi, Preston.

Qs:
* Are you using host-based firewalls?
 No.

* Can you look at /var/log/messages too?
 Nothing syslogged besides gridftp connections.

* Are you using good equipment (routers/switches)?
 Yeah. All my Condor gear is directly connected to a Cisco 6509 core switch.
 Cluster nodes are all on Cisco 4948 leaf switches with 10 Gbit links back to said core switch.


* What is the topology of your network?
 See above.

I suspect the problem is either with the OS or the network; anyway, it's not Condor-related.
 This schedd has been humming along busily for weeks, right up until it got to about 3000 jobs queued up.

 The problem goes away when I hold half or so of the jobs in this schedd. Now, with a large chunk of the queue held, Condor's negotiated and started hundreds of jobs like it should. I've got the queue drained by now, though, just by holding a big chunk and periodically releasing 600-700 jobs.

 So while I never really solved the problem, I've worked around it.
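
 In case it's useful to anyone else, the hold-and-release dance was roughly this (the constraints and batch size below are a sketch from memory, not the exact commands I ran):

  # hold everything still idle in the queue (JobStatus 1 = Idle)
  condor_hold -constraint 'JobStatus == 1'

  # later, release a batch of roughly 650 held jobs (JobStatus 5 = Held)
  condor_q -constraint 'JobStatus == 5' -format "%d." ClusterId -format "%d\n" ProcId \
      | head -n 650 | xargs condor_release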

-Preston

--
Preston Smith  <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University




--
Preston Smith  <psmith@xxxxxxxxxx>
Systems Research Engineer
Rosen Center for Advanced Computing, Purdue University



_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users