Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] lost updates / network issues?

Date: Sun, 07 Oct 2007 20:18:55 -0500 (CDT)
From: Steven Timm <timm@xxxxxxxx>
Subject: Re: [Condor-users] lost updates / network issues?

The two problems are related--if you miss that many updates
then the collector will give up on a resource from time to time
and time out the ClassAd.  This may be a time to turn on
UPDATE_COLLECTOR_WITH_TCP--that will make the updates much
more reliable.  At one point I had this level of UpdatesLost
and changed to TCP and it solved the problem altogether.

Steve


------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Mon, 8 Oct 2007, Wojtek Goscinski wrote:

Howdy All,

I'm hoping maybe someone can give me some advice about how to diagnose a
problem with our pool.

We're running a test pool with a handful of resources. CondorView is showing
that resources are sometimes appearing and disappearing (see attached
screenshot) - though I've only noticed this rarely with condor_status. There
is no specific reason for resources to join and leave - apart from network
issues perhaps...

In addition, condor_status shows me that a lot of updates are being lost -
sometimes around 1/4 (see below).

Hence, I've got two questions:

- Is this amount of lost updates cause for concern? Machines are on a busy
student network. Should I be upping the rate at which updates occur?
- Why might CondorView be showing me that resources are entering and leaving
the pool? Is this cause for concern?

Regards,

James

UpdatesTotal = 4725
UpdatesSequenced = 4793
UpdatesLost = 1028

UpdatesTotal = 5151
UpdatesSequenced = 5148
UpdatesLost = 366

UpdatesTotal = 4636
UpdatesSequenced = 4612
UpdatesLost = 916

UpdatesTotal = 3688
UpdatesSequenced = 3630
UpdatesLost = 1175

UpdatesTotal = 5214
UpdatesSequenced = 5213
UpdatesLost = 361

UpdatesTotal = 5202
UpdatesSequenced = 5201
UpdatesLost = 1471

UpdatesTotal = 5221
UpdatesSequenced = 5220
UpdatesLost = 284
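
To put a number on "around 1/4", the counters above can be turned into a per-resource loss fraction. The following is only a minimal sketch, not anything shipped with Condor: the helper names are hypothetical, and it assumes that UpdatesSequenced counts updates that arrived with sequence checking while UpdatesLost counts the gaps, so that updates sent is roughly sequenced plus lost.

```python
# Counters copied from the condor_status output above, one tuple per resource:
# (UpdatesTotal, UpdatesSequenced, UpdatesLost)
COUNTERS = [
    (4725, 4793, 1028),
    (5151, 5148, 366),
    (4636, 4612, 916),
    (3688, 3630, 1175),
    (5214, 5213, 361),
    (5202, 5201, 1471),
    (5221, 5220, 284),
]


def loss_fraction(sequenced: int, lost: int) -> float:
    """Estimate the fraction of sent updates that never arrived.

    Assumption (not confirmed by the thread): sent ~= sequenced + lost.
    """
    sent = sequenced + lost
    return lost / sent if sent else 0.0


def summarize(counters):
    """Return one loss fraction per (total, sequenced, lost) tuple."""
    return [loss_fraction(seq, lost) for _total, seq, lost in counters]


if __name__ == "__main__":
    for (total, seq, lost), frac in zip(COUNTERS, summarize(COUNTERS)):
        print(f"total={total} sequenced={seq} lost={lost} loss={frac:.1%}")
```

Under that assumption the seven resources come out between about 5% and 24% lost, which matches the "sometimes around 1/4" worst case. Dividing by UpdatesSequenced alone instead would give somewhat higher figures.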
