Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Quill out of sync

Date: Wed, 22 Jul 2009 09:36:55 +0200
From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
Subject: [Condor-users] Quill out of sync

Hi all,

We might have a problem here caused by a networking issue yesterday, when our 
management network was flooded with traffic.

We have four head nodes that share a negotiator in HA mode. At some point 
yesterday one node decided for a couple of minutes that it should be the 
negotiator, as it could not connect to any other head node. Now we have the odd 
situation that Quill and the "direct" query are out of sync:

Querying against quill
atlas2# condor_q -g |grep running
2 jobs; 0 idle, 2 running, 0 held
9648 jobs; 3150 idle, 6498 running, 0 held

Direct query
atlas2# condor_q -g -direct schedd|grep running
21 jobs; 8 idle, 13 running, 0 held
2081 jobs; 1 idle, 2080 running, 0 held
1 jobs; 0 idle, 1 running, 0 held

condor_status believes this:
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX  6602     0    2070        31       0          0     4501

               Total  6602     0    2070        31       0          0     4501

The negotiator agrees by telling me (for any user):
Got NO_MORE_JOBS;  done negotiating
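To put numbers on the mismatch, the per-schedd summary lines from the two condor_q runs can simply be totalled. This is a minimal Python sketch, not part of the original report: it assumes the summary lines are pasted verbatim from the output above, and the helper name `summarize` is hypothetical.

```python
import re

# Per-schedd summary lines copied verbatim from the Quill-backed condor_q run.
QUILL = """\
2 jobs; 0 idle, 2 running, 0 held
9648 jobs; 3150 idle, 6498 running, 0 held
"""

# Per-schedd summary lines copied verbatim from "condor_q -g -direct schedd".
DIRECT = """\
21 jobs; 8 idle, 13 running, 0 held
2081 jobs; 1 idle, 2080 running, 0 held
1 jobs; 0 idle, 1 running, 0 held
"""

# Matches a summary line of the form "N jobs; I idle, R running, H held".
SUMMARY = re.compile(r"(\d+) jobs; (\d+) idle, (\d+) running, (\d+) held")


def summarize(text):
    """Sum the jobs/idle/running/held counts over every summary line in text."""
    totals = {"jobs": 0, "idle": 0, "running": 0, "held": 0}
    for m in SUMMARY.finditer(text):
        jobs, idle, running, held = map(int, m.groups())
        totals["jobs"] += jobs
        totals["idle"] += idle
        totals["running"] += running
        totals["held"] += held
    return totals


quill = summarize(QUILL)
direct = summarize(DIRECT)
CLAIMED = 2070  # "Claimed" column reported by condor_status above

# Print the two views side by side with their difference.
for key in quill:
    print(f"{key:8} quill={quill[key]:5} direct={direct[key]:5} "
          f"diff={quill[key] - direct[key]:5}")
print(f"running: quill={quill['running']} direct={direct['running']} "
      f"claimed={CLAIMED}")
```

Summed this way, Quill reports 9650 jobs with 6500 running, while the direct query reports 2103 jobs with 2094 running, which is far closer to the 2070 claimed slots that condor_status shows.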

How do we get Quill and the daemons back in sync? It has been in this state for 
more than 12 hours now, so I would assume it has had a chance to replay the 
"forgotten" transactions, right?

Cheers

Carsten

Follow-Ups:
- Re: [Condor-users] Quill out of sync
  - From: Carsten Aulbert
