Re: [Condor-users] After negotiator problems and restart condor_userprio reports wrong values



On Fri, Aug 22, 2008 at 09:49:20AM +0200, Carsten Aulbert wrote:
Hello,
> 
> in a set-up with multiple condor negotiators we had a situation where
> the system had this problem:
> 
> condor_userprio: "Can't find address for negotiator"
> 
> We (carefully) restarted condor on that machine and everything looked
> fine at first glance except:
> 
> (1) the user prio factors were all reset to default values
> (2) we lost a lot of our "history" on that node:
> 
> Number of users: 17  1178    530122.60  4/06/2008 14:31        ???
> 
> on the other node:
> Number of users: 24  1179   4369145.74  5/27/2008 23:36        ???
> 
> Finally:
> 
> on the "deranged" node we have this line in userprio:
> 
> user@xxxxxxxxxxx  500.00     0.50      1000.00    0   -202356.46
> 6/26/2008 15:41  7/15/2008 15:43

The negotiator runs on another host (n2). There we get the right results for
condor_userprio.

Running strace on condor_userprio on this node (n1) shows this line:

connect(3, {sa_family=AF_INET, sin_port=htons(9618), sin_addr=inet_addr("IP_n2")}, 16) = -1 EINPROGRESS (Operation now in progress)

From one moment to the next, condor_userprio stopped working.
Could a broken connection to n2 be the cause? Unfortunately, we did not run strace while it was still working.
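
One note on the strace line: EINPROGRESS is what a non-blocking connect() normally returns, so that line alone does not show whether the connection to n2 succeeded or failed afterwards. Below is a minimal sketch for checking this from n1. It is plain Python, not a Condor tool, and probe_tcp is a hypothetical helper name. IP_n2 is the placeholder from the strace output, and 9618 is the port shown there.

```python
import errno
import select
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try a non-blocking TCP connect; return 0 on success, else an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc  # failed immediately, e.g. ECONNREFUSED
        # EINPROGRESS: wait until the socket becomes writable or we time out.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR holds the final outcome of the connect attempt (0 = success).
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()


# Example (replace the IP_n2 placeholder with the real address of n2):
#   rc = probe_tcp("IP_n2", 9618)
#   print("ok" if rc == 0 else errno.errorcode.get(rc, rc))
```

A result of 0 would only mean that the TCP port on n2 is reachable from n1. It would not show that the negotiator itself answers correctly.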

Cheers,
Henning Fehrmann