
[HTCondor-users] Unsubscribe

On Jul 16, 2020, at 1:39 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

Hi Kruno,

A couple quick thoughts on this:

1.  The collector is essentially a hashtable of ClassAds, keyed on a tuple of the ClassAd's Name attribute and its MyAddress (IP address) attribute.  In the logs below, it appears that the first time batch1066.desy was marked as absent, it had an IP address of 127.0.0.1, as shown by this log entry:
07/13/20 15:12:44 Added ad to persistent store key=<slot2_3@xxxxxxxxxxxxxxxxx,127.0.0.1>
But when this node crashed a second time, the Collector appears to have made a second absent entry because the IP address was different, as shown here:
07/14/20 23:42:44 Added ad to persistent store key=<slot2_22@xxxxxxxxxxxxxxxxx,131.169.160.166>

So from the Collector's perspective, these are two different server instances.  Thus the original absent entry, with IP address 127.0.0.1, did not get replaced when the node crashed again.
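
To make the keying behaviour concrete, here is a minimal Python sketch of a table keyed on a (Name, MyAddress) tuple. It is only a toy model, not the Collector's actual code: the class AbsentStore, its methods, and the hostname node.example are hypothetical, and MyAddress is simplified to a bare IP string.

```python
class AbsentStore:
    """Toy stand-in for a table of ads keyed on (Name, MyAddress)."""

    def __init__(self):
        self._ads = {}

    def add(self, ad):
        # The key is the tuple of both attributes, so changing either one
        # produces a new entry instead of replacing the old one.
        key = (ad["Name"], ad["MyAddress"])
        self._ads[key] = ad
        return key

    def __len__(self):
        return len(self._ads)


store = AbsentStore()

# First crash: the ad was stored with the loopback address.
store.add({"Name": "slot2_3@node.example", "MyAddress": "127.0.0.1"})

# Second crash: same Name, different address, so a second entry appears.
store.add({"Name": "slot2_3@node.example", "MyAddress": "131.169.160.166"})
entries_after_ip_change = len(store)

# Same Name and same address: the existing entry is replaced, not duplicated.
store.add({"Name": "slot2_3@node.example", "MyAddress": "131.169.160.166"})
entries_after_repeat = len(store)
```

In this model the entry stored under 127.0.0.1 is never overwritten, because no later ad produces the same key.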

2.  It seems bizarre to have absent ads for each slot, especially when your startd is configured for partitionable slots.  Perhaps instead of configuring

  ABSENT_REQUIREMENTS = True

you may prefer to say something like

  ABSENT_REQUIREMENTS = SlotID == 1
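
As a rough illustration of what that expression selects, here is a small Python sketch that applies the same predicate to a few made-up ads. The function absent_requirements, the ad dictionaries, and the hostname node.example are all hypothetical, and this does not model real ClassAd evaluation (for example, undefined attributes).

```python
def absent_requirements(ad):
    # Models the expression "SlotID == 1" on a plain dictionary.
    return ad.get("SlotID") == 1


# Made-up ads from one machine with three slots.
ads = [
    {"Name": "slot1@node.example", "SlotID": 1},
    {"Name": "slot2@node.example", "SlotID": 2},
    {"Name": "slot3@node.example", "SlotID": 3},
]

# Only ads that satisfy the predicate would be kept as absent ads.
kept = [ad["Name"] for ad in ads if absent_requirements(ad)]
```

With ABSENT_REQUIREMENTS = True every ad in the list would qualify; with the narrower expression only the ad whose SlotID is 1 does.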

Hope the above helps,
Todd
-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685 
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/