
Re: [HTCondor-users] collector complaining about not receiving command requests from execution points



Hi Thomas,

A good starting place is to find out which command was being sent, so you can discern what HTCondor was trying to do when this failure occurred. To do this, you can add D_COMMAND:1 to the collector's debug setting (COLLECTOR_DEBUG).
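
As a rough sketch, assuming the central manager's configuration lives in a local config file that you can edit (the exact file location depends on your installation), the change might look like this:

```
# Keep the existing collector debug flags and additionally log each incoming command.
# Assumption: COLLECTOR_DEBUG is not overridden again later in the configuration.
COLLECTOR_DEBUG = $(COLLECTOR_DEBUG) D_COMMAND:1
```

After running condor_reconfig on the central manager, the collector log should report incoming commands, which may help match the "Can't receive command request" messages to whatever the execution point was trying to send.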

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Thomas Hartmann <thomas.hartmann@xxxxxxx>
Sent: Wednesday, July 26, 2023 6:32 AM
To: condor-users@xxxxxxxxxxx <condor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] collector complaining about not receiving command requests from execution points
 
Hi all,

I am wondering about my test cluster's central manager collector, which
is complaining about broken commands from execution points [1].
The master and startd daemons are allowed to advertise to the
collector, and the workers look healthy and show up in the slot list.
Everything looks fine to me, so I am not sure which received
commands(?) are supposed to be broken or timing out (assuming that
there were no ads in the broken requests).

Cheers,
   Thomas


[1]
07/26/23 12:35:16 Got INVALIDATE_ADS_GENERIC
07/26/23 12:35:16 Walking tables to invalidate... O(n)
07/26/23 12:35:16 (Invalidated 0 ads)
07/26/23 12:35:16 DaemonCore: Can't receive command request from 131.169.223.162 (perhaps a timeout?)

07/26/23 12:35:43 Got INVALIDATE_ADS_GENERIC
07/26/23 12:35:43 Walking tables to invalidate... O(n)
07/26/23 12:35:43 (Invalidated 0 ads)