
[Condor-users] Collector house cleaning activities?


I have a pool of 18 dedicated 4-way execute nodes.  My problem is that
condor_status sporadically reports fewer than the expected 72 VMs (18 nodes
x 4 VMs each).  If I use condor_status -direct NODENAME, everything looks
fine.  I've enabled D_FULLDEBUG on the collector and have seen nothing
unexpected except for "Removing stale ads for vm?@NODENAME".  I've also
enabled D_FULLDEBUG for the startd on the affected nodes and, again, nothing
appears to be in error.  Another wrinkle is that the problem seems to affect
only the execute nodes on the same switch as the central manager.

For example, condor_status might report vm2, vm3 and vm4 for a node but no
vm1.  If I start a run while condor_status is reporting 71 VMs, that's all
I'll get.  If I loop through all nodes with a condor_restart NODENAME
-startd before the classad lifetime expires, I'll be OK.  I have posted
about this before, but without this level of detail.  I know that this might
be a problem on my end, but I'm at my wits' end.
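
To make the symptom easier to track, here is a minimal sketch of a check that lists which VM ads are missing per node.  It assumes ad names follow the vmN@NODENAME pattern shown in the collector log message above, and that each node should advertise 4 VMs.  The function name and the node names in the example are hypothetical.

```python
import re
from collections import defaultdict


def find_missing_vms(reported_names, nodes, vms_per_node=4):
    """Return {node: [missing VM numbers]} for nodes with fewer ads than expected.

    reported_names: ad names as strings, assumed to look like "vm1@NODENAME".
    nodes: the node names expected to be in the pool.
    """
    seen = defaultdict(set)
    for name in reported_names:
        match = re.fullmatch(r"vm(\d+)@(.+)", name.strip())
        if match:
            seen[match.group(2)].add(int(match.group(1)))

    expected = set(range(1, vms_per_node + 1))
    missing = {}
    for node in nodes:
        gap = sorted(expected - seen.get(node, set()))
        if gap:
            missing[node] = gap
    return missing


if __name__ == "__main__":
    # Hypothetical sample: node01 has lost its vm1 ad, node02 is complete.
    names = ["vm2@node01", "vm3@node01", "vm4@node01",
             "vm1@node02", "vm2@node02", "vm3@node02", "vm4@node02"]
    print(find_missing_vms(names, ["node01", "node02"]))
```

One way to collect the names, as an assumption about the setup rather than a tested recipe, is to capture the output of condor_status -format "%s\n" Name and pass its lines in as reported_names.  Running the check periodically and logging the result would show whether the same VMs drop out each time and how that lines up with the "Removing stale ads" messages.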

Any help would be greatly appreciated.

Thank you,

Bob Nordlund
