[HTCondor-users] execute hosts advertise loopback address



 

Hello All,


I have come across a very strange problem with our HTCondor pool whereby *some* execute hosts advertise
the loopback address as the address of the startd, as evidenced by this from the CollectorLog:


05/21/20 14:09:35 StartdAd     : Inserting ** "< slot1@xxxxxxxxxxxxxxxxxxxxxxxx , 127.0.0.1 >"
05/21/20 14:09:35 StartdPvtAd  : Inserting ** "< slot1@xxxxxxxxxxxxxxxxxxxxxxxx , 127.0.0.1 >"

Some execute hosts work fine and advertise their correct address, whereas a substantial number advertise the
loopback, and I believe there are even examples of both on the same subnet.  The execute hosts all run Windows 10
and HTCondor version 8.4.6, and employ power saving so that idle machines (i.e. no local user activity and no
HTCondor use) go into hibernation after approximately 10 minutes.
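
For anyone wanting to check how widespread this is in their own pool, here is a minimal Python sketch that
lists the ads inserted with a loopback address. It assumes the collector log lines look exactly like the two
quoted above; the function name loopback_ads is purely illustrative and not part of HTCondor, and the
location of the CollectorLog file is site-specific.

```python
import re

# Matches collector log lines of the form quoted above, e.g.
# 05/21/20 14:09:35 StartdAd     : Inserting ** "< slot1@host , 127.0.0.1 >"
# (Assumption: other HTCondor versions may format this line differently.)
AD_LINE = re.compile(
    r'^(?P<stamp>\S+ \S+)\s+(?P<kind>\w+)\s*:\s*Inserting \*\* '
    r'"<\s*(?P<name>\S+)\s*,\s*(?P<addr>\S+)\s*>"'
)


def loopback_ads(lines):
    """Return the sorted, de-duplicated ad names inserted with a 127.x.x.x address."""
    names = set()
    for line in lines:
        match = AD_LINE.match(line.strip())
        if match and match.group("addr").startswith("127."):
            names.add(match.group("name"))
    return sorted(names)
```

Passing an open file handle for the CollectorLog to loopback_ads() should give one entry per affected slot,
which makes it easier to compare the affected hosts against the ones that advertise correctly.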

A typical scenario is that I wake a machine to run a job, and the machine advertises its loopback address to the
collector. The negotiator either finds a match or ignores the loopback (I'm not quite sure which), but in any case
the job never starts on the execute host and so the host returns to hibernation.

I turned up this submission to htcondor-users in the archives but it seems pretty old (Windows XP) and doesn't seem to
come up with a satisfactory solution:


Any suggestions would be extremely useful as I'm totally baffled by this.

regards,

-ian.

Dr Ian C. Smith,
Condor Manager,
Advanced Research Computing,
University of Liverpool
UK.