
[HTCondor-users] Problem using condor_restart on machines without FQDN


Our cluster is Linux based. In order to run nodes on Windows machines,
we start a virtual (Linux) machine on them, which in turn runs condor.

The problem is that these VMs do not have a fully qualified domain
name, only a self-made hostname. The FQDN of their Windows host PCs is
not much use, as we use a VPN for communication with the nodes.

condor_status works so far, displaying:
Name               OpSys      Arch   State     Activity LoadAv Mem

slot1@VME1         LINUX      X86_64 Unclaimed Idle     0.000   995  0

However, as the hostname is not an FQDN, a command like
condor_status VME1
does not work. I have discovered a workaround using
condor_status -constraint 'UtsnameNodename == "VME1"'
but this works only for condor_status.

When I try to restart condor on this node, I run into problems.
condor_restart VME1
gives: "Can't find address for master 
Perhaps you need to query another pool."

The same workaround,
condor_restart -constraint 'UtsnameNodename == "VME1"'
does not work either, giving: "Found no ClassAds when querying pool
Can't find addresses for master's for constraint 'UtsnameNodename ==
Perhaps you need to query another pool."
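
One possible explanation is that condor_restart looks up master ads, while
UtsnameNodename may only be published in the startd (slot) ads that
condor_status shows by default. A minimal sketch of an alternative, under
that assumption: look up the master's address in the collector and pass it
to condor_restart directly with -addr, so no hostname resolution is needed.
The function name restart_by_short_name is made up for illustration, and the
sketch assumes the master ad's Machine attribute equals the short hostname
(VME1 here) and that the installed condor_status supports -af; neither is
confirmed for this pool.

```shell
#!/bin/sh
# Sketch only: restart the condor_master on a node known by its short hostname.
# Assumption: the master ad's Machine attribute matches the short name.

restart_by_short_name() {
    node="$1"
    # Query master ads (not slot ads) and print the master's address string.
    addr=$(condor_status -master -constraint "Machine == \"$node\"" -af MyAddress | head -n 1)
    if [ -z "$addr" ]; then
        echo "no master ad found for $node" >&2
        return 1
    fi
    # Contact the master by address instead of by hostname.
    condor_restart -addr "$addr"
}
```

Usage would be `restart_by_short_name VME1`. Running
`condor_status -master -long` first shows which attributes the master ads
actually carry, in case Machine is not the right one to match on.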

Is there a way to efficiently manage such nodes? 
IP addresses will change, so putting all hostnames and their IP
addresses in /etc/hosts on the central server is not an option.
Interestingly, job matching and calculation do work...

What would you suggest?

Best regards from Vienna,
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien

Tel.  + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx