
[HTCondor-users] Problem using condor_restart on machines without FQDN



Hello

Our cluster is Linux-based. To use Windows machines as nodes, we start
a Linux virtual machine on each of them, which in turn runs Condor.

The problem is that these VMs do not have a fully qualified domain
name, only a self-assigned hostname. The FQDN of their Windows host PCs
is not much use, as we use a VPN for communication with the nodes.

condor_status works so far, displaying:
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@VME1         LINUX      X86_64 Unclaimed Idle     0.000   995  0+00:00:04

However, as the hostname is not an FQDN, a command like
condor_status VME1
fails.
I have found a workaround using
condor_status -constraint 'UtsnameNodename == "VME1"'
but this works only for condor_status.

However, when I try to restart condor on this node I run into problems.
condor_restart VME1
gives:
"Can't find address for master
Perhaps you need to query another pool."

The same workaround,
condor_restart -constraint 'UtsnameNodename == "VME1"'
does not work either, giving:
"Found no ClassAds when querying pool (local)
Can't find addresses for master's for constraint 'UtsnameNodename == "VME1"'
Perhaps you need to query another pool."

Is there a way to manage such nodes efficiently?
The IP addresses will change, so putting all hostnames and their IP
addresses in /etc/hosts on the central server is not an option.
Interestingly, job matching and execution do work...

What would you suggest?

Best regards from Vienna,
Hermann
-- 
-------------
DI Hermann Fuchs
Christian Doppler Laboratory for Medical Radiation Research for Radiation Oncology
Department of Radiation Oncology
Medical University Vienna
Währinger Gürtel 18-20
A-1090 Wien

Tel.  + 43 / 1 / 40 400 7271
Mail. hermann.fuchs@xxxxxxxxxxxxxxxx