
Re: [HTCondor-users] Problem using condor_restart on machines without FQDN



On Feb 21, 2013, at 2:26 AM, Hermann Fuchs <hermann.fuchs@xxxxxxxxxxxxxxxx> wrote:

> Our cluster is Linux based. In order to run nodes on Windows machines,
> we start a virtual (Linux) machine on them, which in turn runs condor.
> 
> The problem is that these VMs do not have a fully qualified domain
> name, only a self-made hostname. The FQDN of their Windows host PCs is
> not much use, as we use a VPN network for communication with the nodes.
> 
> condor_status works so far, just displaying
>
> Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
>
> slot1@VME1         LINUX      X86_64 Unclaimed Idle     0.000   995  0+00:00:04
> 
> However, as the hostname is not a FQDN a command like
> condor_status VME1
> fails.
> I have discovered a workaround using
> condor_status -constraint 'UtsnameNodename == "VME1"'
> but this works only for condor_status
> 
> However, when I try to restart condor on this node I run into problems.
> condor_restart VME1
> Gives:"Can't find address for master 
> Perhaps you need to query another pool."
> 
> The workaround
> condor_restart -constraint 'UtsnameNodename == "VME1"'
> does not work either, giving:
> "Found no ClassAds when querying pool (local)
> Can't find addresses for master's for constraint 'UtsnameNodename == "VME1"'
> Perhaps you need to query another pool."
> 
> Is there a way to efficiently manage such nodes? 
> IP addresses will change, so listing all hostnames and their IP
> addresses in /etc/hosts on the central server is not an option.
> Interestingly, job matching and calculation do work...
> 
> What would you suggest?


I believe part of your problem is that UtsnameNodename is present in your startd ads but not in your master daemon ads, and the master ads are what commands like condor_restart use.

Try using the Machine attribute in your constraint expressions. Its value is the machine's notion of its own hostname, and it should have the same value in both the startd and master ads.
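A minimal sketch of what that could look like from the central manager, assuming a POSIX shell and the node name VME1 from the question. The helper name restart_node and the DRY_RUN switch are hypothetical conveniences, not part of HTCondor; the only HTCondor command used is condor_restart -constraint, as in the original attempt, with Machine in place of UtsnameNodename. The main point is the quoting: ClassAd string literals need double quotes, and the whole expression has to reach condor_restart as one argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor): restart the daemons on the
# node whose Machine attribute equals $1. Machine is assumed to be present
# in both startd and master ads, unlike UtsnameNodename.
# When DRY_RUN=1, the command is printed instead of executed.
restart_node() {
  node="$1"
  # ClassAd string literals use double quotes; escape them so the whole
  # expression stays a single shell word.
  constraint="Machine == \"$node\""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf "condor_restart -constraint '%s'\n" "$constraint"
  else
    condor_restart -constraint "$constraint"
  fi
}

# Example: print (not run) the command for the VM named VME1.
DRY_RUN=1
restart_node VME1
# prints: condor_restart -constraint 'Machine == "VME1"'
```

Before restarting anything, condor_status -master -long lists the master ads with all of their attributes, which is one way to check which Machine value each VM actually advertises.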

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project