Re: [Condor-users] condor_status hostname : why can't I use IP number instead of hostname?



Rob wrote:
> 
> 
> Ian,
> 
> Thank you for your explanation.
> 
> Apparently your pool PCs have proper full hostnames within your domain,
> so you don't run into this problem.
> 
> Now in my case all public library PCs typically have this type of configuration
> (from the 'ipconfig /all' output):
>   Host Name.... : pm37
>   Primary Dns Suffix...:
>   Node Type...: Unknown
>   IP Routing Enabled...: No
>   WINS Proxy Enabled....: No
> 
> 
> I'm not in a position to demand from the library to modify over 500 public PCs
> hostname configurations. They will rather tell me to forget about my Condor
> project plans in the library......I'm in a vulnerable position here!
> (Apart from my condor_status problem, the rest of the condor setup seems to
> work fine: I can submit jobs, query the queue etc. from the Linux condor master)
> 
> ------------------------
> 
> I really think I stepped on a buggy behaviour of the way condor_status works
> on the master, namely how it resolves hostnames of the pool PCs:
> 
> Given two Windows pool PCs with similar network configurations like above;
> one is multi core (hostname: pm10), the other single core (hostname: pm37).
> 
> Then condor_status can easily resolve the dual core PC when I add the
> "slot<n>@" to the hostname. Condor_status then miraculously finds the PC
> without a glitch:  "condor_status slot1@pm10" works just great!
> 
> The single core PC does not have a "slot<n>@" in its name; then condor_status
> cannot find this PC: "condor_status pm37" fails with "unknown host pm37"
> 
> 
> I found a workaround:
> add an entry   "123.45.67.89  pm37" into the /etc/hosts of my Linux master PC.
> But I consider this a bad solution.
> 
> 
> I think condor_status should behave the same for multi and single core PCs.
> It's very annoying that it only works for multi core PCs, because they accidentally
> have the "slot<n>@" in their names....
> 
> May I qualify this as buggy behaviour of condor_status?
> 
> Thanks,
> Rob.

A quick comment: Condor tries to be extra helpful in an environment
where DNS is set up "correctly." So when you run condor_status pm37,
Condor has to figure out what Name to look up in the Collector. If the
argument contains an @, Condor knows it's a name already; otherwise
Condor tries DNS resolution to get an FQDN. The resolution is helpful
so you don't have to type an FQDN all the time, e.g.

$ condor_status -l slot1@node35 | grep Name
$ condor_status -l slot1@xxxxxxxxxxxxxxxxxx | grep Name
Name = "slot1@xxxxxxxxxxxxxxxxxx"
$ condor_status -l node35 | grep Name
Name = "slot1@xxxxxxxxxxxxxxxxxx"
Name = "slot2@xxxxxxxxxxxxxxxxxx"
Name = "slot3@xxxxxxxxxxxxxxxxxx"
Name = "slot4@xxxxxxxxxxxxxxxxxx"
Name = "slot5@xxxxxxxxxxxxxxxxxx"
Name = "slot6@xxxxxxxxxxxxxxxxxx"
Name = "slot7@xxxxxxxxxxxxxxxxxx"
Name = "slot8@xxxxxxxxxxxxxxxxxx"

You may also be able to get around the DNS setup issue by giving the
daemons a fake name, using DAEMON_NAME.

You can get a better idea of what Condor is doing by running:

env _CONDOR_TOOL_DEBUG=D_ALL condor_status -debug node35

Try grep'ing for Requirements, e.g.

$ env _CONDOR_TOOL_DEBUG=D_ALL condor_status -debug node35 2>&1 | grep Requirements
Requirements = (((TARGET.Name == "node35.example.com") ||
(TARGET.Machine == "node35.example.com")))

So much for a quick comment. I'd add that it seems reasonable that
the query Condor constructs could also include the un-expanded name
(node35 in this example).

Until that happens, another workaround is to query directly, e.g.

condor_status -constraint 'Name == "pm37"'
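
If you end up typing that a lot, the direct query can be wrapped in a
small shell helper. This is only a sketch of the idea, not how Condor
builds its query internally: build_constraint is a made-up name, and
the optional second argument (an FQDN you already know) is my own
assumption. Name and Machine are the attributes that show up in the
Requirements output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Condor): print a -constraint expression
# for condor_status, built from a user-supplied name.
#   $1  name as typed, e.g. pm37 or slot1@pm10
#   $2  optional FQDN, if one is known; leave empty when DNS has nothing
build_constraint() {
    bc_name=$1
    bc_fqdn=$2
    case $bc_name in
        *@*)
            # Contains an @, so treat it as a slot name and match it verbatim.
            printf 'Name == "%s"\n' "$bc_name"
            ;;
        *)
            if [ -n "$bc_fqdn" ] && [ "$bc_fqdn" != "$bc_name" ]; then
                # Match the FQDN, but keep the un-expanded name as well.
                printf '(Name == "%s") || (Machine == "%s") || (Name == "%s") || (Machine == "%s")\n' \
                    "$bc_fqdn" "$bc_fqdn" "$bc_name" "$bc_name"
            else
                # No FQDN available: match the bare name only.
                printf '(Name == "%s") || (Machine == "%s")\n' "$bc_name" "$bc_name"
            fi
            ;;
    esac
}
```

You would then run something like
condor_status -constraint "$(build_constraint pm37)"
and no DNS lookup for pm37 is needed on the master.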

You are welcome to take a shot at how the query is constructed in the code.

Best,


matt