Re: [Condor-users] condor_status hostname : why can't I use IP number instead of hostname?
- Date: Thu, 23 Jul 2009 10:27:07 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] condor_status hostname : why can't I use IP number instead of hostname?
> Thank you for your explanation.
> Apparently your pool PCs have proper full hostnames within your domain,
> so you don't run into this problem.
> Now in my case all public library PCs typically have this type of configuration
> (from the 'ipconfig /all' output):
> Host Name.... : pm37
> Primary Dns Suffix...:
> Node Type...: Unknown
> IP Routing Enabled...: No
> WINS Proxy Enabled....: No
> I'm not in a position to demand from the library to modify over 500 public PCs
> hostname configurations. They will rather tell me to forget about my Condor
> project plans in the library......I'm in a vulnerable position here!
> (Apart from my condor_status problem, the rest of the condor setup seems to
> work fine: I can submit jobs, query the queue etc. from the Linux condor master)
> I really think I stepped on a buggy behaviour of the way condor_status works
> on the master, namely how it resolves hostnames of the pool PCs:
> Given two Windows pool PCs with similar network configurations like above;
> one is multi core (hostname: pm10), the other single core (hostname: pm37).
> Then condor_status can easily resolve the dual core PC when I add the
> "slot<n>@" to the hostname. Condor_status then miraculously finds the PC
> without a glitch: "condor_status slot1@pm10" works just great!
> The single core PC does not have a "slot<n>@" in its name; then condor_status
> cannot find this PC: "condor_status pm37" fails with "unknown host pm37"
> I found a workaround:
> add an entry "22.214.171.124 pm37" into the /etc/hosts of my Linux master PC.
> But I consider this a bad solution.
> I think condor_status should behave the same for multi and single core PCs.
> It's very annoying that it only works for multi core PCs, because they accidentally
> have the "slot<n>@" in their names....
> May I qualify this as buggy behaviour of condor_status?
A quick comment - Condor tries to be extra helpful in an environment
where DNS is set up "correctly." So when you're running condor_status
pm37, Condor is trying to figure out what Name to look up in the
Collector. If the argument has an @ then Condor knows it's a name already,
otherwise Condor tries to do DNS resolution to get an FQDN.
The resolution is helpful so you don't have to type an FQDN all the time, e.g.
$ condor_status -l slot1@node35 | grep Name
$ condor_status -l slot1@xxxxxxxxxxxxxxxxxx | grep Name
Name = "slot1@xxxxxxxxxxxxxxxxxx"
$ condor_status -l node35 | grep Name
Name = "slot1@xxxxxxxxxxxxxxxxxx"
Name = "slot2@xxxxxxxxxxxxxxxxxx"
Name = "slot3@xxxxxxxxxxxxxxxxxx"
Name = "slot4@xxxxxxxxxxxxxxxxxx"
Name = "slot5@xxxxxxxxxxxxxxxxxx"
Name = "slot6@xxxxxxxxxxxxxxxxxx"
Name = "slot7@xxxxxxxxxxxxxxxxxx"
Name = "slot8@xxxxxxxxxxxxxxxxxx"
You may also be able to get around the DNS setup issue by giving the
daemons a fake name, using DAEMON_NAME.
You can get a better idea of what Condor is doing by running:
env _CONDOR_TOOL_DEBUG=D_ALL condor_status -debug node35
Try grep'ing for Requirements, e.g.
$ env _CONDOR_TOOL_DEBUG=D_ALL condor_status -debug node35 2>&1 | grep Requirements
Requirements = (((TARGET.Name == "node35.example.com") ||
(TARGET.Machine == "node35.example.com")))
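To make the lookup described above concrete, here is a minimal Python sketch of how such a query might be put together. It is based only on the behaviour described in this message and on the Requirements line in the debug output; it is not Condor's actual code. The names build_requirements and resolve are hypothetical, and the exact form of the constraint in the "@" case is an assumption.

```python
def build_requirements(arg, resolve):
    """Build a Requirements string for a condor_status-style name lookup.

    Hypothetical illustration, not Condor's implementation.
    `resolve` maps a short hostname to an FQDN and is expected to raise
    LookupError when the host is unknown.
    """
    if "@" in arg:
        # The argument already looks like a slot name (e.g. slot1@pm10),
        # so it is used verbatim and no DNS resolution is attempted.
        # Assumption: the real constraint for this case may look different.
        return '(TARGET.Name == "%s")' % arg

    # No "@": expand the argument to an FQDN first. If the host cannot be
    # resolved, the LookupError propagates to the caller.
    fqdn = resolve(arg)
    return '(((TARGET.Name == "%s") || (TARGET.Machine == "%s")))' % (fqdn, fqdn)


if __name__ == "__main__":
    # Stand-in for DNS: only node35 is known, as in the example above.
    known_hosts = {"node35": "node35.example.com"}
    print(build_requirements("node35", known_hosts.__getitem__))
    print(build_requirements("slot1@pm10", known_hosts.__getitem__))
```

With this stand-in resolver, a host that is missing from the table (pm37, say) raises a LookupError before any query is built, which corresponds to the "unknown host pm37" failure reported above.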
So much for a quick comment, and I'd add that it seems reasonable that
the query Condor constructs could include an un-expanded node35.
Until that happens, another workaround is to query directly, e.g.
condor_status -constraint 'Name == "pm37"'
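If you need to do this for many machines from a script, a small helper along these lines may save some quoting trouble. It is only a sketch: direct_query_argv is a hypothetical name, and the only command-line option it uses is the -constraint form shown above.

```python
def direct_query_argv(name):
    """Return an argv list for querying the Collector by exact Name.

    Hypothetical helper. It bypasses the hostname expansion by passing
    the Name comparison as an explicit constraint.
    """
    # Refuse characters that would break the quoted string in the constraint.
    if '"' in name or "\\" in name:
        raise ValueError("unsupported character in name: %r" % name)
    return ["condor_status", "-constraint", 'Name == "%s"' % name]


if __name__ == "__main__":
    # Print the command for a few short hostnames instead of running it.
    for host in ["pm37", "pm38"]:
        print(direct_query_argv(host))
```

Passing the list to subprocess.run() avoids shell quoting entirely, since the constraint travels as a single argument.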
You are welcome to take a shot at how the query is constructed in the code.