
Re: [Condor-users] condor_status hostname : why can't I use IP number instead of hostname?





Ian,

Thank you for your explanation.

Apparently your pool PCs have proper full hostnames within your domain,
so you don't run into this problem.

Now in my case all public library PCs typically have this type of configuration
(from the 'ipconfig /all' output):
  Host Name.... : pm37
  Primary Dns Suffix...:
  Node Type...: Unknown
  IP Routing Enabled...: No
  WINS Proxy Enabled....: No


I'm not in a position to demand that the library change the hostname
configuration of over 500 public PCs. They would rather tell me to forget about
my Condor project plans in the library... I'm in a vulnerable position here!
(Apart from my condor_status problem, the rest of the Condor setup seems to
work fine: I can submit jobs, query the queue, etc. from the Linux Condor master.)

------------------------

I really think I have stumbled on buggy behaviour in the way condor_status works
on the master, namely in how it resolves the hostnames of the pool PCs:

Take two Windows pool PCs with network configurations like the one above:
one is multi-core (hostname: pm10), the other single-core (hostname: pm37).

condor_status can easily resolve the dual-core PC when I add the
"slot<n>@" prefix to the hostname. It then miraculously finds the PC
without a glitch: "condor_status slot1@pm10" works just great!

The single-core PC does not have a "slot<n>@" prefix in its name, and
condor_status cannot find it: "condor_status pm37" fails with "unknown host pm37".


I found a workaround:
adding the entry "123.45.67.89  pm37" to /etc/hosts on my Linux master PC.
But I consider this a bad solution.
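
For a pool of several hundred PCs, the /etc/hosts workaround above could at least be generated rather than typed by hand. The following is only a sketch: the function name `hosts_lines` and the name-to-address mapping are hypothetical, and the address shown for pm10 is a placeholder, not a value from this thread.

```python
from typing import Dict, List


def hosts_lines(address_by_name: Dict[str, str]) -> List[str]:
    """Format one '/etc/hosts'-style line ('<address><TAB><name>') per host.

    Hosts are emitted in name order so the output is stable between runs.
    """
    return [f"{addr}\t{name}" for name, addr in sorted(address_by_name.items())]


if __name__ == "__main__":
    # pm37's address is the one quoted in the workaround; pm10's address
    # is a made-up placeholder for illustration only.
    pool = {"pm37": "123.45.67.89", "pm10": "192.0.2.10"}
    for line in hosts_lines(pool):
        print(line)
```

The printed lines would still have to be appended to /etc/hosts on the master by hand or by a separate script, so this does not remove the maintenance burden that makes the workaround unattractive.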


I think condor_status should behave the same for multi-core and single-core PCs.
It's very annoying that it only works for multi-core PCs because they happen
to have the "slot<n>@" prefix in their names.

May I qualify this as buggy behaviour of condor_status?

Thanks,
Rob.




----- Original Message ----
From: Ian Chesal

> I just don't understand why "condor_status slot1@pm37" would just work
> (if pm37 had been a dual core PC).
>
> Any ideas?

Yes, with machines advertising only one slot the slot portion of the
address is dropped by Condor. It's an inconsistency in how Condor names
and manages slots in a pool. It's been there for as long as I can
remember now. You can't ask a 1-slot machine for information using the
<slot>@ notation. You have to drop the slot portion of the request.
Condor doesn't have an instance of "slot1@pm37" in its collector DB, it
only has "pm37" -- so asking for "slot1@pm37" doesn't resolve to a
machine Condor knows how to contact.
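
The naming rule described above can be sketched in a few lines of Python. This is an illustration of the behaviour as explained in this message, not Condor's actual code; the function name `advertised_names` is hypothetical.

```python
from typing import List


def advertised_names(hostname: str, num_slots: int) -> List[str]:
    """Return the names a machine would appear under, per the rule above.

    A machine with one slot is listed under its bare hostname; a machine
    with several slots gets one 'slot<n>@<hostname>' name per slot.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    if num_slots == 1:
        return [hostname]
    return [f"slot{n}@{hostname}" for n in range(1, num_slots + 1)]


if __name__ == "__main__":
    print(advertised_names("pm37", 1))  # ['pm37']
    print(advertised_names("pm10", 2))  # ['slot1@pm10', 'slot2@pm10']
```

Under this rule a query for "slot1@pm37" has nothing to match, because the only name recorded for the single-slot machine is "pm37".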

From my own dev pool:
> condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime

slot1@xxxxxxxxxxxx LINUX      INTEL  Unclaimed Idle     0.160  1256   0+02:33:05
slot2@xxxxxxxxxxxx LINUX      INTEL  Unclaimed Idle     0.000   769  12+14:46:42
sj-arcdev.altera.c LINUX      X86_64 Owner     Idle     0.140  8192  11+15:40:57
slot1@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000  1224  13+15:21:27
slot2@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000  1224  12+05:03:31
slot3@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000   750   5+15:09:12
slot4@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.190   750   0+02:52:12
slot1@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000  1224   0+17:27:58
slot2@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000  1224   1+15:40:56
slot3@sj-bs3400-31 LINUX      X86_64 Unclaimed Idle     0.000   750   1+14:53:42