
[Condor-users] condor_status and condor_q disagree about state of vm's



Hi:

I've spent the last couple of days looking for an answer to this issue and searched the archives, but came up empty-handed. If this has been addressed before, please excuse the rehash.

I've got a small pool of two SMP machines, both with dual dual-core Opteron processors. In the default configuration that's 8 vm's (4 per machine). I would expect never to have more than 8 jobs running in this pool at any given time, but I have been able to do just that.

For (as yet) undetermined reasons, the schedd will not recognize that a startd is running on some vm's. See below the (trimmed) results of a condor_status:

Name          OpSys       Arch   State      Activity

vm1@server-1  LINUX       X86_64 Unclaimed  Idle
vm2@server-1  LINUX       X86_64 Unclaimed  Idle
vm3@server-1  LINUX       X86_64 Claimed    Busy
vm4@server-1  LINUX       X86_64 Unclaimed  Idle
vm1@server-2  LINUX       X86_64 Unclaimed  Idle
vm2@server-2  LINUX       X86_64 Unclaimed  Idle
vm3@server-2  LINUX       X86_64 Claimed    Busy
vm4@server-2  LINUX       X86_64 Claimed    Busy

Now look at the (trimmed) results of a condor_q -running:

ID      HOST(S)
68.0   vm4@server-1
69.0   vm4@server-2
70.0   vm3@server-1
71.0   vm3@server-2

Notice that vm4 on server-1 is running a job, but shows up as Unclaimed/Idle. Does anyone have an explanation of why this might happen, or what I can do to further debug the issue?
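
To make the mismatch easier to spot as the pool changes, the two listings can be cross-checked mechanically. This is only a minimal sketch in Python, assuming the trimmed output shown above is pasted in as text; the function names (parse_status, parse_queue, find_mismatches) and the column layout it relies on are my own assumptions for illustration, not anything provided by Condor.

```python
# Trimmed condor_status output, copied from the listing above.
STATUS_TEXT = """\
vm1@server-1  LINUX       X86_64 Unclaimed  Idle
vm2@server-1  LINUX       X86_64 Unclaimed  Idle
vm3@server-1  LINUX       X86_64 Claimed    Busy
vm4@server-1  LINUX       X86_64 Unclaimed  Idle
vm1@server-2  LINUX       X86_64 Unclaimed  Idle
vm2@server-2  LINUX       X86_64 Unclaimed  Idle
vm3@server-2  LINUX       X86_64 Claimed    Busy
vm4@server-2  LINUX       X86_64 Claimed    Busy
"""

# Trimmed condor_q -running output, copied from the listing above.
QUEUE_TEXT = """\
68.0   vm4@server-1
69.0   vm4@server-2
70.0   vm3@server-1
71.0   vm3@server-2
"""


def parse_status(text):
    """Map vm name -> (state, activity), assuming five whitespace-separated columns."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and "@" in fields[0]:
            slots[fields[0]] = (fields[3], fields[4])
    return slots


def parse_queue(text):
    """Map job id -> host, assuming two whitespace-separated columns."""
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and "@" in fields[1]:
            jobs[fields[0]] = fields[1]
    return jobs


def find_mismatches(status_text, queue_text):
    """Return running jobs whose vm is not reported as Claimed."""
    slots = parse_status(status_text)
    mismatches = []
    for job_id, host in sorted(parse_queue(queue_text).items()):
        state, activity = slots.get(host, ("Unknown", "Unknown"))
        if state != "Claimed":
            mismatches.append((job_id, host, state, activity))
    return mismatches


if __name__ == "__main__":
    for job_id, host, state, activity in find_mismatches(STATUS_TEXT, QUEUE_TEXT):
        print(f"job {job_id} on {host}: condor_status reports {state}/{activity}")
```

With the listings above, the only job flagged is 68.0 on vm4@server-1.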

Some other information that might be relevant:

* server-1 is the central manager for this pool and runs a schedd
* jobs are remotely submitted from other hosts to the schedd on server-1
* server-2 does not seem to have the same issue (i.e. condor_status always reports the correct results).
* if other jobs are submitted to run on server-1 the vm's that will report Claimed/Busy will change (i.e. vm3 will be Idle, vm4 will be Busy).

Thanks in advance for any assistance anyone can offer.

Regards,
Bob

--
Earl (Bob) Kinney
UNIX Systems Administrator
Harvard-MIT Data Center