
Re: [Condor-users] condor_status and condor_q disagree about state of vm's

Have you tried using the -direct option to condor_status to get the
info from the node itself rather than from the central node?
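
If it helps with the comparison, here is a rough Python sketch that
cross-checks the two listings and reports any vm that condor_q says is
running a job but condor_status does not show as Claimed. It assumes the
trimmed column layout quoted in the original message below (real output
has more columns, so the parsing would need adjusting), and the function
names are made up for illustration. You could paste in the output of
condor_status -direct for the node in place of STATUS.

```python
# Sample data copied from the trimmed listings in the original message.
STATUS = """\
Name          OpSys       Arch   State      Activity
vm1@server-1  LINUX       X86_64 Unclaimed  Idle
vm2@server-1  LINUX       X86_64 Unclaimed  Idle
vm3@server-1  LINUX       X86_64 Claimed    Busy
vm4@server-1  LINUX       X86_64 Unclaimed  Idle
vm1@server-2  LINUX       X86_64 Unclaimed  Idle
vm2@server-2  LINUX       X86_64 Unclaimed  Idle
vm3@server-2  LINUX       X86_64 Claimed    Busy
vm4@server-2  LINUX       X86_64 Claimed    Busy
"""

RUNNING = """\
ID      HOST(S)
68.0   vm4@server-1
69.0   vm4@server-2
70.0   vm3@server-1
71.0   vm3@server-2
"""


def parse_status(text):
    """Map vm name -> (state, activity), assuming the five trimmed columns."""
    slots = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have five columns and a name of the form vmN@host;
        # the header row has no "@" and is skipped.
        if len(fields) == 5 and "@" in fields[0]:
            name, _opsys, _arch, state, activity = fields
            slots[name] = (state, activity)
    return slots


def parse_running(text):
    """Map job ID -> vm name, assuming the two trimmed columns."""
    jobs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and "@" in fields[1]:
            jobs[fields[0]] = fields[1]
    return jobs


def find_mismatches(status_text, running_text):
    """List (job ID, vm, reported state) for jobs on vms not shown as Claimed."""
    slots = parse_status(status_text)
    mismatches = []
    for job_id, vm in sorted(parse_running(running_text).items()):
        # A vm absent from the status listing is reported as "Missing".
        state, _activity = slots.get(vm, ("Missing", "Missing"))
        if state != "Claimed":
            mismatches.append((job_id, vm, state))
    return mismatches


if __name__ == "__main__":
    for job_id, vm, state in find_mismatches(STATUS, RUNNING):
        print(f"job {job_id} runs on {vm}, which is reported as {state}")
```

On the sample data this flags only job 68.0 on vm4@server-1.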

BTW, do you have a startd on your central node too? If so,
you should be careful, as that may have security implications.



> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Bob Kinney
> Sent: Friday, April 20, 2007 6:50 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] condor_status and condor_q disagree about state of vm's
> Hi:
> I've spent the last couple of days looking for an answer to this issue
> and searched the archives, but came up empty-handed.  If this has been
> addressed before, please excuse the rehash.
> I've got a small pool of two SMP machines, both with dual dual-core
> Opteron processors.  In the default configuration that's 8 vm's.  I
> would expect this to mean that I should never be able to have more
> than 8 jobs running in this pool at any given time, but I have been
> able to do just that.
> For (as yet) undetermined reasons, the schedd will not recognize that
> a startd is running on some vm's.  See below the (trimmed) results of
> a condor_status:
> Name          OpSys       Arch   State      Activity
> vm1@server-1  LINUX       X86_64 Unclaimed  Idle
> vm2@server-1  LINUX       X86_64 Unclaimed  Idle
> vm3@server-1  LINUX       X86_64 Claimed    Busy
> vm4@server-1  LINUX       X86_64 Unclaimed  Idle
> vm1@server-2  LINUX       X86_64 Unclaimed  Idle
> vm2@server-2  LINUX       X86_64 Unclaimed  Idle
> vm3@server-2  LINUX       X86_64 Claimed    Busy
> vm4@server-2  LINUX       X86_64 Claimed    Busy
> Now look at the (trimmed) results of a condor_q -running:
> ID      HOST(S)
> 68.0   vm4@server-1
> 69.0   vm4@server-2
> 70.0   vm3@server-1
> 71.0   vm3@server-2
> Notice that vm4 on server-1 is running a job, but shows up as
> Unclaimed/Idle.  Does anyone have an explanation of why this might
> happen, or what I can do to further debug the issue?
> Some other information that might be relevant:
> * server-1 is the central manager for this pool and runs a schedd
> * jobs are remotely submitted from other hosts to the schedd on server-1
> * server-2 does not seem to have the same issue (i.e. condor_status 
> always reports the correct results).
> * if other jobs are submitted to run on server-1, the vm's that will
> report Claimed/Busy will change (i.e. vm3 will be Idle, vm4 will be Busy).
> Thanks in advance for any assistance anyone can offer.
> Regards,
> Bob
> -- 
> Earl (Bob) Kinney
> UNIX Systems Administrator
> Harvard-MIT Data Center