
Re: [Condor-users] One node doesn't execute jobs



Herb,

I just want to correct my previous statement. If the STARTD is part of
the DAEMON_LIST in your condor_config file but you do not specify that
this machine is willing to start jobs (START = True), the node state
will show up as Owner. I always get confused about this.
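
For illustration, a minimal sketch of the two settings mentioned above as
they might appear in the local configuration of an execute node. The
exact daemon list is an assumption and depends on the role of the
machine:

```
# Sketch only: an execute node that runs the startd and accepts jobs.
# The daemon list shown here is an assumption; a submit machine or the
# central manager would list additional daemons.
DAEMON_LIST = MASTER, STARTD

# Per the note above, if START does not evaluate to True the slots
# show up in the Owner state.
START = TRUE
```

Running condor_config_val START on the node prints the value Condor
actually sees, which helps confirm that the intended file is being read.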

Alex 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Friday, May 11, 2012 3:18 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

Try condor_q -analyze or condor_q -long. If that is not enough, go to
c:\condor\log, open the MasterLog file, and search for errors. I have
seen that error message when I had START = True but had not listed the
STARTD in the DAEMON_LIST, or vice versa.


Alex 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Smith, Herb
Sent: Friday, May 11, 2012 3:05 PM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] One node doesn't execute jobs

All,

I have a very simple pool consisting of 3 nodes.  The master node is a
Windows 7 machine and the other two are Windows XP.  Each machine has a
dual-core processor and Condor correctly sees all 6 processors as
available.  I included the following logic to ensure that both types of
operating system would be used:
Requirements   = (OpSys == "WINNT51" && Arch == "INTEL") || (OpSys == "WINDOWS" && Arch == "X86_64")

This seems to work fine, with one exception.  The master node and one
of the two pool members accept jobs, but the remaining pool member shows
a status of "Matched" and never shows that it was "Claimed", nor does it
run any jobs.

Here is a typical inquiry:
condor_status

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxxxx WINDOWS    X86_64 Claimed   Busy     1.000  2014  0+00:00:01
slot2@xxxxxxxxxxxx WINDOWS    X86_64 Claimed   Busy     1.010  2014  0+00:00:01
slot1@xxxxxxxxxxxx WINNT51    INTEL  Matched   Idle     0.000  1018  0+00:00:04
slot2@xxxxxxxxxxxx WINNT51    INTEL  Matched   Idle     0.020  1018  0+00:00:05
slot1@xxxxxxxxxxxx WINNT51    INTEL  Claimed   Busy     0.000  1002  0+00:00:01
slot2@xxxxxxxxxxxx WINNT51    INTEL  Claimed   Busy     0.000  1002  0+00:00:02
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51     4     0       2         0       2          0        0
      X86_64/WINDOWS     2     0       2         0       0          0        0

               Total     6     0       4         0       2          0        0

Both of the pool machines have the same operating system setup, as all
the machines in the company receive the same software load.  Is there
some way to determine why this machine is not picking up any of the
workload?

Thanks,

Herb Smith




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with
a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/