
[Condor-users] Jobs blocked as Idle in Multi-CPU machine



Hi, All:

I have two machines: nodeA has 2 CPUs and nodeB has 1 CPU. Here is the CPU information:
_______________________

ye@nodea:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
... ...
bogomips        : 4265.69
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Core(TM)2 CPU          6400  @ 2.13GHz
stepping        : 2
cpu MHz         : 1596.000
cache size      : 2048 KB
... ...
bogomips        : 4262.73
clflush size    : 64

_______________________

ye@nodeb:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.70GHz
... ...
bogomips        : 3393.42
clflush size    : 64
_______________________


I followed the Condor (6.8.4) tutorial (http://www.cs.wisc.edu/condor/tutorials/intl-grid-school-3/) to get started. At the step "Submitting your first Condor job", I find that every job submitted on nodeA stays idle:
_______________________
ye@nodea:~$ condor_q

-- Submitter: nodea.gridgroup.eif.ch : <160.98.20.75:40855> : nodea.gridgroup.eif.ch
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   2.0   globus          8/22 18:45   0+00:00:00 I  0   9.8  simple 4 10

1 jobs; 1 idle, 0 running, 0 held
_______________________

But when I submit the same job on nodeB, it works perfectly.
I then checked the Condor status on both machines; the output is below:
_______________________

ye@nodea:~$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1000  0+03:45:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1000  0+03:45:05

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     2     0       0         2       0          0        0

               Total     2     0       0         2       0          0        0
_______________________

ye@nodeb:~$ condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

nodeb.gridgro LINUX       INTEL  Unclaimed  Idle       0.000  1011  0+03:09:53

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX     1     0       0         1       0          0        0

               Total     1     0       0         1       0          0        0
_______________________

I don't know whether this is caused by nodeA having 2 CPUs, so that the jobs on nodeA are blocked because they don't know where to execute.
How can I fix this problem on nodeA (the multi-CPU machine)?

Thanks a lot!

Best regards
ye