
[Condor-users] Ask a question about condor setting



Dear all,
  I'd like to submit 2 Condor jobs to my test cluster. Both jobs use the 
same script, which computes the square root of a number. There are 2 
machines in my cluster, each with a P4 CPU with Hyper-Threading, so 
condor_status reports two virtual machines (vm1 and vm2) per host:

[root@tb032 log]# condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   496  0+00:34:49
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   496  0+00:00:06
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.000  1009  0+00:15:09
vm2@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.000  1009  0+00:15:10

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        4     2       0         2       0          0

               Total        4     2       0         2       0          0


However, when I submit the 2 jobs as follows:
[sary357@tb032 job]$ condor_submit job1.jdl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 71.
[sary357@tb032 job]$ condor_submit job2.jdl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 72.



I get a rather strange status:
[sary357@tb032 job]$ condor_q


-- Submitter: tb032.grid.sinica.edu.tw : <140.109.98.82:33472> : tb032.grid.sinica.edu.tw
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
  71.0   sary357         4/9  18:03   0+00:08:34 R  0   0.0  job1.sh
  72.0   sary357         4/9  18:03   0+00:00:00 I  0   0.0  job1.sh

2 jobs; 1 idle, 1 running, 0 held


[root@tb032 log]# condor_status

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL  Claimed    Busy       0.000   496  0+00:00:03
vm2@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.070   496  0+00:00:06
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.020  1009  0+00:20:09
vm2@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.000  1009  0+00:20:10

                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        4     3       1         0       0          0

               Total        4     3       1         0       0          0

I've tried to display all the information for osgs01. Its Start expression is:
Start = ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <= 
0.000000) || (State != "Unclaimed" && State != "Owner")))
I found out LoadAvg is 0.0 and CondorLoadAvg is 0.0, too. KeyboardIdle is 
5707.
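
To make the condition easier to reason about, here is a minimal Python sketch 
that mirrors the Start expression quoted above. It is only an illustration: 
the function name start_expr is made up, real ClassAd evaluation has its own 
semantics (for example, undefined attributes), and the second call assumes 
CondorLoadAvg is 0.0 for vm2, which the post does not state. The 0.070 value 
is the LoadAv shown for vm2 in the second condor_status output.

```python
def start_expr(keyboard_idle, load_avg, condor_load_avg, state):
    """Rough Python mirror of the quoted Start expression (not real ClassAd code)."""
    # Load on the machine that is not caused by Condor jobs.
    non_condor_load = load_avg - condor_load_avg
    return keyboard_idle > 15 * 60 and (
        non_condor_load <= 0.0 or state not in ("Unclaimed", "Owner")
    )


# Values reported in the post: LoadAvg 0.0, CondorLoadAvg 0.0, KeyboardIdle 5707.
print(start_expr(5707, 0.0, 0.0, "Unclaimed"))

# LoadAv 0.070 as listed for vm2; CondorLoadAvg 0.0 is an assumption.
print(start_expr(5707, 0.070, 0.0, "Unclaimed"))
```

Under these assumptions the first call yields True and the second yields 
False, so even a small non-Condor load would make the expression false for a 
slot that is still Unclaimed or Owner.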


It's strange that only vm1@xxxxxxxxx runs a job, while the state of 
vm2@xxxxxxxxx changes from Unclaimed to Owner (as the condor_status output 
shows) without executing any job. Of course, the load on the host osgs01 is 
very high when running a single job, as the following shows:
[root@osgs01 log]# top
 18:06:36  up 14 days,  3:27,  1 user,  load average: 0.94, 0.44, 0.16
61 processes: 59 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.0%   48.1%    1.9%   0.0%     0.0%    0.0%   49.9%
           cpu00    0.0%   45.2%    1.8%   0.0%     0.0%    0.0%   52.9%
           cpu01    0.0%   51.0%    2.0%   0.0%     0.0%    0.0%   47.0%
Mem:  1017308k av,  997708k used,   19600k free,       0k shrd,  188832k buff
                    365088k actv,  254800k in_d,   20000k in_c
Swap: 2096472k av,       0k used, 2096472k free                  491280k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
32467 sary357   35  10   984  984   872 R N  50.1  0.0   2:56   0 condor_exec.e
...

But I do not understand why the state of vm2@osgs01 changed automatically. 
Is it because of the load on osgs01.gr, or something else? Which setting can 
I modify to utilize the whole computing power? Does anyone know?




Best regards,
Fu-Ming


----------------------------------------------------------------------
"Gravitation is not responsible for people falling in love." 

Fu-Ming Tsai
Academia Sinica Grid Computing Centre
sary357@xxxxxxxxxxxxxxxxxx
------------------------------------------------------------------------