[Condor-users] Multithreading and Condor



Hello,

I'm attempting to run a multithreaded registration. I am using a computing cluster with 13 nodes managed by Condor. Each node is a dual quad-core machine with 32 GB of RAM, running the 64-bit version of RHEL. I've implemented the registration in ITK, but when I ran it I saw no speedup from using additional threads. Now I'm trying to figure out whether the additional threads were actually used and, if not, why they were blocked.

Looking at the job details, it appears that only one CPU was requested. Below are the results from "condor_history -long pid" for the three jobs, run with 1, 4, and 8 threads:

2076.0   1 thread    Run Time: 0+05:28:28
LocalUserCpu 0
LocalSysCpu 0
RemoteUserCpu 23858
RemoteSysCpu 10
RequestCpus 1

2075.0   4 threads   Run Time: 0+05:27:41
LocalUserCpu 0
LocalSysCpu 0
RemoteUserCpu 23680
RemoteSysCpu 9
RequestCpus 1

2077.0   8 threads   Run Time: 0+05:30:14
LocalUserCpu 0
LocalSysCpu 0
RemoteUserCpu 24025
RemoteSysCpu 10
RequestCpus 1
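One quick way to tell from these numbers whether extra threads ran is to divide the accumulated CPU time by the wall-clock time: a job that keeps N cores busy accumulates roughly N CPU-seconds per second. The Python sketch below assumes that RemoteUserCpu and RemoteSysCpu are in seconds and that Run Time is wall-clock time in Condor's days+HH:MM:SS form; the names JOBS, wall_seconds and avg_busy_cores are illustrative only.

```python
# Figures copied from the condor_history output above:
# (RemoteUserCpu, RemoteSysCpu, Run Time) for each job.
JOBS = {
    "2076.0 (1 thread)": (23858, 10, "0+05:28:28"),
    "2075.0 (4 threads)": (23680, 9, "0+05:27:41"),
    "2077.0 (8 threads)": (24025, 10, "0+05:30:14"),
}


def wall_seconds(run_time: str) -> int:
    """Convert a 'days+HH:MM:SS' run-time string to seconds."""
    days, hms = run_time.split("+")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return int(days) * 86400 + hours * 3600 + minutes * 60 + seconds


def avg_busy_cores(user_cpu: int, sys_cpu: int, run_time: str) -> float:
    """Average number of cores kept busy: total CPU seconds / wall-clock seconds."""
    return (user_cpu + sys_cpu) / wall_seconds(run_time)


for job, (user_cpu, sys_cpu, run_time) in JOBS.items():
    print(f"{job}: {avg_busy_cores(user_cpu, sys_cpu, run_time):.2f} cores on average")
```

With these figures every job comes out at roughly 1.2 cores, so by this measure the 4- and 8-thread runs did not keep noticeably more CPUs busy than the single-threaded run.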

I also reran the job requesting 8 threads so I could look at the allocation. The job was allocated one slot on one node. I then logged onto that node and ran the "top" command; the results are below. Here it looks to me like multiple CPUs are being used. I'd appreciate any thoughts on interpreting these results.

top - 09:53:43 up 14 days, 20:45,  1 user,  load average: 2.31, 0.84, 0.31
Tasks:  233 total,   3 running, 230 sleeping,   0 stopped,   0 zombie
Cpu0  :  0.0%us,  0.0%sy, 40.2%ni, 59.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy, 85.7%ni, 14.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.3%sy, 49.2%ni, 50.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy, 84.9%ni, 15.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy, 45.0%ni, 54.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy, 84.4%ni, 15.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.3%sy, 39.9%ni, 59.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.3%sy, 84.4%ni, 15.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.3%sy, 95.7%ni,  4.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us,  0.0%sy, 93.0%ni,  7.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy, 85.7%ni, 14.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.3%sy, 95.0%ni,  4.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.0%sy, 94.7%ni,  5.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy, 99.3%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy, 92.0%ni,  8.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy, 96.7%ni,  3.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32948672k total,  3668612k used, 29280060k free,   359324k buffers
Swap: 31262480k total,        0k used, 31262480k free,  1014788k cached

  PID USER      PR  NI   VIRT   RES   SHR S   %CPU %MEM     TIME+ COMMAND
 3333 smrolfe   35  10  3004m  2.0g  3700   1267.0  6.4   5:40.47 condor_exec.exe
 3406 smrolfe   15   0  12880  1216   832 R    0.3  0.0   0:00.22 top
    1 root      15   0  10368   632   536 S    0.0  0.0   0:01.79 init
    2 root      RT  -5      0     0     0 S    0.0  0.0   0:00.04 migration/0
    3 root      34  19      0     0     0 S    0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5      0     0     0 S    0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5      0     0     0 S    0.0  0.0   0:00.03 migration/1
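To cross-check this snapshot, the per-CPU percentages can be added up: in top, 100% on one CpuN line corresponds to one fully busy logical CPU, and %ni is user time spent in niced processes (condor_exec.exe is running with NI 10 here). The Python sketch below assumes the field layout shown above (fields such as "85.7%ni"); TOP_CPU_LINES, parse_cpu_line and summarize are illustrative names, not part of any tool.

```python
import re

# The per-CPU lines from the top snapshot above.
TOP_CPU_LINES = """\
Cpu0  :  0.0%us,  0.0%sy, 40.2%ni, 59.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.0%us,  0.0%sy, 85.7%ni, 14.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.0%us,  0.3%sy, 49.2%ni, 50.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.0%us,  0.0%sy, 84.9%ni, 15.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.0%us,  0.3%sy, 45.0%ni, 54.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.0%us,  0.0%sy, 84.4%ni, 15.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.3%sy, 39.9%ni, 59.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.0%us,  0.3%sy, 84.4%ni, 15.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  :  0.0%us,  0.3%sy, 95.7%ni,  4.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  :  0.0%us,  0.0%sy, 93.0%ni,  7.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 :  0.0%us,  0.0%sy, 85.7%ni, 14.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 :  0.0%us,  0.3%sy, 95.0%ni,  4.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 :  0.0%us,  0.0%sy, 94.7%ni,  5.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 :  0.0%us,  0.0%sy, 99.3%ni,  0.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 :  0.0%us,  0.0%sy, 92.0%ni,  8.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 :  0.0%us,  0.0%sy, 96.7%ni,  3.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
"""

# Matches one field such as "85.7%ni" and captures ("85.7", "ni").
FIELD = re.compile(r"(\d+\.\d+)%(us|sy|ni|id|wa|hi|si|st)")

# States counted as busy here: user, system, niced user, hardware IRQ, soft IRQ.
BUSY_STATES = ("us", "sy", "ni", "hi", "si")


def parse_cpu_line(line: str) -> dict:
    """Map each state name on one 'CpuN :' line to its percentage."""
    return {state: float(value) for value, state in FIELD.findall(line)}


def summarize(text: str) -> dict:
    """Sum busy and niced percentages over all CPUs (100% = one fully busy CPU)."""
    cpus = [parse_cpu_line(line) for line in text.splitlines() if line.startswith("Cpu")]
    return {
        "cpus": len(cpus),
        "busy": sum(sum(cpu[state] for state in BUSY_STATES) for cpu in cpus),
        "niced": sum(cpu["ni"] for cpu in cpus),
    }


summary = summarize(TOP_CPU_LINES)
print(f"{summary['cpus']} CPUs listed, about {summary['busy'] / 100:.2f} CPUs busy, "
      f"{summary['niced'] / 100:.2f} of them in niced processes")
```

For this snapshot the per-CPU lines add up to roughly 12.7 busy CPUs, almost all of it niced time, which agrees with the 1267.0 %CPU that top reports for condor_exec.exe.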