Re: [Condor-users] CPU Core detection?



On 08/01/2011 02:28 PM, Michael Di Domenico wrote:
> How does Condor determine how many cores are in a box?
>
> I'm running into an issue on RHEL using Condor 7.4 and Magny-Cours 12-core CPUs.
>
> You can see this kernel thread:
>
> http://groups.google.com/group/linux.kernel/browse_thread/thread/3f3904516374e62c?pli=1
>
> I'm looking into the kernel issue presently, but I'm curious what work
> has been done in Condor around this.
>
> Since my 48-core boxes are showing up as 24-slot machines with
> COUNT_HYPERTHREADED_CPUS=false, I suspect that Condor is misreading
> /proc/cpuinfo instead of looking at the /sys files, but I'd like
> some confirmation.
>
> Thanks
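For comparison with the /sys files mentioned above, here is a minimal Python sketch of one way to count physical cores from sysfs. It is an assumption about a reasonable approach, not what Condor 7.4 does, and the function name is hypothetical. It reads /sys/devices/system/cpu/cpuN/topology/thread_siblings_list on kernels that provide that file. All hardware threads of one core report the same sibling list, so the number of distinct lists is the number of cores, without relying on core ids being unique within a package.

```python
import glob
import os


def count_cores_sysfs(sysfs_root="/sys/devices/system/cpu"):
    """Return (logical CPUs, physical cores) using thread_siblings_list.

    Every hardware thread of a core lists the same set of siblings
    (e.g. "0,24" or "0-1"), so distinct lists == physical cores.
    """
    siblings = set()
    logical = 0
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/cpuidle.
    for cpu_dir in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        path = os.path.join(cpu_dir, "topology", "thread_siblings_list")
        try:
            with open(path) as f:
                siblings.add(f.read().strip())
        except OSError:
            continue  # e.g. offline CPU or kernel without this file
        logical += 1
    return logical, len(siblings)
```

On a live Linux box, `count_cores_sysfs()` with no argument reads the real sysfs tree; the `sysfs_root` parameter only exists so the function can be pointed at a copy for testing.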

You could try reading this code (mind the #ifdefs 8o)...

http://condor-git.cs.wisc.edu/?p=condor.git;a=blob;f=src/condor_sysapi/ncpus.cpp;h=7d65306516f2cb7c9b50b4d9cd8825df60c39610;hb=master

Or watch it in action...

$ _CONDOR_TOOL_DEBUG=D_ALL condor_config_val -debug
08/01/11 14:37:49 (fd:2) (pid:17500) config: using subsystem 'TOOL', local ''
08/01/11 14:37:49 (fd:2) (pid:17500) Reading from /proc/cpuinfo
08/01/11 14:37:49 (fd:2) (pid:17500) Found: Physical-IDs:True; Core-IDs:True
08/01/11 14:37:49 (fd:2) (pid:17500) Analyzing 4 processors using IDs...
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #0 (PID:0, CID:0):
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#1 : pid:0==0 and cid:0==0 (match=2)
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#2 : pid:0!=0 or cid:0!=2 (match=No)
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#3 : pid:0!=0 or cid:0!=2 (match=No)
08/01/11 14:37:49 (fd:2) (pid:17500) ncpus = 1
08/01/11 14:37:49 (fd:2) (pid:17500) P0: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) P1: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #1 (PID:0, CID:0):
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #2 (PID:0, CID:2):
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#2 and P#3 : pid:0==0 and cid:2==2 (match=2)
08/01/11 14:37:49 (fd:2) (pid:17500) ncpus = 2
08/01/11 14:37:49 (fd:2) (pid:17500) P2: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) P3: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #3 (PID:0, CID:2):
08/01/11 14:37:49 (fd:2) (pid:17500) Using IDs: 4 processors, 2 CPUs, 2 HTs
08/01/11 14:37:49 (fd:2) (pid:17500) Reading condor configuration from '/etc/condor/condor_config'
08/01/11 14:37:49 (fd:2) (pid:17500) Finding local host information, calling gethostname()
08/01/11 14:37:49 (fd:2) (pid:17500) gethostname() returned fully qualified name "eeyore.local"
08/01/11 14:37:49 (fd:2) (pid:17500) Trying to initialize local IP address (config file not read)
08/01/11 14:37:49 (fd:2) (pid:17500) NETWORK_INTERFACE=* matches lo 127.0.0.1, wlan0 10.10.30.140, virbr0 192.168.122.1, tun0 10.3.227.184, choosing IP 10.10.30.140
08/01/11 14:37:49 (fd:2) (pid:17500) Trying to initialize local IP address (after reading config)
08/01/11 14:37:49 (fd:2) (pid:17500) Disabling ConvertDefaultIPToSocketIP() because NETWORK_INTERFACE does not match multiple IPs.
Usage: condor_config_val [options] variable [variable] ...
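The ID matching in the log above can be sketched in a few lines. This is a minimal Python sketch inferred from the debug output, not a transcription of ncpus.cpp, and both function names are hypothetical. It assumes every stanza in /proc/cpuinfo carries both `physical id` and `core id` fields (the log's "Physical-IDs:True; Core-IDs:True" case). Logical processors that share a (physical id, core id) pair are counted as hyperthreads of one CPU.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor stanza."""
    procs = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor's stanza.
            if current:
                procs.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        procs.append(current)
    return procs


def count_by_ids(procs):
    """Return (processors, CPUs, HTs) by grouping on (physical id, core id).

    Assumes both fields are present in every stanza; processors that
    share a pair are treated as hyperthreads of the same CPU.
    """
    pairs = {(p["physical id"], p["core id"]) for p in procs}
    return len(procs), len(pairs), len(procs) - len(pairs)
```

On a live box you would pass it the contents of /proc/cpuinfo. For the four-processor layout in the log (physical id 0 everywhere, core ids 0, 0, 2, 2) it gives 4 processors, 2 CPUs, 2 HTs, matching the "Using IDs" line. The count is only as good as the IDs the kernel exposes. If two distinct cores in one package reported the same core id, they would be merged into one CPU, which would be consistent with a 48-core box showing up as 24 slots.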

Best,


matt