
Re: [HTCondor-users] Detecting available CPUs instead of hardware CPUs



Thank you Greg,
this is what I needed.
I did some tests and ended up using your command, falling back on parsing the file pointed to by _CONDOR_MACHINE_AD myself, because "condor_status -ads" does not work with 8.2 and "condor_status -target" only partially works in 8.2.
First I try:
cores=`condor_status -ads "$_CONDOR_MACHINE_AD" -af Cpus 2>/dev/null`
and, if that returns nothing:
cores=`egrep "^Cpus " "$_CONDOR_MACHINE_AD" | awk '{print $3}'`
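The two commands above can be folded into one helper that only falls back to parsing the file when condor_status gives no usable answer, and that rejects anything that is not a positive integer. This is a minimal sketch, assuming a POSIX sh and the usual "Cpus = N" line format of the machine ad; the function name detect_cpus is hypothetical.

```shell
#!/bin/sh
# Sketch: print the number of CPUs provisioned to this slot, read from the
# machine ad that HTCondor writes into the job sandbox.
# Usage: detect_cpus [AD_FILE]   (defaults to $_CONDOR_MACHINE_AD)

detect_cpus() {
    _ad="${1:-$_CONDOR_MACHINE_AD}"
    [ -r "$_ad" ] || return 1

    # Preferred path: let condor_status parse the ad (does not work on 8.2).
    _n=$(condor_status -ads "$_ad" -af Cpus 2>/dev/null)

    # Fallback: read the "Cpus = N" line directly; the exact match on $1
    # skips similar attributes such as TotalCpus and DetectedCpus.
    case "$_n" in
        ''|*[!0-9]*)
            _n=$(awk '$1 == "Cpus" && $2 == "=" { print $3; exit }' "$_ad") ;;
    esac

    # Only accept a plain positive integer.
    case "$_n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$_n" -gt 0 ] || return 1
    echo "$_n"
}

# Example: cores stays empty if the count could not be determined.
cores=$(detect_cpus) || echo "detect_cpus: CPU count not found" >&2
```

Matching the attribute name exactly in awk avoids depending on the attribute order inside the ad, and the final integer check means a confusing answer from an older condor_status never reaches the configuration.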

Checking cgroups (e.g. the /cgroup/htcondor/condor_var_lib_condor_execute_slot1@execnode1/cpu.shares files), as suggested by Michael, may be a more general solution for cases where job slots are also provided by other job managers. I'll keep it in mind for the future; for now I'm happy with the machine ad.
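Whichever source ends up providing the count (the machine ad now, perhaps cgroups later), the glided-in HTCondor still needs it as NUM_CPUS, as Greg suggests in the quoted message. A minimal sketch, assuming the glidein script generates its own HTCondor configuration file; the function set_num_cpus and the variable GLIDEIN_CONFIG are hypothetical names.

```shell
#!/bin/sh
# Sketch: append NUM_CPUS to the configuration of the glided-in HTCondor.
# $1 = CPU count, $2 = config file written by the glidein script.

set_num_cpus() {
    _n="$1"
    _cfg="$2"
    case "$_n" in
        ''|*[!0-9]*) _n="" ;;
    esac
    if [ -z "$_n" ] || [ "$_n" -le 0 ]; then
        # Without NUM_CPUS, the inner HTCondor would count the hardware
        # CPUs instead of the CPUs provided by the slot.
        echo "set_num_cpus: invalid CPU count '$1', NUM_CPUS not set" >&2
        return 1
    fi
    # A later definition overrides an earlier one in HTCondor config files.
    echo "NUM_CPUS = $_n" >> "$_cfg"
}

# Example (hypothetical variable name for the config file):
# set_num_cpus "$cores" "$GLIDEIN_CONFIG"
```

The value can then be checked with condor_config_val NUM_CPUS, assuming CONDOR_CONFIG points at the generated file.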

Thanks,
Marco

PS
Michael, this is an actual production system. In GlideinWMS we create an HTCondor overlay on heterogeneous resources provisioned on systems we do not control (including HTCondor clusters where we can only submit jobs and cannot ask for the startd to be reconfigured to join our pool).


On Jan 13, 2016, at 9:55 AM, Greg Thain <gthain@xxxxxxxxxxx> wrote:

> On 01/12/2016 05:16 PM, Marco Mambelli wrote:
>> Hi,
>> if I start condor in a condor slot and do not set NUM_CPUS, the newly started condor detects the hardware CPUs, not the CPUs that the slot provides to it (nor the number requested with request_cpus).
>> E.g. on a node with 8 CPUs and 2 equal slots (4 CPUs each), a condor started in one of the slots thinks it has 8 CPUs, even though the job that starts the new condor had request_cpus=4.
>> 
>> I have 2 questions:
>> 1. I observed this in 8.4; is this the intended behavior in all versions?
>> 2. If I want to manually set NUM_CPUS in the configuration of the condor that I'm starting within the slot, what is the best way to detect the available CPUs (something that works for both static and dynamic slots)?
>> 
> 
> Marco:
> 
> Every HTCondor job has a copy of the machine ad, as it existed when the job started, written to the sandbox directory.  The environment variable _CONDOR_MACHINE_AD points to this file.  This ad will have an attribute "Cpus", set to the number of CPUs provisioned by the base HTCondor.  If your glidein script can parse this file, and set NUM_CPUS for the glided-in HTCondor, I think you'll have what you need.  This will work for static and partitionable slots.  I think you can even use condor_status to parse the file with:
> 
> condor_status -ads "$_CONDOR_MACHINE_AD" -af Cpus
> 
> -greg
> 
> _______________________________________________
> HTCondor-users mailing list
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/