Greg,

Thanks for the response.

Here is the issue with "NUM_CPUS". I have attached the partitionable slot configuration that Tom put together for the cluster. I haven't touched it since he moved on. You can see that at the top he put:

    num_cpus = 2 * $(DETECTED_CPUS)

I have no clue why this was done, but I suspect it has to do with the partitionable slot configuration in the rest of this file, which appears to partition the cluster into two partitions: one seems to be dedicated to the 'online_cbc_gstlal_inspiral' analysis and the other to all other jobs. Thus I don't know if I should be changing this setting, which is one reason I looked into cgroups and other CPU affinity settings.

Tom also set the RAM in this file, which is one reason I am investigating cgroups for memory-limiting condor as well as cpu-limiting condor.

Sincerely,
Shawn

On 10/17/19 3:39 PM, Greg Thain wrote:
> On 10/17/19 11:39 AM, Shawn A Kwang wrote:
>> In Condor (v8.9.1) how do I assign CPU affinity to jobs on the compute
>> nodes with 24 cores? Let's say I want to limit condor to using 20 cores:
>> 0-19, for user jobs. It should be noted: the cluster is using
>> partitionable slots.
>>
>> Bigger picture: I wish to limit condor's resources because the compute
>> nodes run alongside the ceph-osd daemons, for which I want to 'reserve' a
>> certain amount of RAM and CPU.
>
>
> Shawn:
>
> What I would do on this machine is set
>
>
> NUM_CPUS = 20
>
> in the htcondor config.
>
> This will tell htcondor that it only has 20 cores to work with (but not
> which physical ones), and condor will only dole out 20 cores worth of
> work. With cgroups, if there is contention for all the cores on the
> system, the sum of the condor jobs shouldn't exceed 20 cores worth, but
> the kernel is free to pick which physical cores to use, leaving the rest
> for ceph or other system daemons.
>
>
> -greg
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

--
Associate Scientist
Center for Gravitation, Cosmology, and Astrophysics
University of Wisconsin-Milwaukee
office: +1 414 229 4960
kwangs@xxxxxxx

#
# First tell the startd we have double the amount of cpu, memory that we really have,
# and advertise some additional information into the slots (such as the amount
# of Cpus, Memory leftover in the pslot).
#
num_cpus = 2 * $(DETECTED_CPUS)
memory = 2 * (floor($(DETECTED_MEMORY)/1024) * 1024 - $(RESERVED_MEMORY:0))
startd_attrs = $(startd_attrs) RealtimeSlot preempt want_vacate Realtime_Resources_Inuse
startd_slot_attrs = $(startd_slot_attrs) Cpus TotalSlotCpus

# Decrease startd polling interval so regular jobs are killed quickly when
# realtime jobs arrive.
polling_interval = 2

#
# Set up a pslot for the realtime jobs, adding a START requirement to prohibit accepting
# regular jobs. Give this slot a custom name of "realtimeX@foo", and a custom attribute of
# RealtimeSlot=True.
# We purposefully use the "==" operator in the Start expression here instead of
# the "=?=" operator when testing RealtimeJob so that the realtime1 pslot stays
# in unclaimed state instead of owner state.
#
slot_type_1_partitionable = true
slot_type_1 = cpus=50% memory=50% gpus=0% disk=50% swap=0%
num_slots_type_1 = 1
slot_type_1_RealtimeSlot = True
slot_type_1_name_prefix = realtime
slot_type_1_start = ( $(START) ) && online_cbc_gstlal_inspiral

#
# Set up a pslot for regular jobs. Set the START expression on this slot
# to disallow realtime jobs, and only start regular jobs if no realtime jobs are running.
# Preempt all regular slots if a claim occurs on the realtime slot.
# Disable vacate time on these slots, so that jobs are immediately killed
# upon preemption (we want the resources freed up asap for the realtime jobs).
#
slot_type_2_partitionable = true
slot_type_2 = cpus=50% memory=50% gpus=50% disk=50% swap=0%
num_slots_type_2 = 1
slot_type_1_RealtimeSlot = False
Realtime_Resources_Inuse = ( realtime1_Cpus < realtime1_TotalSlotCpus )
slot_type_2_start = ( $(START) ) && !online_cbc_gstlal_inspiral && !$(Realtime_Resources_Inuse)
slot_type_2_preempt = $(Realtime_Resources_Inuse)
slot_type_2_want_vacate = False
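
To make the doubling in the attached file easier to reason about, here is a minimal Python sketch of the arithmetic only. It is not HTCondor code. The function name pslot_resources and the 128500 MB memory figure are hypothetical; the 24-core count comes from the thread. It assumes each 50% share divides the doubled total evenly and ignores any rounding the startd itself may apply.

```python
import math


def pslot_resources(detected_cpus, detected_memory_mb, reserved_memory_mb=0):
    """Return (cpus, memory_mb) that each of the two 50% pslots would advertise.

    Mirrors the two expressions at the top of the attached config:
        num_cpus = 2 * DETECTED_CPUS
        memory   = 2 * (floor(DETECTED_MEMORY / 1024) * 1024 - RESERVED_MEMORY)
    followed by a cpus=50% memory=50% split for each slot type.
    """
    num_cpus = 2 * detected_cpus
    memory = 2 * (math.floor(detected_memory_mb / 1024) * 1024 - reserved_memory_mb)
    # Both totals are even by construction, so halving them is exact.
    return num_cpus // 2, memory // 2


# 24 cores as in the thread; 128500 MB is a made-up memory size.
cpus, mem = pslot_resources(24, 128500)
print(cpus, mem)
```

Under these assumptions, each pslot ends up advertising the full real core count and the full (rounded-down) memory of the node, which matches the comment at the top of the file about telling the startd it has double the real resources. By the same arithmetic, passing 20 instead of 24 as the first argument yields 20 cores per pslot; whether that is the right change for this cluster is a separate question that the sketch does not answer.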