
[HTCondor-users] Parallel jobs do not start if I set NUM_SLOTS_TYPE_1 setting



Everything works fine, but if I add only the following line to my execute node configuration:

NUM_SLOTS_TYPE_1 = $(NUM_CPUS)

then parallel tasks never run! Why is this happening?

I simply want to divide my slots into two parts, so that the first part works with one DedicatedScheduler and the second part works with another.
I found a workaround:
SLOT1_DedicatedScheduler="DedicatedScheduler@schedd-30040@submit.pseven-htcondor"
SLOT2_DedicatedScheduler="DedicatedScheduler@schedd-30040@submit.pseven-htcondor"
SLOT3_DedicatedScheduler="DedicatedScheduler@schedd-30041@submit.pseven-htcondor"
SLOT4_DedicatedScheduler="DedicatedScheduler@schedd-30041@submit.pseven-htcondor"
SLOT5_DedicatedScheduler....
...
but it is too long and very inconvenient when configuring my nodes.
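For the record, what I am trying to achieve with slot types is something like the configuration below. (The SLOT_TYPE_<N>_DedicatedScheduler per-type override is my reading of the manual's per-slot-type parameter syntax; I have not been able to verify it, precisely because of this problem.)

SLOT_TYPE_1 = cpus=1
NUM_SLOTS_TYPE_1 = 8
SLOT_TYPE_1_DedicatedScheduler = "DedicatedScheduler@schedd-30040@submit.pseven-htcondor"

SLOT_TYPE_2 = cpus=1
NUM_SLOTS_TYPE_2 = 8
SLOT_TYPE_2_DedicatedScheduler = "DedicatedScheduler@schedd-30041@submit.pseven-htcondor"

STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

That would give me 8 slots claimed by each dedicated scheduler, without listing every SLOT<N>_ line by hand.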

So I would like to use NUM_SLOTS_TYPE_1 and NUM_SLOTS_TYPE_2, but I can't divide the slots into types, because then parallel tasks do not work! In my opinion this looks like a bug, or is there something I don't understand?
I launched only one parallel task, consisting of a single subtask. Nevertheless, it consumed all the slots:

root@pseven-htcondorsubmit-65c66787fb-szkgk:/# condor_q -better-analyze -verbose -allusers
Fetching Machine ads... 16 ads.
Fetching job ads... 1 ads


-- Schedd: schedd-30040@xxxxxxxxxxxxxxxxxxxxxx : <172.17.0.8:43903?...
The Requirements expression for job 1.000 is

  ((OpSys == "LINUX") && (Arch == "X86_64") && (RUNENV_PYTHON3 >= 6)) && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.FileSystemDomain == MY.FileSystemDomain)

Job 1.000 defines the following attributes:

  DiskUsage = 1
  FileSystemDomain = "pseven-htcondor"
  ImageSize = 1
  RequestDisk = DiskUsage
  RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

The Requirements expression for job 1.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          16  OpSys == "LINUX"
[1]          16  Arch == "X86_64"
[3]          16  RUNENV_PYTHON3 >= 6
[5]          16  TARGET.Disk >= RequestDisk
[7]          16  TARGET.Memory >= RequestMemory
[9]          16  TARGET.FileSystemDomain == MY.FileSystemDomain

No successful match recorded.
Last failed match: Fri Feb  7 16:37:52 2020

Reason for last match failure: PREEMPTION_REQUIREMENTS == False

001.000:  Run analysis summary ignoring user priority. Of 16 machines,
    0 are rejected by your job's requirements
    0 reject your job because of their own requirements
   16 match and are already running your jobs
    0 match but are serving other users
    0 are able to run your job

*******************************************************

My startd settings are:
NUM_CPUS = 16
DedicatedScheduler = "DedicatedScheduler@schedd-30040@submit.pseven-htcondor"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
DAEMON_LIST = MASTER, STARTD
# NUM_SLOTS_TYPE_1 = $(NUM_CPUS) <--- this line is a problem.

root@pseven-htcondorsubmit-65c66787fb-szkgk:/# condor_version
$CondorVersion: 8.9.2 Jun 04 2019 BuildID: Debian-8.9.2-1 PackageID: 8.9.2-1 Debian-8.9.2-1 $
$CondorPlatform: X86_64-Ubuntu_18.04 $

--
Sincerely yours,
Ivan Ergunov                         mailto:hozblok@xxxxxxxxx