
Re: [HTCondor-users] preemption problems on particular execution nodes



Hi from me as well,

just to reiterate what Henning already wrote.

The test pool has only two execution hosts, whose outputs of
condor_config_val -dump differ only slightly [1].

After we submit many single-core jobs, they occupy all available
slots. Then we submit a large number of 12-core jobs, which get
scheduled only to a single host, preempting multiple jobs there; after
that, everything becomes static:
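
For reference, the 12-core jobs are submitted with a description along
these lines (a minimal sketch; the executable name is a placeholder,
only request_cpus = 12 and the job count of 200 come from the queue
output below):

universe = vanilla
executable = payload.sh
request_cpus = 12
queue 200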

$ condor_q
-- Schedd: condor3.atlas.local : <10.20.30.18:9618?... @ 11/06/19 14:59:24
OWNER   BATCH_NAME    SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
carsten ID: 151     11/6  14:51      _    136     64    200 151.0-199
carsten ID: 153     11/6  14:51      _      2    198    200 153.0-199

$ condor_q -run
[..]
151.157 carsten        11/6  14:51   0+00:08:30 slot1@xxxxxxxxxxxxxxxxx
151.158 carsten        11/6  14:51   0+00:08:30 slot1@xxxxxxxxxxxxxxxxx
151.159 carsten        11/6  14:51   0+00:08:30 slot1@xxxxxxxxxxxxxxxxx
153.0   carsten        11/6  14:51   0+00:05:51 slot1@xxxxxxxxxxxxxxxxx
153.1   carsten        11/6  14:51   0+00:04:51 slot1@xxxxxxxxxxxxxxxxx

The Negotiator log looks fine for the first two jobs:

11/06/19 14:53:54 Matched pslot slot1@xxxxxxxxxxxxxxxxx by priority
preempting 12 dynamic slots
11/06/19 14:53:54       Preempting various_dSlot_users (user
prio=500.00, startd rank=0.00) on slot1@xxxxxxxxxxxxxxxxx for
aei.dev.admin.multi.carsten@xxxxxxxocal (user prio=2.54, startd rank=0.00)
11/06/19 14:53:54       Sending PERMISSION, claim id, startdAd to schedd
11/06/19 14:53:54       Notifying the accountant
11/06/19 14:53:54       Successfully matched with slot1@xxxxxxxxxxxxxxxxx
11/06/19 14:53:54 Match completed, match cost= 12
11/06/19 14:53:54     Request 00153.00000: autocluster 6 (request count
2 of 200)
11/06/19 14:53:54 matchmakingAlgorithm: limit 159.999575 used 12.000000
pieLeft 147.999575
11/06/19 14:53:54 Attempting to use cached MatchList: Failed (MatchList
length: 0, Autocluster: 6, Submitter Name:
aei.dev.admin.multi.carsten@xxxxxxxxxxx, Schedd Address:
<10.20.30.18:9618?addrs=10.20.30.18-9618&noUDP&sock=756250_2d21_3>)
11/06/19 14:53:54     Send END_NEGOTIATE to remote schedd
11/06/19 14:53:54   Submitter aei.dev.admin.multi.carsten@xxxxxxxxxxx
got all it wants; removing it.


However, later on, the negotiation for 153.2 also looks fine initially:

11/06/19 14:59:54 Socket to aei.dev.admin.multi.carsten@xxxxxxxxxxx
(<10.20.30.18:9618?addrs=10.20.30.18-9618&noUDP&sock=756250_2d21_3>)
already in cache, reusing
11/06/19 14:59:54 Started NEGOTIATE with remote schedd; protocol version 1.
11/06/19 14:59:54     Request 00153.00002: autocluster 6 (request count
1 of 198)
11/06/19 14:59:54 matchmakingAlgorithm: limit 135.999583 used 0.000000
pieLeft 135.999583

The evaluation of PREEMPTION_REQUIREMENTS also looks good, e.g.:

11/06/19 14:59:54 Classad debug: 1573051884 --> 1573051884
11/06/19 14:59:54 Classad debug: [0.01597ms] JobStart --> 1573051884
11/06/19 14:59:54 Classad debug: time() --> 1573052394
11/06/19 14:59:54 Classad debug: 1573051884 --> 1573051884
11/06/19 14:59:54 Classad debug: [0.00906ms] JobStart --> 1573051884
11/06/19 14:59:54 Classad debug: [0.05794ms] ifThenElse(JobStart isnt
undefined,(time() - JobStart),0) --> 510
11/06/19 14:59:54 Classad debug: [0.00119ms] RemoteUserPrio --> 995387
11/06/19 14:59:54 Classad debug: [0.00095ms] SubmittorPrio --> 2.59673
11/06/19 14:59:54 Classad debug: [0.09704ms] ifThenElse(JobStart isnt
undefined,(time() - JobStart),0) > (2 * 60) && (RemoteUserPrio >
SubmittorPrio * 1.200000000000000E+00) --> TRUE
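
Putting the pieces of the debug trace back together, the expression
being evaluated appears to be equivalent to the following (a
reconstruction from the trace above, not a verbatim copy of our
config file):

PREEMPTION_REQUIREMENTS = ifThenElse(JobStart isnt undefined, (time() - JobStart), 0) > (2 * 60) && (RemoteUserPrio > SubmittorPrio * 1.2)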

However, it sadly ends with:

11/06/19 14:59:54     Send END_NEGOTIATE to remote schedd
11/06/19 14:59:54   Submitter aei.dev.admin.multi.carsten@xxxxxxxxxxx
got all it wants; removing it.

Thus, two questions:

(1) Hopefully simple to answer: is there a way to speed up preemption?
Right now, only a single preemption occurs per negotiation cycle.

(2) Any idea why the jobs on the second node are never preempted?

Cheers

Carsten

[1] The differences are mostly due to the fact that a3001 contains a few GPUs:

$ diff /tmp/a30*config

1c1
< # Configuration from machine: a3001.atlas.local
---
> # Configuration from machine: a3010.atlas.local
85c85
< CENTRAL_MANAGER = condorhub
---
> CENTRAL_MANAGER = condorhub.atlas.local
277,280c277,280
< DETECTED_CORES = 32
< DETECTED_CPUS = 32
< DETECTED_MEMORY = 192081
< DETECTED_PHYSICAL_CPUS = 16
---
> DETECTED_CORES = 128
> DETECTED_CPUS = 128
> DETECTED_MEMORY = 515889
> DETECTED_PHYSICAL_CPUS = 64
315,316d314
< ENVIRONMENT_FOR_AssignedGPUs = CUDA_VISIBLE_DEVICES
< ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = none
343c341
< FULL_HOSTNAME = a3001.atlas.local
---
> FULL_HOSTNAME = a3010.atlas.local
374d371
< GPU_DISCOVERY_EXTRA = -extra
442c439
< HOSTNAME = a3001
---
> HOSTNAME = a3010
458c455
< IP_ADDRESS = 10.10.30.1
---
> IP_ADDRESS = 10.10.30.10
460c457
< IPV4_ADDRESS = 10.10.30.1
---
> IPV4_ADDRESS = 10.10.30.10
538d533
< MACHINE_RESOURCE_INVENTORY_GPUs =
/usr/share/condor/condor_gpu_discovery_wrapper
$(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
677c672
< NETWORK_INTERFACE = 10.10.30.1
---
> NETWORK_INTERFACE = 10.10.30.10
714c709
< PID = 5054
---
> PID = 106453
724c719
< PPID = 5046
---
> PPID = 106445
865c860
< SHUTDOWN_GRACEFUL_TIMEOUT = 16000
---
> SHUTDOWN_GRACEFUL_TIMEOUT = 600
870c865
< SLOT_TYPE_1 = ram=171857, swap=0%, cpus=100%
---
> SLOT_TYPE_1 = ram=438065, swap=0%, cpus=100%
900,904c895
< STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
< STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
< STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
< STARTD_CRON_GPUs_MONITOR_PERIOD = 1
< STARTD_CRON_JOBLIST =  FACTER SIMD GPUs_MONITOR
---
> STARTD_CRON_JOBLIST =  FACTER SIMD
1019c1010
< USER_JOB_WRAPPER = $(LOCAL_CONDOR_SCRIPTS)/user-job-wrapper.sh
---
> USER_JOB_WRAPPER =
1023c1014
< UTSNAME_NODENAME = a3001
---
> UTSNAME_NODENAME = a3010
1077,1078d1067
< #     /etc/condor/config.d/20_gpu
< #     /etc/condor/config.d/20_preemption
