
Re: [HTCondor-users] question on late materialization shadows



Hi Thomas,
There is nothing special about the Shadow for late materialization jobs. In fact, the Negotiator does not even make matches for jobs until after they are materialized.

Really, the only difference between regular submit and late materialization is where the jobs are materialized. With regular submit, condor_submit materializes them before submitting them to the Schedd. With late materialization, the Schedd materializes them after submission. In either case, only jobs that have been materialized are considered for matchmaking. There are no virtual shadows.
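For illustration, here is a minimal submit description sketch. The executable name, output paths and job count are hypothetical placeholders, not taken from this thread. Adding max_materialize is what turns an ordinary submission into a late-materialization one. condor_submit then hands the Schedd a job factory, and the Schedd creates the job ads itself, keeping at most max_materialize of them materialized at a time. Only those materialized jobs are candidates for matchmaking.

```
# Hypothetical example: executable, paths and job count are placeholders.
executable      = analysis.sh
arguments       = $(ProcId)
output          = out/$(ClusterId).$(ProcId).out
error           = out/$(ClusterId).$(ProcId).err
log             = jobs.log

# Late materialization: the Schedd, not condor_submit, materializes the
# jobs, with no more than 100 from this cluster materialized at once.
max_materialize = 100

queue 10000
```

Without the max_materialize line, condor_submit would create all 10000 job ads itself before sending them to the Schedd. The matchmaking and shadow behaviour for each materialized job is the same either way.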

-tj 

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Thomas Hartmann
Sent: Tuesday, August 8, 2023 5:59 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] question on late materialization shadows

Hi all,

I have a quick question on how shadows are instantiated for late 
materialization jobs.
The thing is, I was initially confused by a user's jobs, for which I
noticed more than 100 shadow starts (and exits) in the access point's
(scheduler's) {Sched,Shadow}Log. At first, I interpreted log messages
like [1] to mean that the shadow was going to be brokered to the logged
execution point/worker but exited with `JOB_EXCEPTION` (without a job
actually being instantiated on the worker).
However, according to the negotiator, none of these transient shadows
had a match, so there were also no events logged in the EPs'
Start(er)Logs. Since these were max_materialize jobs, my guess is that
the shadows were all "virtual" shadows until the final match succeeded
and the shadow and the job actually became real. Is that right?

A corollary question: is there some way to differentiate between
"virtual" shadows and "real" shadows from multiple job runs?

The background is that on the schedds we add a few additional execution
point attributes to the jobs, as in [2], with the idea of including
benchmark performance information in the job ad.
But since each "virtual" shadow that never actually runs also
"inherits" the EP attributes from the transient match, these "virtual"
shadow/worker details are piling up in the extended job ads [3].

Cheers,
   Thomas


[1]
08/08/23 11:08:34 (pid:2305) match (slot2@xxxxxxxxxxxxxxxxx <131.169.164.78:35712?addrs=131.169.164.78-35712+[2001-638-700-10a0--1-44e]-35712&alias=batch1378.desy.de> for BIRD_atlas.lite.tadej) switching to job 19407249.2850
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 switching to job 19407249.2850.
08/08/23 11:08:34 (pid:2305) Starting add_shadow_birthdate(19407249.2850)
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 for job 19407249.2850 exited with status 4

[2]
JobMachineSpecAttrs = $(JobMachineSpecAttrs) HS06 HS06PerSlot HS06perWatt ApelScaledPerSlot ClusterAvgCoreHS06
SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) $(JobMachineSpecAttrs)
SUBMIT_ATTRS = $(SUBMIT_ATTRS) $(JobMachineSpecAttrs)

SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 5

[3]
MachineAttrHS06perWatt0 = 1.79
MachineAttrHS06perWatt1 = 3.7
MachineAttrHS06perWatt2 = 3.7
MachineAttrHS06perWatt3 = 3.7
MachineAttrHS06perWatt4 = 3.83
MachineAttrHS06perWatt5 = 3.7
MachineAttrHS06perWatt6 = 3.7
MachineAttrHS06perWatt7 = 3.83
MachineAttrHS06perWatt8 = 2.06
MachineAttrHS06perWatt9 = 3.7