[HTCondor-users] question on late materialization shadows



Hi all,

I have a quick question on how shadows are instantiated for late materialization jobs. The thing is, at first I was confused by a user's jobs, for which I noticed more than 100 shadow starts (and exits) in the access point's/scheduler's SchedLog and ShadowLog. Initially, I interpreted log messages like [1] as meaning that the shadow was being brokered to the logged execution point/worker but then exited with `JOB_EXCEPTION`, without an actual job being instantiated on the worker. However, according to the negotiator, none of these transient shadows had a match, and accordingly no events were logged in the EPs' StartLogs/StarterLogs. Since these were max_materialize jobs, I guess the shadows were all "virtual" shadows until the final match succeeded and the shadow and the job actually became real, right?
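To quantify how often this happens per job, the shadow exit lines can be tallied straight from the SchedLog. The following is a minimal sketch, assuming the line format shown in [1] and treating exit status 4 as `JOB_EXCEPTION`, as described above; the function names are hypothetical and other HTCondor versions may word these log lines differently.

```python
import re
from collections import Counter, defaultdict

# Matches shadow exit lines in the format shown in [1] (assumed format).
EXIT_RE = re.compile(
    r"Shadow pid (?P<pid>\d+) for job (?P<job>\d+\.\d+) exited with status (?P<status>\d+)"
)

# Exit status logged when the shadow exits with JOB_EXCEPTION (per the log in [1]).
JOB_EXCEPTION = 4


def shadow_exit_summary(lines):
    """Return {job_id: Counter({exit_status: count})} from SchedLog lines."""
    summary = defaultdict(Counter)
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            summary[m["job"]][int(m["status"])] += 1
    return summary


def transient_shadow_jobs(lines, threshold=1):
    """Jobs whose shadows exited with JOB_EXCEPTION at least `threshold` times."""
    return {
        job: counts[JOB_EXCEPTION]
        for job, counts in shadow_exit_summary(lines).items()
        if counts[JOB_EXCEPTION] >= threshold
    }


if __name__ == "__main__":
    sample = [
        "08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 switching to job 19407249.2850.",
        "08/08/23 11:08:34 (pid:2305) Starting add_shadow_birthdate(19407249.2850)",
        "08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 for job 19407249.2850 exited with status 4",
    ]
    print(transient_shadow_jobs(sample))  # {'19407249.2850': 1}
```

Running it over a full SchedLog with a higher `threshold` would list the jobs with many transient shadows, which could then be cross-checked against the negotiator's match records.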

A corollary question: is there a way to differentiate between "virtual" shadows and "real" shadows from multiple job runs?

For context, on the schedds we add a few additional (execution point) attributes to the jobs, as in [2], where the idea is to include benchmark performance info in the job ad. But since each "virtual" shadow that is never realized also "inherits" the EP attributes from the transient match, these "virtual" shadow/worker details pile up in the extended job ads [3].
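To see how many of these history entries accumulate per attribute, the `MachineAttr<name><index>` lines of a job ad can be grouped by attribute name. This is a minimal sketch, assuming the ad is available as plain `name = value` lines like those in [3] and that the attribute names are the ones listed in `JobMachineSpecAttrs` in [2]; the function `machine_attr_history` is hypothetical.

```python
import re


def machine_attr_history(ad_lines, attr_names):
    """Group 'MachineAttr<name><index> = <value>' lines by attribute name.

    Returns {name: [value at index 0, 1, 2, ...]}, ordered by the numeric
    suffix. Values are kept as the raw strings from the ad.
    """
    # One anchored pattern per known name, so names that themselves end in
    # digits (e.g. HS06, ClusterAvgCoreHS06) are not split incorrectly.
    patterns = {
        name: re.compile(rf"^MachineAttr{re.escape(name)}(\d+)\s*=\s*(.+?)\s*$")
        for name in attr_names
    }
    by_index = {}
    for line in ad_lines:
        text = line.strip()
        for name, pattern in patterns.items():
            m = pattern.match(text)
            if m:
                by_index.setdefault(name, {})[int(m.group(1))] = m.group(2)
                break
    return {name: [vals[i] for i in sorted(vals)] for name, vals in by_index.items()}


if __name__ == "__main__":
    # Attribute names from JobMachineSpecAttrs in [2].
    names = ["HS06", "HS06PerSlot", "HS06perWatt", "ApelScaledPerSlot", "ClusterAvgCoreHS06"]
    values = ["1.79", "3.7", "3.7", "3.7", "3.83", "3.7", "3.7", "3.83", "2.06", "3.7"]
    ad = [f"MachineAttrHS06perWatt{i} = {v}" for i, v in enumerate(values)]
    history = machine_attr_history(ad, names)
    print(len(history["HS06perWatt"]))  # 10
```

Matching against the known names, rather than splitting off trailing digits generically, avoids misreading attributes such as `ClusterAvgCoreHS06` whose names already end in digits.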

Cheers,
  Thomas


[1]
08/08/23 11:08:34 (pid:2305) match (slot2@xxxxxxxxxxxxxxxxx <131.169.164.78:35712?addrs=131.169.164.78-35712+[2001-638-700-10a0--1-44e]-35712&alias=batch1378.desy.de> for BIRD_atlas.lite.tadej) switching to job 19407249.2850
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 switching to job 19407249.2850.
08/08/23 11:08:34 (pid:2305) Starting add_shadow_birthdate(19407249.2850)
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 for job 19407249.2850 exited with status 4

[2]
JobMachineSpecAttrs = $(JobMachineSpecAttrs) HS06 HS06PerSlot HS06perWatt ApelScaledPerSlot ClusterAvgCoreHS06
SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) $(JobMachineSpecAttrs)
SUBMIT_ATTRS = $(SUBMIT_ATTRS) $(JobMachineSpecAttrs)

SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 5

[3]
MachineAttrHS06perWatt0 = 1.79
MachineAttrHS06perWatt1 = 3.7
MachineAttrHS06perWatt2 = 3.7
MachineAttrHS06perWatt3 = 3.7
MachineAttrHS06perWatt4 = 3.83
MachineAttrHS06perWatt5 = 3.7
MachineAttrHS06perWatt6 = 3.7
MachineAttrHS06perWatt7 = 3.83
MachineAttrHS06perWatt8 = 2.06
MachineAttrHS06perWatt9 = 3.7
