
Re: [HTCondor-users] HTcondor disk resource related queries



Hello Experts,

I am testing the following configuration to put jobs that breach their disk limit on hold.

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
DISK_USAGE_EXCEEDED = (JobUniverse =!=13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
WANT_HOLD = $(DISK_USAGE_EXCEEDED)
WANT_HOLD_REASON = "Job exceeded disk usage limits"

I can clearly see that jobs are using more than their RequestDisk size, yet they are not being held.

# condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk

globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#412.0#1685567906  21356484  8192026    4271296648  21356484.0     16777216
test.example.com#413.0#1685567923  12813890  8192026    4271296648  12813890.0     8388608
test.example.com#414.0#1685567952  8542594   8192026    4271296648  8542594.0      3250000
test.example.com#415.0#1685568493  8542594   8192025    4271296648  8542594.0      3250000
test.example.com#416.0#1685568803  12813890  8192026    4271296648  12813890.0     10000000
test.example.com#417.0#1685568954  4271297   8192025    4271296648  4271297.0      1
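
For reference, here is a minimal sketch of which of the rows above actually satisfy the DiskUsage > RequestDisk comparison. It is plain Python that only mimics the DiskUsage clauses of DISK_USAGE_EXCEEDED; it is not how HTCondor evaluates ClassAd expressions, and the JobUniverse clause is left out because the universe is not part of the output above. The function name disk_usage_exceeded is made up for illustration.

```python
# Rows copied from the condor_who output above: (job id, DiskUsage, RequestDisk).
jobs = [
    ("412.0", 8192026, 16777216),
    ("413.0", 8192026, 8388608),
    ("414.0", 8192026, 3250000),
    ("415.0", 8192025, 3250000),
    ("416.0", 8192026, 10000000),
    ("417.0", 8192025, 1),
]


def disk_usage_exceeded(disk_usage, request_disk):
    """Hypothetical stand-in for: DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk."""
    return disk_usage is not None and disk_usage > request_disk


# Collect the job ids whose reported DiskUsage is above their RequestDisk.
over_limit = [
    job_id
    for job_id, usage, request in jobs
    if disk_usage_exceeded(usage, request)
]
print(over_limit)
```

Going by the printed values alone, jobs 414.0, 415.0 and 417.0 are over their request, while 412.0, 413.0 and 416.0 are still below it.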

I am using HTCondor version 9.0.17.


Thanks & Regards,
Vikrant Aggarwal


On Tue, May 30, 2023 at 1:09 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

A couple of queries:

- Why is it showing a negative Disk value for the primary partitionable slot?

# condor_status `hostname` -server
Name                               OpSys  Arch    LoadAv  Memory  Disk       Mips   KFlops

slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx   LINUX  X86_64  0.000   211398  -25210961  25601  1764976
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxx LINUX  X86_64  0.000   19218   4278313    25601  1764976
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxx LINUX  X86_64  0.000   19218   4278313    25601  1764976

              Machines  Avail  Memory  Disk                  MIPS   KFLOPS

X86_64/LINUX  3         3      249834  18446744073692897281  76803  5294928

       Total  3         3      249834  18446744073692897281  76803  5294928
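
As a side check on the numbers above, the huge Disk total in the summary rows looks consistent with the three per-slot Disk values being summed (including the negative one for slot1) and the result being displayed as an unsigned 64-bit integer. The sketch below only does the arithmetic on the printed values; that condor_status computes the total this way is an assumption, not something confirmed from the HTCondor source.

```python
# Per-slot Disk values copied from the condor_status output above.
slot_disk = [-25210961, 4278313, 4278313]

# Ordinary signed sum of the three slots.
signed_total = sum(slot_disk)

# The same value reinterpreted as an unsigned 64-bit integer (wraparound mod 2**64).
unsigned_total = signed_total % 2**64

print(signed_total)
print(unsigned_total)
```

The second value matches the 18446744073692897281 shown in the Total row, so the odd total appears to be a side effect of the negative slot1 value rather than a separate problem.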


# condor_status -compact `hostname` -af Disk
4269756335


- I have the setting below in the worker node configuration to modify the job's requested disk to the given value, but it has never worked. We use similar expressions for CPU and memory, and those work fine.

# condor_config_val MODIFY_REQUEST_EXPR_REQUESTDISK
80000

I am not sure where it is picking up the Disk value below from.

# grep -r 'Disk =' /spare/condor/dir_14*/.machine.ad
/spare/condor/dir_1417831/.machine.ad:Disk = 4278313
/spare/condor/dir_1417831/.machine.ad:TotalDisk = 4278312960
/spare/condor/dir_1417831/.machine.ad:TotalSlotDisk = 4278313.0
/spare/condor/dir_1425169/.machine.ad:Disk = 4278313
/spare/condor/dir_1425169/.machine.ad:TotalDisk = 4278312960
/spare/condor/dir_1425169/.machine.ad:TotalSlotDisk = 4278313.0


# du -sh /spare/condor/dir_1425169
3.0G    /spare/condor/dir_1425169
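
To make the figures above easier to compare, here is a small unit-conversion sketch. It assumes that the Disk value in the machine ad and the MODIFY_REQUEST_EXPR_REQUESTDISK value are both expressed in KiB, which is how the HTCondor manual documents Disk and RequestDisk; the helper name kib_to_gib is made up for illustration.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a size in KiB to GiB (hypothetical helper)."""
    return kib / KIB_PER_GIB


slot_disk_kib = 4278313      # Disk from .machine.ad above (assumed KiB)
modify_request_kib = 80000   # MODIFY_REQUEST_EXPR_REQUESTDISK above (assumed KiB)

print(round(kib_to_gib(slot_disk_kib), 2), "GiB for the slot Disk value")
print(modify_request_kib / KIB_PER_MIB, "MiB for the configured request value")
```

Under that assumption the slot's Disk value is roughly 4.08 GiB and the configured request value is about 78 MiB, while du reports 3.0G actually used in the scratch directory.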

Thanks & Regards,
Vikrant Aggarwal