
Re: [HTCondor-users] Taking action on job breaching scratch space limit



Hi Vikrant,

I'm using the following for the same purpose:

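# Copy the job's RequestDisk into the slot ad while the job runs
STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
# True when a job outside universe 13 reports more disk usage than it requested
DISK_USAGE_EXCEEDED = (JobUniverse != 13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
# Put the job on hold with subcode 105 and this reason when the condition is true
use POLICY : WANT_HOLD_IF(DISK_USAGE_EXCEEDED, 105, "job exceeded requested disk")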

Tomer.
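
The hold condition in the reply above can be modeled as a small Python predicate. This is only a sketch of the logic, not how HTCondor evaluates ClassAds: the function name and parameters are hypothetical, `None` stands in for an UNDEFINED DiskUsage, RequestDisk is assumed to always be defined, and both values are assumed to be in the same unit. Universe 13 is assumed here to be the vm universe, which is worth checking against the manual.

```python
from typing import Optional

# Assumption: 13 is the vm universe in HTCondor's universe numbering.
EXCLUDED_UNIVERSE = 13


def disk_usage_exceeded(job_universe: int,
                        disk_usage: Optional[int],
                        request_disk: int) -> bool:
    """Mirror of DISK_USAGE_EXCEEDED; None models an UNDEFINED DiskUsage."""
    if job_universe == EXCLUDED_UNIVERSE:
        return False  # jobs in universe 13 are never held by this rule
    if disk_usage is None:
        return False  # no usage reported yet, so nothing to compare
    return disk_usage > request_disk


if __name__ == "__main__":
    print(disk_usage_exceeded(5, 2048, 1024))   # True: usage above request
    print(disk_usage_exceeded(5, None, 1024))   # False: usage not reported yet
    print(disk_usage_exceeded(13, 2048, 1024))  # False: excluded universe
```

The `DiskUsage =!= UNDEFINED` guard matters because a job that has not reported usage yet would otherwise make the comparison evaluate to UNDEFINED rather than to a clean False.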



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of ervikrant06@xxxxxxxxx <ervikrant06@xxxxxxxxx>
Sent: Tuesday, April 6, 2021 5:36 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Taking action on job breaching scratch space limit
 
Hello Experts,

I am trying to take action on jobs that use more scratch space than they were allocated.

The CondorDiskSpace attribute comes from a dynamic script that is called through condor cron. The script is included before this configuration.

CondorDiskSpace = $(CondorDiskSpace:1024)
DiskPerCore =  $(CondorDiskSpace) / $(NUM_CPUS)
chunksdisk = ifThenElse( RequestDisk <= ($(DiskPerCore) * RequestCpus), quantize(RequestCpus, {1}), quantize(RequestDisk, {$(DiskPerCore)}) / $(DiskPerCore))
MODIFY_REQUEST_EXPR_REQUESTDISK = $(chunksdisk) * $(DiskPerCore)
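
The rounding this configuration aims for can be sketched in Python. This is an illustration under assumptions, not HTCondor's implementation: the function names are hypothetical, `quantize` is modeled only for the single-step case (round up to the next multiple), DiskPerCore is treated as one already-computed integer, and the NUM_CPUS value of 4 is made up for the example.

```python
def quantize(value: int, step: int) -> int:
    """Round value up to the nearest multiple of step (single-step case only)."""
    return -(-value // step) * step


def modified_request_disk(request_disk: int, request_cpus: int,
                          disk_per_core: int) -> int:
    """Return the disk request rounded to whole per-core chunks."""
    if request_disk <= disk_per_core * request_cpus:
        # The request fits in the share implied by the CPU count.
        chunks = quantize(request_cpus, 1)
    else:
        # Otherwise round the request up to whole per-core chunks.
        chunks = quantize(request_disk, disk_per_core) // disk_per_core
    return chunks * disk_per_core


if __name__ == "__main__":
    # CondorDiskSpace default of 1024 and a hypothetical NUM_CPUS of 4.
    disk_per_core = 1024 // 4
    print(modified_request_disk(100, 2, disk_per_core))   # 512
    print(modified_request_disk(1000, 2, disk_per_core))  # 1024
```

In both cases the result is a whole multiple of the per-core share, so a job is never given a fractional chunk of scratch space.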

I commented out the last two lines of the above configuration and added the following to keep the test case simple.

MODIFY_REQUEST_EXPR_REQUESTDISK = $(DiskPerCore) / 160
STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) TowerTeam DiskUsage

I confirmed that Disk and DiskUsage appear in the output of condor_who -l. However, the job keeps running even though it uses more scratch space than it was allocated.

DiskUsageBreach = (DiskUsage > Disk)
WANT_HOLD = ($(PREEMPT) || $(DiskUsageBreach))

How can we put the job on hold in this scenario once it uses more disk space in the scratch directory than it was allocated?

Thanks & Regards,
Vikrant Aggarwal