
Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried



Hi Roman,

I use this script for exactly the purpose you described.
It relaunches each held job with 3 times the memory it was previously provisioned, until the request reaches a cap.
Every relaunch is recorded in a log file.

$ cat /usr/bin/htcondor-release-held-jobs

#!/bin/bash
# Release held jobs with a larger memory request, up to a cap.
CAP=524288 # 512 GB, expressed in MB
MULTIPLIER=3
LOG=/data/dnb01/maintenance/condor_rerun_held_jobs.log

if [ ! -f "$LOG" ]; then
  touch "$LOG"
  echo "Created $LOG"
fi

# Select only the jobs whose HoldReasonCode is 34
for j in $(condor_q -hold -autoformat ClusterId HoldReasonCode | awk '$2 == 34 {print $1}')
do
  JOB_DESCRIPTION=$(condor_q "$j" -autoformat JobDescription)
  MEMORY_PROVISIONED=$(condor_q "$j" -autoformat MemoryProvisioned)

  # Multiply the provisioned memory, but never request more than CAP
  REQUEST_MEMORY=$((MEMORY_PROVISIONED * MULTIPLIER))
  if [ "$REQUEST_MEMORY" -gt "$CAP" ]; then
    REQUEST_MEMORY=$CAP
  fi

  # Short name of the host the job last ran on (slot prefix and domain stripped)
  REMOTE_HOST=$(condor_q "$j" -autoformat LastRemoteHost | cut -f2 -d@ | cut -f1 -d.)

  DATE_WITH_TIME=$(date "+%d/%m/%Y-%H:%M:%S")
  /bin/cat <<EOM >>"$LOG"
$DATE_WITH_TIME, rerunning held job, id $j, description $JOB_DESCRIPTION, memory_provisioned $MEMORY_PROVISIONED, request_memory $REQUEST_MEMORY, $REMOTE_HOST
EOM

  # Update the request, then release the job so it can be matched again
  condor_qedit "$j" RequestMemory="$REQUEST_MEMORY"
  condor_release "$j"
done
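
The capping arithmetic can be tried on its own, without touching the queue. This is only a minimal sketch: the function name next_request_memory and the sample values are made up for illustration, and all values are assumed to be in MB, as in the script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the script above): compute the next
# memory request in MB by multiplying the provisioned memory and
# clamping the result to a cap.
next_request_memory() {
  local provisioned=$1 multiplier=$2 cap=$3
  local candidate=$(( provisioned * multiplier ))
  if [ "$candidate" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$candidate"
  fi
}

# Illustrative values only: multiplier 3, cap 524288 MB (512 GB)
next_request_memory 2048 3 524288     # prints 6144
next_request_memory 200000 3 524288   # prints 524288 (600000 exceeds the cap)
```

With a multiplier of 3, a job that starts at 2048 MB would go through 6144, 18432, 55296 and 165888 MB before the next step hits the cap.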

Hope it helps,
Gianmauro


On 3/2/22 19:48, romain.bouquet04@xxxxxxxxx wrote:
Dear all,

I have jobs that I set to be retried automatically by condor in case of failure. I was wondering if there is a way for condor to automatically increase the requested RAM for a job in case it failed and it is retried.

I was looking at NumJobStarts, which counts the number of times a job has been started: https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html

And I was trying to add something like the following to the submit file (but it does not work), based on https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#using-conditionals-in-the-submit-description-file

if NumJobStarts == 0
   request_memory = 2GB
else
   request_memory = 8GB
endif

I could use requirements with a syntax like
requirements = (NumJobStarts == 0 && TARGET.Memory >= 2GB) || (NumJobStarts >= 1 && TARGET.Memory >= 8GB)
But apparently it is not recommended to request memory that way.

Would anyone have a better solution?

Many thanks in advance
Best,
Romain Bouquet


--
Gianmauro Cuccuru

UseGalaxy.eu
Bioinformatics Group
Department of Computer Science
Albert-Ludwigs-University Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany