
Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried



Hi again Gianmauro,

Thanks, but I don't think it would be a solution for my jobs, which run for a long time, as I don't want a cron process running in parallel.
Thanks a lot anyway for your answers! I really appreciate you proposing that solution.

Best,
Romain

On Thu, 3 Mar 2022 at 11:06, <gmauro@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
I have a cron job that runs the script every 5 minutes.
It works fine for us.

Gianmauro

On 3/3/22 11:01, romain.bouquet04@xxxxxxxxx wrote:
> Hi Gianmauro,
>
> Thanks for your answer, but from what I understand you launch this script
> manually, right?
> What I would like is to find a way for condor to increase the memory
> itself, as my jobs are retried automatically.
>
> Best,
> Romain
>
> On Wed, 2 Mar 2022 at 20:12, <gmauro@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>     Hi Romain,
>
>     I use this script for exactly the purpose you described.
>     It relaunches the job with 3 times the memory requested, until it
>     reaches a cap.
>     Every relaunch is recorded in a log file.
>
>     $ cat /usr/bin/htcondor-release-held-jobs
>
>     #!/bin/bash
>     # Release jobs held for exceeding their memory limit, with more memory.
>     CAP=524288 # upper limit for RequestMemory, in MB (512 GB)
>     MULTIPLIER=3
>     LOG=/data/dnb01/maintenance/condor_rerun_held_jobs.log
>
>     if [ ! -f "$LOG" ]; then
>         touch "$LOG"
>         echo "Created $LOG"
>     fi
>
>     # Select held jobs whose HoldReasonCode is 34 (memory limit exceeded)
>     for j in $(condor_q -hold -autoformat ClusterId HoldReasonCode |
>                awk '$2 == 34 {print $1}')
>     do
>         JOB_DESCRIPTION=$(condor_q "$j" -autoformat JobDescription)
>         MEMORY_PROVISIONED=$(condor_q "$j" -autoformat MemoryProvisioned)
>
>         # Multiply the provisioned memory, but never go above the cap
>         REQUEST_MEMORY=$((MEMORY_PROVISIONED * MULTIPLIER))
>         if [ "$REQUEST_MEMORY" -gt "$CAP" ]; then
>             REQUEST_MEMORY=$CAP
>         fi
>         # Short host name of the machine that last ran the job
>         REMOTE_HOST=$(condor_q "$j" -autoformat LastRemoteHost |
>                       cut -f2 -d@ | cut -f1 -d.)
>
>         # Append one log line per released job
>         DATE_WITH_TIME=$(date "+%d/%m/%Y-%H:%M:%S")
>         printf '%s, rerunning held job, id %s, description %s, memory_provisioned %s, request_memory %s, %s\n' \
>             "$DATE_WITH_TIME" "$j" "$JOB_DESCRIPTION" \
>             "$MEMORY_PROVISIONED" "$REQUEST_MEMORY" "$REMOTE_HOST" >>"$LOG"
>
>         condor_qedit "$j" RequestMemory="$REQUEST_MEMORY"
>         condor_release "$j"
>     done
>
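The two pure steps of the script above (picking out jobs held with code 34, and multiplying the memory up to a cap) can be tried without a running pool. The following is only a sketch: the helper names `memory_held_ids` and `next_request_memory` are hypothetical and not part of HTCondor, and the sample job ids and hold codes are made up for illustration.

```shell
#!/bin/bash
# Sketch only: both helper names are hypothetical, not HTCondor commands.

# Print the ids (column 1) of jobs whose hold code (column 2) is 34.
# Expects "ClusterId HoldReasonCode" pairs on stdin, the layout produced by
# `condor_q -hold -autoformat ClusterId HoldReasonCode` in the script above.
memory_held_ids() {
    awk '$2 == 34 { print $1 }'
}

# Print the next RequestMemory value in MB: provisioned * multiplier,
# limited to the cap.
next_request_memory() {
    local provisioned=$1 multiplier=$2 cap=$3
    local next=$(( provisioned * multiplier ))
    if [ "$next" -gt "$cap" ]; then
        next=$cap
    fi
    echo "$next"
}

# Made-up sample input: only job 102 is held with code 34.
printf '101 13\n102 34\n103 1\n' | memory_held_ids

# Successive requests for a job that starts at 2048 MB,
# using the script's MULTIPLIER=3 and CAP=524288.
mem=2048
for attempt in 1 2 3 4 5 6 7; do
    mem=$(next_request_memory "$mem" 3 524288)
    echo "attempt $attempt: $mem MB"
done
```

With these values the request grows 6144, 18432, 55296, 165888, 497664 and then stays at the 524288 MB cap, so a job starting at 2 GB reaches the cap on its sixth release.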
>     Hope it helps,
>     Gianmauro
>
>
>     On 3/2/22 19:48, romain.bouquet04@xxxxxxxxx wrote:
>     > Dear all,
>     >
>     > I have jobs that I set to be retried automatically by condor in case of
>     > failure.
>     > I was wondering if there is a way for condor to automatically increase
>     > the requested RAM for a job when it has failed and is retried.
>     >
>     > I was looking at NumJobStarts, which counts the number of times a job
>     > is started:
>     > https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html
>     >
>     > I tried adding something like the following to the submit file (but it
>     > does not work), based on
>     > https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#using-conditionals-in-the-submit-description-file
>     >
>     > if NumJobStarts == 0
>     >    request_memory = 2GB
>     > else
>     >    request_memory = 8GB
>     > endif
>     >
>     > I could use requirements with a syntax like
>     > requirements = (NumJobStarts == 0 && TARGET.Memory >= 2GB) ||
>     > (NumJobStarts >= 1 && TARGET.Memory >= 8GB)
>     > But apparently it is not recommended to request memory that way.
>     >
>     > Would anyone have a better solution?
>     >
>     > Many thanks in advance
>     > Best,
>     > Romain Bouquet
>     >
>     > _______________________________________________
>     > HTCondor-users mailing list
>     > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>     > subject: Unsubscribe
>     > You can also unsubscribe by visiting
>     > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>     >
>     > The archives can be found at:
>     > https://lists.cs.wisc.edu/archive/htcondor-users/
>
>     --
>     Gianmauro Cuccuru
>
>     UseGalaxy.eu
>     Bioinformatics Group
>     Department of Computer Science
>     Albert-Ludwigs-University Freiburg
>     Georges-Köhler-Allee 106
>     79110 Freiburg, Germany

--
Gianmauro Cuccuru

UseGalaxy.eu
Bioinformatics Group
Department of Computer Science
Albert-Ludwigs-University Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany