
Re: [HTCondor-users] request_memory



On 12/28/2013 8:15 AM, Rita wrote:
This works.

Glad to hear it!

This makes perfect sense and is well written. I recommend we put this in the
wiki.


Sure, I added a link to this recipe on the wiki page
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage

Todd


On Fri, Dec 27, 2013 at 5:11 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:

On 12/26/2013 3:57 PM, Rita wrote:

I have a job with request_memory = 256 megabytes, but if it goes over that
amount it gets held according to my SYSTEM_PERIODIC_HOLD policy. I would
like to automatically triple request_memory and then release the job. Is
that possible?


Warning: this is off the top of my head, but I think the following would
work, or at least point you in the right direction. I am also assuming you
are using HTCondor 8.0 or above.

I would expect your condor_config file to have entries like the following:

   # Tell the schedd to hold jobs that have been restarted more than
   # 10 times, or that use more memory than they requested
   SYSTEM_PERIODIC_HOLD = JobRunCount > 10 || MemoryUsage > RequestMemory

   # Set a specific HoldReasonSubCode if the job uses more memory
   # than it requested, so we can identify this particular hold
   # reason in automatic release expressions.  With the expression
   # below, the job's HoldReasonSubCode will be 1 if the job is on
   # hold due to memory usage, and 0 otherwise.
   SYSTEM_PERIODIC_HOLD_SUBCODE = MemoryUsage > RequestMemory
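If you prefer to spell the memory condition out explicitly, the same policy could be written roughly as below: one shared macro, an isDefined() guard so a job that has not reported MemoryUsage yet is never treated as over its request, and an explicit integer subcode. This is an untested sketch; MEMORY_EXCEEDED is only an illustrative macro name, and it assumes your version supports SYSTEM_PERIODIC_HOLD_REASON alongside the _SUBCODE knob.

   # Illustrative macro name; false until MemoryUsage has been reported.
   MEMORY_EXCEEDED = (isDefined(MemoryUsage) && MemoryUsage > RequestMemory)
   SYSTEM_PERIODIC_HOLD = JobRunCount > 10 || $(MEMORY_EXCEEDED)
   # Explicit integer subcode: 1 for memory holds, 0 for everything else.
   SYSTEM_PERIODIC_HOLD_SUBCODE = ifThenElse($(MEMORY_EXCEEDED), 1, 0)
   # Human-readable text that ends up in the job's HoldReason attribute.
   SYSTEM_PERIODIC_HOLD_REASON = ifThenElse($(MEMORY_EXCEEDED), "Job used more memory than request_memory", "Job was restarted more than 10 times")

The periodic_release expression in the submit file does not need to change, since memory holds still get subcode 1.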

Then in your job submit file you could try something like:

  executable = foo
  # HoldReasonSubCode of 26 means SYSTEM_PERIODIC_HOLD became true,
  # and HoldReasonSubCode of 1 is configured in condor_config to mean
  # job was put on hold due to memory usage.
  periodic_release = HoldReasonCode =?= 26 && HoldReasonSubCode =?= 1
  # Request 256 MB of RAM unless MemoryUsage (from a previous run) is
  # already defined, in which case request three times that usage.
  request_memory = ifthenelse(isUndefined(MemoryUsage),256,3*MemoryUsage)
  queue
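To make the arithmetic concrete, here is what the request_memory expression above evaluates to for a few made-up MemoryUsage values (all in MB). Note that it triples the observed MemoryUsage rather than the previous request; since a memory hold under the policy above means MemoryUsage was already above the old request, the new request should come out at more than three times the old one.

  # request_memory = ifthenelse(isUndefined(MemoryUsage),256,3*MemoryUsage)
  # Hypothetical MemoryUsage values and the resulting request (MB):
  #   MemoryUsage undefined (no previous run) -> 256
  #   MemoryUsage = 300                       -> 3*300 = 900
  #   MemoryUsage = 950                       -> 3*950 = 2850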

Hope the above makes sense; the manual documents all of the above
configuration knobs and job ClassAd attributes.

If you have control of the configuration of the execute machines, consider
having the execute machine itself put the job on hold when its memory usage
exceeds the requested memory. That way the job is put on hold much sooner
after it exceeds its request (perhaps even immediately). For some HOWTO
recipes, see
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
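As a rough, untested sketch of the execute-machine approach: the startd evicts the job via PREEMPT and, through WANT_HOLD, puts it on hold instead of just requeueing it. MEMORY_EXCEEDED is a hypothetical macro name that simply mirrors the schedd-side test above, the PREEMPT line assumes PREEMPT is already defined earlier in your configuration, and the wiki page above has the tested recipes. One consequence worth noting: a hold placed by the startd's WANT_HOLD policy is reported with HoldReasonCode 21 rather than 26, so the release expression in the submit file has to check for that code instead.

   # Execute machine (startd) configuration; MEMORY_EXCEEDED is hypothetical.
   MEMORY_EXCEEDED = (isDefined(MemoryUsage) && MemoryUsage > RequestMemory)
   # Evict the job when it exceeds its request (assumes PREEMPT already exists)...
   PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
   # ...and hold it rather than just putting it back in the queue.
   WANT_HOLD = ($(MEMORY_EXCEEDED))
   WANT_HOLD_SUBCODE = ifThenElse($(MEMORY_EXCEEDED), 1, 0)

  # Submit file: startd-initiated holds use HoldReasonCode 21, not 26.
  periodic_release = HoldReasonCode =?= 21 && HoldReasonSubCode =?= 1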

regards,
Todd



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/









--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685