
Re: [HTCondor-users] limit total memory usage of jobs



Hi Larry,
I had that problem some time ago, and this is how I handled it in the startd configuration:

1. Define a minimum amount of memory to use when jobs don't set request_memory.
# Minimum memory when the job doesn't request any
JOB_DEFAULT_REQUESTMEMORY=256
MODIFY_REQUEST_EXPR_REQUESTMEMORY=quantize(RequestMemory, {256})

2. Have the startd check the memory used by the jobs.
# Check the memory used by the job
MEMORY_EXCEEDED=((MemoryUsage*1.1 > Memory) =?= TRUE)

3. If a job exceeds its allowed memory, put it on hold.
# If memory is exceeded, evict the job
PREEMPT=($(PREEMPT)) || $(MEMORY_EXCEEDED)
WANT_SUSPEND=$(WANT_SUSPEND) && $(MEMORY_EXCEEDED)
WANT_HOLD=$(MEMORY_EXCEEDED)

4. Set a hold reason so the user knows why the job was held.
# Message to the job's owner
WANT_HOLD_REASON=ifThenElse( $(WANT_HOLD),"Job exceeded available memory. La tarea excedio la memoria disponible.",undefined )
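
To make the arithmetic in steps 1 to 3 easier to check by hand, here is a minimal Python sketch of the same logic. It is only a model, not HTCondor configuration: the function names and sample numbers are made up for illustration. It assumes that quantize() with a list returns the first list entry that is at least the value (otherwise a multiple of the last entry), and that MemoryUsage and Memory are both in MB.

```python
import math


def quantize(value, steps):
    """Model of ClassAd quantize() with a list argument (assumed semantics):
    return the first entry >= value, else round up to a multiple of the last."""
    for step in steps:
        if value <= step:
            return step
    last = steps[-1]
    return math.ceil(value / last) * last


def memory_exceeded(memory_usage_mb, slot_memory_mb):
    """Mirror of MEMORY_EXCEEDED: usage plus a 10% margin is above the slot's Memory."""
    if memory_usage_mb is None or slot_memory_mb is None:
        # '=?= TRUE' makes an undefined comparison count as "not exceeded"
        return False
    return memory_usage_mb * 1.1 > slot_memory_mb


# A job asking for 300 MB is rounded up to the next 256 MB step.
print(quantize(300, [256]))          # 512
# 950 MB used in a 1024 MB slot trips the check because of the 10% margin.
print(memory_exceeded(950, 1024))    # True
print(memory_exceeded(900, 1024))    # False
```

The 1.1 factor means a job is flagged slightly before it reaches the slot's full memory, at roughly 91% of it.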

I've been using this configuration, and it works fine in our pool.
I hope this helps you.

Bye

On Sat, Feb 24, 2018 at 2:21 PM, Larry Martell <larry.martell@xxxxxxxxx> wrote:
Didn't get any replies here, so I asked on Stack Overflow. On there
someone said:

Use a locally evaluated START policy expression that mixes the
machine's current state (from its ClassAds) with the max memory macro
to test if the currently available RAM is x% of the total and evaluate
START to False if so.

With respect to that I have 2 questions:

1) What in the ClassAd shows the current amount of RAM used or
available? When I look at the ClassAd while jobs are running I do not
see any values related to the memory changing - they always seem to
show that total RAM.

2) Assuming there is such a value in the ClassAd that gives that, how
do I reference it in the policy expression?

Thanks in advance for any help anyone can provide with this.

On Fri, Feb 23, 2018 at 11:32 AM, Larry Martell <larry.martell@xxxxxxxxx> wrote:
> I have an execute host with 132 slots and condor will happily run 132
> jobs there. But depending on the jobs those 132 can use all the RAM
> and cause swapping and eventually thrashing. How can I set a config
> option that says, 'do not run jobs if the RAM used is more than nnGB'?
>
> I have read https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage
> but it does not say how to do this.
>
> When I say RAM used I am talking about the number I see in the used
> column in free:
>
> $ free -mh
>          total    used    free   shared   buff/cache   available
> Mem:      125G     44G     18G      19M          63G         79G