
Re: [HTCondor-users] Limit Memory



On Aug 28, 2013, at 7:18 AM, Romain <nuelromain@xxxxxxxxx> wrote:

> Romain <nuelromain@...> writes:
> 
>> 
>> Brian Bockelman <bbockelm <at> ...> writes:
>> 
>>> 
>>> 
>>> On Aug 27, 2013, at 9:22 AM, Romain <nuelromain <at> ...> wrote:
>>> 
>>>> Hi everybody,
>>>> 
>>>> I have some problems with memory usage limits on my pool.
>>>>
>>>> So I've installed cgroups and configured them like this:
>>>> BASE_CGROUP = htcondor
>>>> CGROUP_MEMORY_LIMIT_POLICY = hard
>>>> in my configuration files (condor_config).
>>>> 
>>>> I want to suspend a job if it stays at the limit for some time (1 min,
>>>> for example) and send it back to the queue if it stays there longer
>>>> (another 5 min, for example).
>>>> 
>>> 
>>> I don't understand the question.  The memory limits are per-job.  If you
>>> suspend the job, how is it going to decrease its memory usage?
>>> 
>>> Brian
>>> 
>> 
>> I want to suspend the job for a while, and if it can't resume, stop it
>> and send it back to the queue.
>> 
>> If that isn't possible, I want it sent back to the queue directly.
>> 
>> I allocate 2 CPUs and 1 GB of RAM on each user's machine; a job must not
>> take more than 1 GB because that can cause problems for the user.
>> 
>> Sorry for my bad English :s
>> 
>> Thank you and have a nice day
>> 
>> --
>> Romain
>> 
>> 
> 
> To explain my problem further:
> With htop I can see that the cgroup limit is respected (for example, a job
> can use 500 MB max).
> The "RES" column shows the limit being respected, but the virtual memory
> keeps growing, and so does the memory bar (which shows total memory use on
> the machine). So my limit is 500 MB, but the job uses more than 1.3 GB with
> no problem, and that can crash the machine.
> 

Hi Romain,

I think I understand now.  Is it possible that the jobs are going into swap?
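
One way to check the swap hypothesis is to compare a job process's resident, swapped and virtual sizes in /proc/<pid>/status (the VmRSS, VmSwap and VmSize fields on Linux). The sketch below assumes a kernel recent enough to report VmSwap; the helper names and the proc_root parameter are hypothetical, added only for illustration, and are not part of HTCondor.

```python
from pathlib import Path


def parse_status(text):
    """Extract the Vm* fields (values in kB) from a /proc/<pid>/status dump."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        parts = value.split()
        # Lines look like "VmRSS:\t  512000 kB"; keep only the number.
        if key.startswith("Vm") and parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def swap_report(pid, proc_root="/proc"):
    """Summarise resident, swapped and virtual memory (kB) for one process."""
    text = Path(proc_root, str(pid), "status").read_text()
    vm = parse_status(text)
    return {
        "rss_kb": vm.get("VmRSS", 0),    # roughly htop's RES column
        "swap_kb": vm.get("VmSwap", 0),  # pages pushed out to swap
        "virt_kb": vm.get("VmSize", 0),  # roughly htop's VIRT column
    }
```

Calling swap_report(<pid of the job>) on the execute node: if swap_kb keeps growing while rss_kb stays pinned near the cgroup limit, the hard limit is only capping RAM and the job is spilling into swap.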

Options are:
1) Remove swap, or use the swappiness file in the /condor cgroup to remove condor's ability to use swap.
2) Set the max swap / memory usage for all of condor in the cgroup configuration.

Brian

> I just want jobs that reach the limit to be put back in the queue.
> 
> What I need is to find which parameters and values to set to configure
> condor to do this.
> 
> The priority is to protect the user, even if the job has to restart from
> the beginning.
> 
> 
> Thank you
> 
> --
> Romain
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/