
Re: [HTCondor-users] Limit Memory



Romain <nuelromain@...> writes:

> 
> Brian Bockelman <bbockelm <at> ...> writes:
> 
> > 
> > 
> > On Aug 27, 2013, at 9:22 AM, Romain <nuelromain <at> ...> wrote:
> > 
> > > Hi everybody,
> > > 
> > > I've some problems with the limits of memory usage on my pool.
> > > 
> > > So I've installed cgroups and configured them like this in my
> > > configuration file (condor_config):
> > > BASE_CGROUP = htcondor
> > > CGROUP_MEMORY_LIMIT_POLICY = hard
> > > 
> > > I want to suspend a job if it stays at the limit for some time (1 min,
> > > for example) and send it back to the queue if it stays there longer
> > > (5 min, for example).
> > > 
> > 
> > I don't understand the question.  The memory limits are per-job.  If you
> > suspend the job, how is it going to decrease its memory usage?
> > 
> > Brian
> > 
> 
> I want to suspend the job for a while, and if it can't restart I want to
> stop it and let it go back to the queue.
> 
> If that isn't possible, I want it to go back to the queue directly.
> 
> I allocate 2 CPUs and 1 GB of RAM on each user machine; a job must not
> take more than 1 GB because that can be a problem for the user.
> 
> Sorry for my bad English :s
> 
> Thank you and have a nice day
> 
> --
> Romain
> 
> 

To explain my problem further:
With htop I can see that the cgroup limit is respected (for example, a job
can use 500 MB at most).
The "RES" column stays within the limit, but the virtual memory keeps
growing, and so does the memory bar (which shows all memory used on the
machine).
So my limit is 500 MB, but the job uses more than 1.3 GB without being
stopped, and that can crash the machine.

I just want to put jobs that reach the limit back in the queue.

What I need is to find the parameters, and the values to give them, to
configure condor to do this.
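
A rough sketch of what such a policy might look like in condor_config,
assuming the startd PREEMPT and WANT_SUSPEND expressions are the right
place for it and that MemoryUsage (in MB) reflects the job's real usage,
which may not hold when the cgroup hard limit caps resident memory.
MEMORY_EXCEEDED is a name made up for illustration, and the lines assume
PREEMPT and WANT_SUSPEND are already defined earlier in the configuration:

    # True once the job's reported memory use reaches the slot's memory (MB)
    MEMORY_EXCEEDED = (MemoryUsage =!= UNDEFINED && MemoryUsage >= Memory)
    # Evict such a job so that it goes back to the queue
    PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
    # Do not suspend a job that is over the limit, evict it directly
    WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE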

The priority is to protect the user, even if the job restarts from the beginning.


Thank you

--
Romain