
Re: [Condor-users] memory "sharing" question



On 3/21/06, Kewley, J (John) <j.kewley@xxxxxxxx> wrote:
> > I have dual-CPU computers with 2 GB of RAM, so the Ad says that
> > each CPU has 1 GB. Does that mean that if a process needs
> > more memory than 1 GB it won't get it, or will it get this
> > "extra" memory?
>
> If the job requirements (some of which are automatically produced)
> say that it needs > 1Gb, I don't think it'll get matched.

This is correct - note that one nasty result of this is that, if you
do not define any memory requirement on your job
(i.e. requirements = <blah> has no reference to Memory),
then while the job is running the memory value in its classad will
be steadily updated (every 15 mins or so by default) and, after a
while, may go beyond what the machine advertises as available. This
has no impact at all on the job itself (assuming that no other job is
running to take up memory and that the job doesn't start paging as it
runs out of physical memory).

If the job is then evicted, its requirements expression will be
rewritten to indicate that it will only run on a machine advertising
enough memory for the value observed on the last run[1] - and this
means it won't match the machine it was happily running on before,
since that machine advertises only half its real memory.
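
If memory serves, the clause condor_submit generates when you don't
mention Memory yourself looks something like the following (Memory is
in MB, ImageSize in KB, so the exact form may differ in your version):

   # appended automatically when your requirements don't mention Memory
   requirements = <your expression> && ((Memory * 1024) >= ImageSize)

Since ImageSize in the job ad is the value that gets updated while the
job runs, the amount of memory the job appears to need grows with it.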

One solution to this is to ensure that all your jobs state a
particular fixed Memory value in their requirements - that value then
won't get dynamically altered.
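
For example, something like this in the submit file (the 512 here is
just an illustrative figure, not a recommendation):

   # explicit memory requirement (in MB); condor_submit then leaves it alone
   requirements = (Memory >= 512)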

[1] I'm not sure whether it is the peak sample or the last sample - in
most cases I see they are the same

> > How would it be possible to declare that a CPU has 2 GB?
> > Because, on the other hand, I have processes which don't use
> > any memory (or really little), so ideally I could
> > run 2 processes on 2 CPUs, one with almost 2 GB of RAM and one
> > with almost nothing. But I don't want to reserve the CPU for
> > only these kinds of jobs; sometimes I don't have any, so the
> > CPU must be free for other purposes...

To declare a machine as having 2GB, just set the MEMORY value in the
config directly (and reconfig).
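
i.e. something along these lines in that machine's local config file,
followed by a condor_reconfig on the host (MEMORY is in megabytes, as
far as I remember):

   # advertise 2GB regardless of what Condor detects
   MEMORY = 2048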

> * As now - pretend each proc has a max of 1Gb each.
> * (for if you have predominantly large jobs) pretend each node has 1 proc only, with 2Gb
> * Pretend you have > 1Gb on each node by setting MEMORY yourself
> * Set one to 1.5Gb and the other to 0.5Gb, so smaller jobs go to one proc and larger to
>   the other

All these work depending on the target jobs for this part of your
pool. I suggest using whichever of these best matches your load.
Ensuring all machines have sufficient memory that 2 processes can
happily have their full address space in physical memory is one option
if you have 32-bit processes and a 64-bit OS (I'm not sure how nicely
this plays on *nix - it works a treat on Windows 2003 64-bit Server
with 8GB of RAM :)
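
For the uneven split John mentions, the per-VM-type knobs should do
it - roughly this sort of thing in the config (syntax from memory, so
check the manual for your Condor version):

   # one large and one small VM on a dual-CPU, 2GB machine
   VIRTUAL_MACHINE_TYPE_1 = cpus=1, memory=1536
   VIRTUAL_MACHINE_TYPE_2 = cpus=1, memory=512
   NUM_VIRTUAL_MACHINES_TYPE_1 = 1
   NUM_VIRTUAL_MACHINES_TYPE_2 = 1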

I found on my pool that segmenting the VMs so that the large jobs
predominantly ran on VM1 and the smaller ones ran on VM2 achieved
*relatively* nice throughput with little impact on either class of job.
As the demands of jobs grow you can be sure that this will no longer
be the case in a year's time though (indeed the original machines in
the pool have now lost about 20-30% throughput on the smaller jobs due
to the larger ones taking up rather more than their fair share).
I have no simple solution to this other than the expensive one - buy
bigger boxes.
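
The job-side half of that segmenting is just a clause in the
requirements, something like the following (VirtualMachineID is the
machine attribute I used for this, if I recall the name correctly):

   # large jobs: only match the big VM
   requirements = (VirtualMachineID == 1) && (Memory >= 1024)

   # small jobs: take the other one
   requirements = (VirtualMachineID == 2)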

> There may be other ways; this is the best I can do for now.

There are more complex and somewhat fragile solutions involving
multiple different VMs[2] and only using a few of them at a time. I
would suggest that That Way Lies Madness (tm).

Matt

[2] in the current Condor sense, which just means an advertised
segment of the machine's resources, with no runtime control of it -
nor indeed any requirement that it agrees with what is actually
present