
Re: [Condor-users] Can VM borrow RAM from other VMs in the same computer?



On Thu, Feb 14, 2008 at 09:57:56AM +0100, Juli Céspedes i Capdevila wrote:
> I'm facing a situation where some of my users are trying to run huge
> programs on a single VM, and after careful examination I've realised
> that those programs require huge amounts of memory (well, in the range
> of 5-7 GB).
> 
> One of them is submitting jobs only to the biggest computers we have,
> each with two dual core processors, that Condor offers as 4 Virtual
> Machines: vm1@xxxxxxx vm4@jane. Since the computer has 8GB of RAM, each
> VM is assigned about 2GB.

As long as you don't play tricks, like explicitly defining the shares of
each slot, resources are evenly split among all slots.

> When more than one of these big jobs is run on a single physical machine
> they never complete, and after a few hours jane is paging and no
> computation is done. After 4 days we have to abort the job.

Not only that: if a job has to be evicted for some reason, and has grown to
a larger footprint than what a single slot advertises, it will never
restart (because no slot's machine ClassAd matches it anymore).
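
To make the footprint issue concrete, here is a minimal Python sketch (not
Condor code). It assumes the even split described above and a matching rule
of the form "slot memory in MB * 1024 >= job image size in KB"; that rule,
the function names and the job sizes are assumptions for illustration only.

```python
def slot_memory_mb(total_mb: int, num_slots: int) -> int:
    """Memory each slot advertises when the machine is split evenly."""
    return total_mb // num_slots


def job_matches(slot_mb: int, image_size_kb: int) -> bool:
    """Assumed matching rule: the slot's memory must cover the job's image size."""
    return slot_mb * 1024 >= image_size_kb


slot_mb = slot_memory_mb(8 * 1024, 4)   # 8 GB machine offered as 4 slots
fresh_job_kb = 1 * 1024 * 1024          # hypothetical job before it grows: 1 GB
grown_job_kb = 6 * 1024 * 1024          # the same job after growing to 6 GB

print(slot_mb)                             # 2048
print(job_matches(slot_mb, fresh_job_kb))  # True
print(job_matches(slot_mb, grown_job_kb))  # False: no slot to restart on
```

Under these assumptions the job starts fine, but once evicted with a 6 GB
footprint it no longer fits any 2 GB slot.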

> But if just one of the VMs on jane is busy performing the jobs, they
> complete after one day or less.

You may modify your START expression, closing the remaining slots if there's
a large job running. Note that this will not fix the footprint issue.
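
The idea behind such a START policy can be modelled roughly as follows.
This is a Python sketch of the decision logic only, not Condor configuration
syntax; the Slot class, the LARGE_JOB_MB threshold and the job sizes are
hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LARGE_JOB_MB = 4096  # hypothetical threshold for what counts as a "large" job


@dataclass
class Slot:
    name: str
    running_job_mb: Optional[int] = None  # footprint of the running job, if any


def start_allowed(slot: Slot, all_slots: List[Slot]) -> bool:
    """Refuse new jobs while another slot on the same machine runs a large job."""
    if slot.running_job_mb is not None:
        return False  # this slot is already busy
    return not any(
        other is not slot
        and other.running_job_mb is not None
        and other.running_job_mb >= LARGE_JOB_MB
        for other in all_slots
    )


slots = [Slot(f"vm{i}@jane") for i in range(1, 5)]
slots[0].running_job_mb = 6144  # a 6 GB job lands on vm1
open_slots = [s.name for s in slots if start_allowed(s, slots)]
print(open_slots)  # []
```

With a small job on vm1 instead, the other three slots stay open. The sketch
does not stop a large job from starting next to small ones that are already
running, and it leaves the footprint issue untouched.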

> I've been unable to find that info in the documentation, so my question is:
> Do VMs on a single node have a reserved amount of memory (as shown on
> condor_status output), or do they ask for memory as needed and receive
> it if available?

It is possible to enforce memory usage limits (which is not what you're
looking for!). Apart from that, the jobs are on their own once they are
started (they _are_ being watched over by their shadow/starter, though)
and can compete for Unix resources like any other process...

IMHO, with multi-core machines becoming more and more common, Condor needs
more flexible classads (but I've heard rumours that such features are being
worked on and may show up very soon).
There was a discussion about similar things before, and at least one user
came up with "duplicated" resources: e.g. two slots sharing the real
resources, plus another slot offering the full set of resources as well,
with a START expression used to "close down" conflicting slots depending
on the resources already taken.
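
As I understand that scheme, the bookkeeping would look roughly like the
Python sketch below. It is a model of the idea, not the configuration that
user posted; the slot names and memory sizes are made up.

```python
from typing import Dict, List, Set

PHYSICAL_MB = 8192  # real memory of the machine (assumed)

# Advertised memory per slot: two halves plus one overlapping "full" slot.
ADVERTISED_MB: Dict[str, int] = {"half1": 4096, "half2": 4096, "full": 8192}


def open_slots(busy: Set[str]) -> List[str]:
    """Slots that may still start a job without overcommitting real memory."""
    taken = sum(ADVERTISED_MB[name] for name in busy)
    return [
        name
        for name, mb in ADVERTISED_MB.items()
        if name not in busy and taken + mb <= PHYSICAL_MB
    ]


print(open_slots(set()))       # ['half1', 'half2', 'full']
print(open_slots({"half1"}))   # ['half2']
print(open_slots({"full"}))    # []
```

The machine advertises more memory than it has, and the close-down rule
keeps the claimed slots from ever adding up to more than the real amount.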

Cheers
 Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html