Re: [Condor-users] Joining an OpenMosix cluster to a Condor Pool
- Date: Tue, 10 May 2005 11:37:55 +0300
- From: Mark Silberstein <marks@xxxxxxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Joining an OpenMosix cluster to a Condor Pool
I have had this working flawlessly for 2-3 years now.
The idea is to make Condor know about MOSIX, and MOSIX know about
Condor.
The only thing you need to do is start the condor_master daemon with
runhome. Since the MOSIX-related environment is inherited by all forked
processes, Condor-originated jobs will not be migrated, so hosts that
are running Condor jobs will eventually not be used by MOSIX.
To keep Condor from running jobs on machines occupied by MOSIX, you
have to edit the STARTD policy so that it does not start jobs when the
load average is higher than, say, 0.6-0.7.
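
To make the load-average rule concrete, here is a toy model of the decision in Python. It is only a sketch, not Condor configuration syntax. The function names and the 0.6 default are made up for illustration. It also assumes the policy compares the load that is not caused by Condor jobs (total load average minus the part Condor attributes to its own jobs), which is one reading of "load average" above.

```python
def non_condor_load(load_avg: float, condor_load_avg: float) -> float:
    """Load not caused by Condor jobs, e.g. processes migrated in by MOSIX."""
    # Clamp at zero: sampling jitter could make the difference slightly negative.
    return max(0.0, load_avg - condor_load_avg)


def may_start_job(load_avg: float, condor_load_avg: float,
                  threshold: float = 0.6) -> bool:
    """Return True if a new Condor job may start on this machine.

    threshold is a hypothetical knob; 0.6 is the low end of the
    0.6-0.7 range suggested in the text.
    """
    return non_condor_load(load_avg, condor_load_avg) <= threshold


if __name__ == "__main__":
    # Idle node: a Condor job may start.
    print(may_start_job(load_avg=0.2, condor_load_avg=0.0))
    # Node busy with MOSIX-migrated work: Condor stays away.
    print(may_start_job(load_avg=1.5, condor_load_avg=0.0))
```

In a real pool the same comparison would be written as an expression in the startd policy. The exact expression depends on your existing configuration.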
There is one subtlety you should pay attention to.
It arises when you parallelize your applications using MOSIX,
i.e. start with one process that forks many processes simultaneously,
each one working on a different part of the input. After all the forked
processes start on a single node, MOSIX will usually move them to other
nodes right away, since its algorithms prefer less loaded nodes.
This will not be true if your cluster is fully occupied by Condor jobs.
MOSIX will NOT move the processes immediately; it takes a while
until it decides that it is still worth moving them to the busy nodes
(remember, all of these are running Condor jobs at this time). The
migration raises the "owner load average" on those nodes, so Condor
evicts the Condor jobs running there. This also takes some time. ONLY
then do your forked processes run as you expected.
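
For reference, the fork-style parallelization described above looks roughly like the Python sketch below. It is an illustration only: the function name, the summing workload, and the worker count are made up, and nothing in it is MOSIX-specific. Each child is an ordinary process, which is what makes it a candidate for migration.

```python
import os
import struct


def parallel_sum(values, workers=4):
    """Fork `workers` children at once; each sums one slice of `values`."""
    chunks = [values[i::workers] for i in range(workers)]
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute the partial result and send it to the parent.
            status = 1
            try:
                os.close(read_fd)
                os.write(write_fd, struct.pack("q", sum(chunk)))
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        # Parent: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        # Each child writes exactly 8 bytes (one signed 64-bit integer).
        total += struct.unpack("q", os.read(read_fd, 8))[0]
        os.close(read_fd)
        os.waitpid(pid, 0)
    return total


if __name__ == "__main__":
    print(parallel_sum(list(range(100)), workers=4))
```

All children start on the node where the parent runs. On a pool full of Condor jobs, the delay described above applies to each of them before they spread out.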
If you have any more questions, you might want to take them off the list.
On Mon, 2005-05-09 at 14:45 -0400, Scott B wrote:
> Has anybody attempted to join an OpenMosix cluster of systems to a
> CONDOR pool? I can think of several problems that might occur.
> Or would it be folly to even consider it?