
Re: [Condor-users] Joining an OpenMosix cluster to a Condor Pool



Thanks for the info. I'm fairly new to CONDOR, and right now we are
running it in an experimental test mode while we think about how we
want to use it.

I find it interesting how the different groups are approaching
distributed processing. It would be interesting to know whether
researchers working on kernel-level distributed processing communicate
or collaborate with efforts on distributed systems such as CONDOR or
GLOBUS. I would imagine that at some point both groups will be
implementing software that overlaps somewhat.



On 5/10/05, Mark Silberstein <marks@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> I have had it working flawlessly for 2-3 years already.
> The idea is that you should make Condor aware of MOSIX, and MOSIX
> aware of Condor.
> On the MOSIX side, all you need to do is start the condor_master
> daemon with runhome (which locks a process to its home node). Since
> MOSIX-related settings are inherited by all forked processes, no
> Condor-originated job will be migrated, so hosts running Condor jobs
> will eventually not be used by MOSIX.
> To keep Condor from running jobs on machines occupied by MOSIX, you
> have to edit the STARTD policy so that it does not start jobs when the
> load average is higher than, say, 0.6-0.7.
> There's one subtlety you should pay attention to. The problem arises
> when you parallelize your applications using MOSIX, i.e. start with
> one process that simultaneously forks many processes, each working on
> a different part of the input. After all the forked processes start on
> a single node, MOSIX will usually move them to other nodes right away,
> since its algorithms prefer less loaded nodes.
> This is not the case if your cluster is fully occupied by Condor jobs.
> MOSIX will NOT move the processes immediately; it takes a while until
> it decides that it is still worth moving them to the busy nodes
> (remember, all of these are running Condor jobs at this time). Once the
> processes arrive, the "owner load average" on those nodes rises, so
> Condor evicts the running Condor jobs. This also takes a bit of time.
> Only then are your forked processes running as you expected.
> If you have any more questions, you might want to take it off the list.
> Mark
> On Mon, 2005-05-09 at 14:45 -0400, Scott B wrote:
> > Has anybody attempted to join an OpenMosix cluster of systems to a
> > CONDOR pool? I can think of several problems that might occur.
> >
> > Or would it be folly to even consider it?
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
>
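The load-average policy described in the quoted reply can be modeled as a minimal Python sketch. It assumes the startd can see both a node's total load average and the share of that load caused by Condor jobs, and treats the difference as the "owner" (here, mostly MOSIX) load. The class and function names and the 0.6 default are hypothetical illustrations, not Condor's API; the real policy lives in the STARTD configuration.

```python
from dataclasses import dataclass

# Hypothetical default, taken from the 0.6-0.7 range suggested in the thread.
OWNER_LOAD_THRESHOLD = 0.6


@dataclass
class NodeLoad:
    """Load figures for one node (illustrative names, not Condor's API)."""
    load_avg: float         # total load average on the node
    condor_load_avg: float  # portion of that load caused by Condor jobs

    @property
    def owner_load(self) -> float:
        # Non-Condor ("owner") load; in this setup, mostly MOSIX processes.
        return max(0.0, self.load_avg - self.condor_load_avg)


def may_start_condor_job(node: NodeLoad, threshold: float = OWNER_LOAD_THRESHOLD) -> bool:
    """Start a Condor job only while the node is not busy with MOSIX work."""
    return node.owner_load <= threshold


def should_evict_condor_job(node: NodeLoad, threshold: float = OWNER_LOAD_THRESHOLD) -> bool:
    """Evict once owner load climbs, e.g. after MOSIX migrates forked processes in."""
    return node.owner_load > threshold


if __name__ == "__main__":
    idle = NodeLoad(load_avg=0.1, condor_load_avg=0.0)
    mosix_busy = NodeLoad(load_avg=1.9, condor_load_avg=1.0)
    print(may_start_condor_job(idle))           # True
    print(should_evict_condor_job(mosix_busy))  # True
```

In an actual Condor configuration the same rule would be written as a ClassAd expression; something along the lines of `START = (LoadAvg - CondorLoadAvg) <= 0.6` is one possibility, though the exact expression and the matching eviction settings should be checked against the Condor manual for your version.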