
Re: [Condor-users] MOSIX VS. CONDOR

The two systems have totally different purposes and are actually
complementary.

MOSIX is very convenient in a cluster where you have interactive
CPU-intensive jobs. Your users can start their jobs on any machine
in the cluster, and MOSIX will keep the load balanced across all the
machines. Since job migration is transparent to users (they just
invoke their executables normally, as if on regular Linux, and MOSIX
migrates their jobs to balance the load), MOSIX is very handy. It works
perfectly as long as the number of jobs the users run simultaneously
does not exceed the overall capacity of the cluster machines. (For
example, if you have 2 machines A and B, with 1 CPU each, MOSIX will
let you invoke 2 CPU-intensive jobs at the same time from machine A, as
if it were 1 machine with 2 CPUs, although without access to shared
memory.)
The downside of MOSIX is that, due to some of its functional
deficiencies, it may fail to migrate a job without the user even
noticing. The user will keep starting more and more jobs, expecting
them to migrate, but they will not, and this will create a huge load on
a single node. In general, however, it works quite well. Another issue
is scalability, which is limited to ~100 nodes. There is active
research on this; check www.mosix.org. And finally, MOSIX does not let
you set up any policy regarding how many jobs each user is allowed to
run.
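
To make the load picture concrete, here is a toy Python model. It is
only a sketch: it is NOT the actual MOSIX algorithm, and the function
names (balance, stuck_on_home_node, oversubscribed) are made up for
illustration. It just compares jobs spread over the least-loaded nodes
with jobs that silently stay on the node where they were started.

```python
# Toy model only: this is not MOSIX's real migration algorithm.
# All names here are hypothetical and exist just for this illustration.

def balance(num_nodes, num_jobs):
    """Place each job on the currently least-loaded node.

    Returns a list with the number of jobs that end up on each node.
    """
    load = [0] * num_nodes
    for _ in range(num_jobs):
        target = load.index(min(load))  # first least-loaded node on ties
        load[target] += 1
    return load


def stuck_on_home_node(num_nodes, num_jobs, home=0):
    """Model a failed migration: every job stays on the node it started on."""
    load = [0] * num_nodes
    load[home] = num_jobs
    return load


def oversubscribed(load, cpus_per_node=1):
    """True if any node runs more CPU-bound jobs than it has CPUs."""
    return any(jobs > cpus_per_node for jobs in load)
```

With 2 single-CPU nodes, balance(2, 2) gives one job per node, which is
the "1 machine with 2 CPUs" case. stuck_on_home_node(2, 4) puts all 4
jobs on one node, and any job count above the number of CPUs leaves
some node oversubscribed no matter how well the jobs are spread.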

When you have more jobs than total CPUs in the pool, you need a
queuing mechanism, and here Condor becomes very useful. That is, Condor
will schedule jobs for execution as long as there are free CPUs (unless
you change the defaults). The downside of Condor is that it's NOT
transparent, i.e. the users have to create a submit file and submit a
BATCH job, as opposed to an interactive job under MOSIX. In most cases
they will not get the results right away.
However, it's exactly the right thing to do if you have 2 CPUs and
10000 runs of the same simulation with different parameters. MOSIX will
choke trying to run them all at once, while Condor will run them one
after another. Also, Condor provides a checkpointing mechanism, which
lets you continue your computation from the last checkpoint, even if it
failed.
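
For the 10000-run case, the submit file can be generated instead of
written by hand. Below is a minimal Python sketch, not a recommended
setup. It assumes a hypothetical executable called "simulate" that
takes the run index as its only argument and maps it to a parameter
set itself. The output, error and log file names are placeholders too.

```python
# Sketch: build a Condor submit description file for a parameter sweep.
# "simulate" and the run.* / runs.log file names are assumptions.

def make_submit_file(executable, num_runs):
    """Return the text of a submit description file that queues num_runs jobs.

    $(Process) is expanded by Condor to 0 .. num_runs-1, one value per job,
    so each run gets its own argument and its own output files.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = run.$(Process).out",
        "error      = run.$(Process).err",
        "log        = runs.log",
        f"queue {num_runs}",
    ]
    return "\n".join(lines) + "\n"
```

Write the result of make_submit_file("simulate", 10000) to a file and
pass that file to condor_submit. With 2 free CPUs Condor would then
work through the queue two jobs at a time. Note that vanilla universe
jobs are not checkpointed; as far as I know, the checkpointing
mentioned above needs the standard universe and a binary relinked with
condor_compile.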

So the punchline: Condor and MOSIX can and should be combined in
relatively small pools where you have different kinds of jobs, both
interactive and batch. There are some tricks to implementing this, but
it works great.

Let me know if you need more details.

On Fri, 2004-11-05 at 01:26, Uma Krishnan wrote:
> What is the difference between Mosix and Condor? Thanks
> Uma