Re: [Condor-users] Using Condor as a substitute for OpenPBS



On Oct 27, 2005, at 5:25 AM, Angel de Vicente wrote:

Marko Kääramees writes:

Despite being a fan of Condor (and administering a 200-machine pool), I would
not recommend it as a substitute for OpenPBS on a cluster. I think you would be
better off trying Torque.

Could you please explain a little why you prefer Torque as a cluster
manager, if there is any reason beyond its originating from OpenPBS?

Just a few things from when I had to decide on the scheduler for our
cluster:


* Condor (at least the stable version we are using) can only work with MPICH
1.2.4, and we need MPICH 1.2.6, MPICH2, and LAM/MPI.

This is something we're very close to fixing.

* Our nodes are SMP machines, so when running a program on 16 CPUs it is not the
same to use 16 nodes (1 CPU in each node), 4 nodes (4 CPUs in each node), or
other combinations. I looked at ways in which this could be accomplished with
Condor, but as matching expressions didn't work with regular expressions, I
didn't find a way to get this working.

By default, Condor will advertise each CPU as an individual resource. An MPI job will get 16 CPUs, not 16 nodes.
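
To illustrate the per-CPU advertising described above, here is a toy Python model. It is not Condor code and does not use any Condor API; the node names, the 8 x 4-CPU layout, and the helper functions are all made up for illustration. It only shows that a 16-CPU request is satisfied by 16 slots, and that how many nodes those slots land on depends on the order in which slots are offered.

```python
from collections import Counter

def advertise_slots(nodes):
    """Advertise one slot per CPU as (node_name, cpu_index) pairs."""
    return [(name, cpu) for name, cpus in nodes.items() for cpu in range(cpus)]

def claim(slots, count):
    """Claim the first `count` slots in the order they are offered."""
    if count > len(slots):
        raise ValueError("not enough slots")
    return slots[:count]

def nodes_used(claimed):
    """Count how many claimed slots sit on each node."""
    return Counter(name for name, _ in claimed)

# Hypothetical cluster: 8 SMP nodes with 4 CPUs each.
nodes = {f"node{i}": 4 for i in range(8)}
slots = advertise_slots(nodes)

# Slots offered node by node: the 16 CPUs fill 4 whole nodes.
packed = nodes_used(claim(slots, 16))

# Same slots offered CPU index first: the 16 CPUs spread over 8 nodes.
spread = nodes_used(claim(sorted(slots, key=lambda s: s[1]), 16))

print(len(packed), "nodes when packed;", len(spread), "nodes when spread")
```

Either way the job holds 16 CPUs; the node count is a side effect of slot order, not something the request itself pins down in this model.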


* Condor is great at CPU harvesting, but in a dedicated cluster it is probably a
bit too heavyweight: it monitors things like keyboard activity, mouse
activity, etc. which are not important in our cluster, as the nodes are
dedicated and not accessible to users, unless they submit a job through the
queueing system.

Monitoring of keyboard, mouse, and load shouldn't produce any noticeable load on the machine.


* In combination with Maui, Torque offers very fine control over usage
policies. For instance, in our cluster we have settings that guarantee a group
of users 40% of CPU time, and make sure that no job that needs the whole
cluster for a period of more than 7 days will be accepted. There are probably
ways in which you can accomplish something similar with Condor, but the
emphasis of the system is not there, so you would have to go a little bit out
of your way to get this working.
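
The second rule above (reject any job that needs the whole cluster for more than 7 days) can be sketched as a simple admission check. This is plain Python for illustration only, not Maui or Torque configuration syntax; the `JobRequest` type, the `is_accepted` function, and the 64-CPU cluster size are assumptions, and the 40% fair-share guarantee is not modelled.

```python
from dataclasses import dataclass

TOTAL_CPUS = 64             # hypothetical cluster size
MAX_WHOLE_CLUSTER_DAYS = 7  # limit taken from the policy described above

@dataclass
class JobRequest:
    cpus: int
    walltime_days: float

def is_accepted(job, total_cpus=TOTAL_CPUS):
    """Reject only jobs that need every CPU for longer than the limit."""
    needs_whole_cluster = job.cpus >= total_cpus
    too_long = job.walltime_days > MAX_WHOLE_CLUSTER_DAYS
    return not (needs_whole_cluster and too_long)

print(is_accepted(JobRequest(cpus=64, walltime_days=8)))   # whole cluster, too long
print(is_accepted(JobRequest(cpus=64, walltime_days=7)))   # whole cluster, at the limit
print(is_accepted(JobRequest(cpus=16, walltime_days=30)))  # partial cluster, long run
```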


Actually, if these points could be solved with a newer version of Condor, I would
gladly swap schedulers, so that I could incorporate the cluster into our Condor
pool, but for the time being I think Torque/Maui is a better option.

+----------------------------------+---------------------------------+
| Jaime Frey                       | Public Split on Whether         |
| jfrey@xxxxxxxxxxx                | Bush Is a Divider               |
| http://www.cs.wisc.edu/~jfrey/   | -- CNN Scrolling Banner         |
+----------------------------------+---------------------------------+