
Re: [Condor-users] Using Condor as a substitute for OpenPBS



Hi,

Marko Kääramees writes:
 > Hello,
 > 
 > > 
 > > despite being a fan of Condor (and administering a 200 machines pool), I would
 > > not recommend it as a substitute of OpenPBS for a cluster. I think you would be
 > > better off trying Torque
 > 
 > Could you explain a little please why do you prefer Torque as a cluster
 > manager if there is anything more than originating from OpenPBS.

Just a few things from when I had to decide on the scheduler for our
cluster:

* Condor (at least the stable version we are using) can only work with MPICH
  1.2.4, and we need 1.2.6, MPICH2, LAM/MPI.

* Our nodes are SMP machines, so when running a program on 16 CPUs it makes a
  difference whether it uses 16 nodes (1 CPU in each node), 4 nodes (4 CPUs in
  each node), or some other combination. I looked at ways to accomplish this
  with Condor, but as matching expressions didn't support regular expressions,
  I didn't find a way to get it working.

* Condor is great at CPU harvesting, but in a dedicated cluster it is probably a
  bit too heavyweight: it monitors things like keyboard activity, mouse
  activity, etc. which are not important in our cluster, as the nodes are
  dedicated and not accessible to users, unless they submit a job through the
  queueing system.

* In combination with Maui, Torque offers very fine control over usage
  policies. For instance, in our cluster we have settings that guarantee a group
  of users 40% of CPU time, and make sure that no job that needs the whole
  cluster for a period of more than 7 days will be accepted. There are probably
  ways to accomplish something similar with Condor, but the emphasis of the
  system is not there, so you would have to go a little out of your way to get
  this working.
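
As a rough illustration of the SMP point above, a Torque job script can state
the node layout directly in its resource request. This is only a sketch, not
our actual setup: the job name, the walltime, and the summarize_layout helper
are made up for the example. The nodes=...:ppn=... request and the
PBS_NODEFILE variable are standard Torque.

```shell
#!/bin/sh
# Hypothetical Torque job script: ask for 16 CPUs as 4 nodes x 4 CPUs each.
# Using "nodes=16:ppn=1" instead would spread the job over 16 nodes.
#PBS -N layout_demo
#PBS -l nodes=4:ppn=4
#PBS -l walltime=01:00:00

# Illustrative helper (not part of Torque): report how many distinct nodes
# and how many CPU slots a node file describes. Torque writes one line per
# allocated CPU slot to the file named by $PBS_NODEFILE.
summarize_layout() {
    nodes=$(sort -u "$1" | wc -l | tr -d ' ')
    slots=$(wc -l < "$1" | tr -d ' ')
    echo "nodes=$nodes slots=$slots"
}

# Only does anything inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    summarize_layout "$PBS_NODEFILE"
fi
```

Submitted with qsub, the script should report the layout the scheduler
actually granted, which is a quick way to check that the request was honoured
before launching the MPI program itself.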

Actually, if these points could be solved with a newer version of Condor, I
would gladly swap schedulers, so that I could incorporate the cluster into our
Condor pool, but for the time being I think Torque/Maui is a better option.

Cheers,
Angel de Vicente
-- 
----------------------------------
http://www.iac.es/galeria/angelv/

PostDoc Software Support
Instituto de Astrofisica de Canarias