
Re: [HTCondor-users] HTC and HPC



Thanks for your explanation, it's very helpful.

Thanks

On 12/05/2014 05:28 PM, Gary Jackson wrote:

"HPC" is not magic pixie dust. If you don't actually have tightly-coupled parallel applications that require high performance computing resources, then installing scheduling software that supports "HPC" isn't going to do anything useful for you. Unless you have those specific needs, HTCondor is going to do a lot more for you than those other schedulers.

As I understand it, the reason you'd use a purpose-built HPC batch scheduler is that HTCondor's scheduling algorithm isn't as flexible for parallel jobs. SLURM and Torque give the administrator a lot of tools for tuning parallel scheduling performance to maximize utilization or minimize turnaround time. For instance, they support plugging in a backfill scheduler, which runs jobs out of priority order when a lower priority job won't interfere with a higher priority job. Backfill in HTCondor, though still useful, doesn't work the same way and isn't useful for tightly-coupled parallel jobs.
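
To make the backfill idea concrete, here is a toy sketch in Python. It is not the algorithm SLURM, Torque, or any real scheduler implements; it only illustrates the rule that a lower priority job may start early if it does not delay the blocked job at the head of the queue. The names `Job` and `pick_jobs_to_start` are hypothetical, and runtimes are assumed to be accurate user estimates.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # estimated runtime, arbitrary time units


def pick_jobs_to_start(queue, free_nodes, running, now=0):
    """Return the names of jobs to start at time `now`.

    queue:   jobs in priority order (highest priority first)
    running: list of (end_time, nodes) for jobs already running
    """
    started = []
    queue = list(queue)
    running = list(running)

    # 1. Start jobs in strict priority order for as long as they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2. The head job is blocked. Find the earliest time at which enough
    #    nodes will be free for it (its reservation).
    head = queue[0]
    avail = free_nodes
    reserved_at = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            reserved_at = end_time
            break
    if reserved_at is None:
        return started  # head job can never fit; this toy model gives up

    # Nodes that remain spare even once the head job starts.
    spare = avail - head.nodes

    # 3. Backfill: a lower priority job may start now if it fits and either
    #    finishes before the reservation or uses only spare nodes.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= reserved_at:
            free_nodes -= job.nodes
            started.append(job.name)
        elif job.nodes <= spare:
            free_nodes -= job.nodes
            spare -= job.nodes
            started.append(job.name)
    return started
```

For example, with 2 nodes free, a 6-node job ending at time 10, and a queue of A (8 nodes), B (2 nodes, runtime 8), and C (2 nodes, runtime 20), only B is started: it finishes before A's reservation at time 10, whereas C would delay A.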

Obviously, HTCondor is capable of scheduling and running parallel jobs, and you can use that if your parallel scheduling needs do not exceed what HTCondor can provide. HTCondor can start an OpenMPI job just as easily as SLURM.
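
As a rough illustration of what an MPI job looks like in HTCondor, the Python sketch below renders a submit description for the parallel universe. The helper name `parallel_submit_description` and the file names are hypothetical. It assumes the pool has a dedicated scheduler configured for parallel jobs and that an MPI wrapper script (for instance one based on the `openmpiscript` example shipped with HTCondor) is available; adjust both for your site.

```python
def parallel_submit_description(wrapper, mpi_program, machine_count, args=""):
    """Render a minimal parallel-universe submit description as text.

    wrapper:       path to an MPI wrapper script (assumed to exist on your site)
    mpi_program:   the MPI executable the wrapper should launch
    machine_count: number of slots to claim for the job
    """
    lines = [
        "universe = parallel",
        f"executable = {wrapper}",
        f"arguments = {mpi_program} {args}".rstrip(),
        f"machine_count = {machine_count}",
        # Ship the MPI program along with the job.
        f"transfer_input_files = {mpi_program}",
        "should_transfer_files = yes",
        "when_to_transfer_output = on_exit",
        # $(NODE) expands to the node number within the parallel job.
        "output = mpi.out.$(NODE)",
        "error = mpi.err.$(NODE)",
        "log = mpi.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(parallel_submit_description("openmpiscript", "my_mpi_app", 4))
```

The printed text can be saved to a file and passed to `condor_submit`.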

On the other hand, you probably wouldn't use SLURM or Torque for the same sort of high throughput computing you do with HTCondor. HTCondor is a very sophisticated program that covers a lot more use cases than those two batch schedulers. For example, HTCondor has support for:

* transparent checkpointing
* running jobs on desktop machines with low impact on end users
* using cloud resources

There's no reason you can't use both HTCondor and a purpose-built parallel batch scheduler at the same time. Locally, we've used both Torque and HTCondor on our HPC clusters for many years. When Torque jobs run, they preempt any HTCondor jobs and the nodes leave the pool for the duration of the parallel job. It's worked out well.

On 12/4/14, 6:10 AM, marrodriguez wrote:
Hi,

I am interested in deploying Condor at my site, but I have a question: why is
Condor considered an HTC batch system and not an HPC one? Does it have
disadvantages in the HPC field compared with Slurm, PBS, or SGE?

Thanks in advance

--

ALBA Synchrotron

Marc Rodriguez
Systems - Computing Division
 
ALBA SYNCHROTRON LIGHT SOURCE
Ctra. BP 1413 km. 3,3 | 08290 | Cerdanyola del Vallès| Barcelona | Spain
(+34) 93 592 40 81
www.albasynchrotron.es | marc.rodriguez@xxxxxxxx
 