
[Condor-users] Can Condor solve this scheduling problem?

Hi Condor Users,


I have a curious scheduling problem to solve and I hope an experienced user can help me out here.

Thanks in advance!




My compute infrastructure is made up of many racks. Each rack has a set of blades dedicated to computation.

Currently, each rack is dedicated to a particular compute task.




All compute tasks (on all racks) are MPI jobs.

Not all racks are busy all the time, so I am looking for a scheduler to scale my MPI jobs effectively.

There is one more complication here.

My MPI jobs are iterative. Each iteration runs for no more than 2 seconds.





I have just read the Condor manual and experimented with it briefly.

My main problem is latency. If Condor takes even 1 second to reserve machines for my job, I have already lost the game.

Can someone shed some light on the expected latency for reserving machines (assume 20 machines on a 1 Gbps Ethernet network)?



If I model my individual MPI runs as DAGMan jobs (to enforce dependencies), each job will create a separate cluster anyway, and I will pay the reservation latency on every run.

Alternatively, I could create a single “Job Cluster” that executes all my iterative MPI runs on one set of nodes.

This way I can hide my cluster creation latency.
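
To put rough numbers on this trade-off, here is a minimal sketch. It assumes a 1-second reservation latency and 2-second iterations (the figures mentioned above), plus a hypothetical run of 100 iterations. The function names are mine and nothing here calls Condor; it is only the arithmetic.

```python
def allocation_wait_s(iterations, alloc_latency_s, per_iteration_alloc):
    """Total seconds spent waiting for machines to be reserved."""
    # Reserving per iteration pays the latency every time;
    # reserving once pays it a single time for the whole run.
    allocations = iterations if per_iteration_alloc else 1
    return allocations * alloc_latency_s


def overhead_fraction(iterations, iter_runtime_s, alloc_latency_s,
                      per_iteration_alloc):
    """Fraction of wall-clock time lost to reservation latency."""
    wait = allocation_wait_s(iterations, alloc_latency_s, per_iteration_alloc)
    compute = iterations * iter_runtime_s
    return wait / (wait + compute)


if __name__ == "__main__":
    # Assumed values: 100 iterations, 2 s each, 1 s to reserve machines.
    per_iter = overhead_fraction(100, 2.0, 1.0, per_iteration_alloc=True)
    single = overhead_fraction(100, 2.0, 1.0, per_iteration_alloc=False)
    print(f"reserve per iteration: {per_iter:.3f}")
    print(f"reserve once:          {single:.3f}")
```

Under these assumptions, reserving machines for every iteration loses about one third of the wall-clock time to latency, while reserving once loses about 0.5%.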

However, I don’t think I can dictate the dependencies within a Job Cluster using DAGMan.

DAGMan only works at the “Job Cluster” level. I don’t think it works at the process level inside a cluster.

How do I use Condor to solve this problem?



I understand Condor provides web service APIs.

Is there a way to ask Condor to reserve some machines (based on a ClassAd), hold them for my MPI runs, and release them later on?



If Condor cannot solve this, can you recommend an alternative scheduler?



If I end up writing my own custom scheduler for this, can anyone point out relevant technologies that would be useful?




Best Regards,


