
Re: [Condor-users] Condor - condor_schedd daemon per pool or what?



Dear Ian, first of all, thanks for the answer :-)

Regarding your question, "Can I ask: what is it about multiple queues
that makes "management" hard? What exactly are you trying to manage?"
Well, I do understand that having two condor_schedd machines
improves the scalability and resilience of the system. That much is
quite obvious.
What I don't understand, from a system administrator's point of view,
is how I make sure that jobs run according to their priority and rank.

For example: if I send 100 jobs to one of the condor_schedd machines
(I'll call it queue A) and 200 jobs to the second condor_schedd
machine (queue B), which jobs will run first on the execute machines
(startd)? Jobs from queue A or from queue B?
I understand that RANK is based on the machine configuration (something
like the startd expression sketched below), so if a job from queue B
gets a higher RANK it can preempt jobs from queue A.
But then isn't it better to have a global queue, where you can manage
priority and other related configuration?
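
To illustrate what I mean by RANK (a made-up startd snippet, not our
actual configuration):

    # condor_config on an execute node: prefer jobs owned by "sassy";
    # a job that ranks higher here can preempt a lower-ranked job
    # already running in the slot
    RANK = (Owner == "sassy") * 10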

What am I missing here?

Thanks
Sassy



> Can I ask: what is it about multiple queues that makes "management" hard? What
> exactly are you trying to manage?


On Thu, Jul 7, 2011 at 11:14 PM, Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx> wrote:
> On Wednesday, July 6, 2011 at 10:01 AM, Sassy Natan wrote:
>
> My question starts here: if I have two Regular machines, what is the
> use of having two different queues per one pool?
>
> Michael O'Donnell answered this one already, but I'll repeat it:
> scalability. One scheduler can only handle so much load in the form of
> queued and running jobs before it starts to fall down, unable to fill all
> the available slots in your pool with jobs. Where that limit is depends on
> the job startup rate and the OS you happen to be running on (traditionally
> Linux-based condor_schedd machines scaled much larger than Windows-based
> schedulers, but that gap has been closing and is really much closer in the
> 7.6.x series).
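>
> To give a rough feel for the knobs involved, something like this on the
> scheduler machine (illustrative values only, not a recommendation):
>
>   # condor_config on the submit/schedd machine
>   MAX_JOBS_RUNNING = 2000    # cap on simultaneously running jobs from this schedd
>   JOB_START_COUNT  = 10      # start 10 jobs at a time...
>   JOB_START_DELAY  = 2       # ...then pause 2 seconds before the next batch
>
> and you can keep an eye on a scheduler's load with 'condor_status -schedd'.
>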
> There might also be administrative reasons to separate jobs onto multiple
> schedulers. For example: you may wish to enforce scheduling and matchmaking
> policies on some class of jobs via configuration file options.
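>
> For instance, a per-scheduler policy could look roughly like this (the
> clause and the limit are made up, just to show the idea):
>
>   # condor_config on the scheduler that only handles Linux work
>   APPEND_REQUIREMENTS = (TARGET.OpSys == "LINUX")   # appended to every job submitted here
>   MAX_JOBS_SUBMITTED  = 10000                       # cap this queue's size
>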
> You may also wish to take advantage of scheduler technologies like dedicated
> schedulers to make MPI-type jobs in your pool easier to run.
> See: http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#sec:Configure-Dedicated-Resource
> for more details on this.
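>
> The basic wiring for that is roughly as follows (hypothetical host name;
> the manual section above has the authoritative version):
>
>   # condor_config on the execute nodes you want to dedicate
>   DedicatedScheduler = "DedicatedScheduler@regular02.yourdomain"
>   STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
>   RANK = Scheduler =?= $(DedicatedScheduler)
>
> and an MPI-style job then uses the parallel universe in its submit file:
>
>   universe      = parallel
>   executable    = my_mpi_wrapper
>   machine_count = 8
>   queue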
>
> So when some users log in to Regular01 and others to Regular02,
> submitting their jobs to the Condor pool, I don't understand how I can
> control my queue. I don't want to manage two different queues, but a
> global one.
>
> This is only possible to some extent with Condor. Even with a really big
> machine for a single scheduler, at some point your execute slots and queued
> jobs will exceed the performance capabilities of a single-scheduler
> approach. Where that limit is depends on your jobs, your hardware and your
> execute node count.
> Distributed queues are one of the things that make Condor robust and highly
> scalable. You don't have a single queue point of failure or bottleneck.
> Can I ask: what is it about multiple queues that makes "management" hard? What
> exactly are you trying to manage?

>
> If I take the schedd off one of the Regular machines, say
> Regular01, I can't submit jobs to the pool from that machine.
> Well, this is almost true, since I can submit a job remotely using
> the -n switch, but then I don't see the point
> of having two schedd daemons running on two different machines in the
> same pool (unless of course you want to
> have some load balancing between the schedd daemons, but then again the
> whole point of a load-balanced schedd would be to preserve the state of
> the co-existing queue).
>
> You have a few options here:
> 1. you can have users use -n to do remote submissions (rough example below);
> 2. you can give all users a login on your one scheduler machine;
> 3. you can look into Condor's SOAP interface and write a custom submission
> tool that uses SOAP;
> 4. you can use a meta-scheduler that acts as the single queue for all your
> users and then load balances those jobs to Condor schedulers on their
> behalf (CycleServer and MRG are examples).
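>
> For option 1, a remote submission looks roughly like this (hypothetical
> host name; use -remote instead of -name if the job's input files also
> need to be spooled over to the scheduler):
>
>   condor_submit -name regular02.yourdomain my_job.sub
>   condor_q      -name regular02.yourdomain
>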
>
> I saw there is an option for configuring the SCHEDD_NAME and
> SCHEDD_ADDRESS_FILE settings.
> But I'm not sure I got it right. When I point the name and the file at
> my schedd (which, per my example,
> is on Regular02) I still get an error and must point manually at the
> Regular02 host name. (And I did put @ at the end of the SCHEDD_NAME.)
>
> I can't think of a good reason to mess with SCHEDD_NAME and
> SCHEDD_ADDRESS_FILE in your case.
> You may want to look at SCHEDD_HOST -- it lets you name a scheduler to
> contact when you run commands that contact the scheduler like 'condor_q' or
> 'condor_submit' and you don't supply the -name option to these commands to
> name a scheduler. It defaults to the local machine, but you may want to set
> it to some other value if you're not running a scheduler on the local
> machine. If your scheduler was on myhost1.mydomain you could set
> SCHEDD_HOST = myhost1.mydomain on every other machine in your pool and then
> condor_q/rm/submit would work without having to use the -name option.
> See: http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#15346
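>
> In your Regular01/Regular02 terms that would be something like this
> (illustrative only):
>
>   # condor_config on every machine that is NOT running a schedd
>   SCHEDD_HOST = regular02.yourdomain
>
> after which condor_q, condor_rm and condor_submit on those machines talk
> to Regular02 by default.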
> Regards,
> - Ian
> ---
> Ian Chesal
> Cycle Computing, LLC
> Leader in Open Compute Solutions for Clouds, Servers, and Desktops
> Enterprise Condor Support and Management Tools
> http://www.cyclecomputing.com
> http://www.cyclecloud.com
> http://twitter.com/cyclecomputing
>
>