
Re: [Condor-users] Condor - condor_schedd daemon per pool or what?



Sassy,

I believe Condor determines which jobs to run based on user priority. The user with the lower (numerically better) user priority has more resources available to them.
You can use the following to see the priority assigned to each user:
condor_userprio -all -allusers

You can also change a user's priority using:
condor_userprio -setprio <user> <value>
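
For example, to list the current priorities and then give a hypothetical user
alice@mydomain a better (lower) priority value of 10, something like this
should work (the user name and the value are only illustrative):

  condor_userprio -all -allusers
  condor_userprio -setprio alice@mydomain 10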


So, if for example a single user submitted two sets of jobs from two different submit machines, the jobs would execute based on which jobs are matched first. If two different users submitted jobs from two different machines (or from a single machine), then the order in which the jobs run is influenced by the users' priorities.

mike






From: Sassy Natan <sassyn@xxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>, ichesal@xxxxxxxxxxxxxxxxxx
Date: 07/12/2011 04:45 AM
Subject: Re: [Condor-users] Condor - condor_schedd daemon per pool or what?
Sent by: condor-users-bounces@xxxxxxxxxxx





Dear Ian, first of all, thanks for the answer :-)

Regarding your question, "Can I ask: what is it about multiple queues
that makes "management" hard? What exactly are you trying to manage?"
Well, I do understand that having two condor_schedd machines
improves the scalability and resilience of the system. This is quite
obvious.
I just don't understand, from a system administrator's point of view, how
I make sure that jobs will run based on their priority and rank.

For example: if I send 100 jobs to one of the condor_schedd machines
(call it queue A) and 200 jobs to the second condor_schedd machine
(queue B), which jobs will run first on the execute machines
(startd)? Jobs from queue A or from queue B?
I understand that RANK is based on the machine configuration, so if a
job from queue B gets a higher RANK it can preempt jobs from queue A.
But then isn't it better to have a global queue, where you can manage
priority and other related configuration?

What am I missing here?

Thanks
Sassy



> Can I ask: what is it about multiple queues that makes "management" hard? What
> exactly are you trying to manage?


On Thu, Jul 7, 2011 at 11:14 PM, Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx> wrote:
> On Wednesday, July 6, 2011 at 10:01 AM, Sassy Natan wrote:
>
> My question starts here: if I have two Regular machines, what is the
> use of having two different queues for one pool?
>
> Michael O'Donnell answered this one already, but I'll repeat it:
> scalability. One scheduler can only handle so much load in the form of
> queued and running jobs before it starts to fall down, unable to fill all
> the available slots in your pool with jobs. Where that limit is depends on
> the job startup rate and the OS you happen to be running on (traditionally
> Linux-based condor_schedd machines scaled much larger than Windows-based
> schedulers, but that gap has been closing and is really much closer in the
> 7.6.x series).
> There might also be administrative reasons to separate jobs on to multiple
> schedulers. For example: you may wish to enforce scheduling and matchmaking
> policies on some class of jobs via configuration file options.
> You may also wish to take advantage of scheduler technologies like dedicated
> schedulers to make MPI-type jobs in your pool easier to run.
> See:
> http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#sec:Configure-Dedicated-Resource
> for more details on this.
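> As a rough sketch, the execute-node configuration described in that manual
> section looks something like this (the scheduler host name is only
> illustrative):
>
>   DedicatedScheduler = "DedicatedScheduler@regular02.mydomain"
>   STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
>   RANK = Scheduler =?= $(DedicatedScheduler)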
>
> So when some users log in to Regular01 while others log in to Regular02,
> submitting their jobs to the condor pool, I don't understand how I can
> control my queue. I don't want to manage two different queues, but a
> global one.
>
> This is only possible to some extent with Condor. Even with a really big
> machine for a single scheduler, at some point your execute slots and queued
> jobs will exceed the performance capabilities of a single-scheduler
> approach. Where that limit is depends on your jobs, your hardware and your
> execute node count.
> Distributed queues are one of the things that make Condor robust and highly
> scalable. You don't have a single queue point of failure or bottleneck.
> Can I ask: what is it about multiple queues that makes "management" hard? What
> exactly are you trying to manage?

>
> If I take the schedd out of one of the Regular machines, say
> Regular01, I can't submit jobs to the pool.
> Well, this is almost true, since I can do a remote submission using
> the -n switch, but then I don't get what the use is
> of having two schedd daemons running on two different machines in the
> same pool (unless of course you want to
> have some load balancing for the schedd daemons, but then again the
> whole point of having a load-balancing schedd is to save the status of
> the co-existing queue).
>
> You have a few options here:
> 1. you can have users use -n to do remote submissions (see the sketch after
> this list);
> 2. you can give all users a login on your one scheduler machine;
> 3. you can look into Condor's SOAP interface and write a custom submission
> tool that uses SOAP;
> 4. you can use a meta-scheduler that acts as the single queue for all your
> users and then load balances those jobs to Condor schedulers on their
> behalf (CycleServer and MRG are examples).
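> As a rough sketch of option 1 (the scheduler host name is illustrative), from
> any machine with the Condor tools installed you can run something like:
>
>   condor_submit -remote regular02.mydomain job.sub
>   condor_q -name regular02.mydomain
>
> where -remote names the schedd to submit to and spools the job's input files
> over to it.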
>
> I saw there is an option for configuring the SCHEDD_NAME and
> SCHEDD_ADDRESS_FILE options.
> But I'm not sure I got it right. When I point the name and the file to
> my schedd (which, in my example, is on Regular02) I still get an error
> and must point manually to the Regular02 host name. (And I did put @ at
> the end of the SCHEDD_NAME.)
>
> I can't think of a good reason to mess with SCHEDD_NAME and
> SCHEDD_ADDRESS_FILE in your case.
> You may want to look at SCHEDD_HOST -- it lets you name a scheduler to
> contact when you run commands that talk to the scheduler, like 'condor_q' or
> 'condor_submit', and you don't supply the -name option to those commands to
> name a scheduler. It defaults to the local machine, but you may want to set
> it to some other value if you're not running a scheduler on the local
> machine. If your scheduler were on myhost1.mydomain you could set
> SCHEDD_HOST = myhost1.mydomain on every other machine in your pool and then
> condor_q/rm/submit would work without having to use the -name option.
> See:
> http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#15346
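> A minimal sketch of that (host name illustrative): on each submit-only machine
> set
>
>   SCHEDD_HOST = myhost1.mydomain
>
> and plain condor_q/condor_rm/condor_submit on those machines should then talk
> to the scheduler on myhost1.mydomain without needing -name.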
> Regards,
> - Ian
> ---
> Ian Chesal
> Cycle Computing, LLC
> Leader in Open Compute Solutions for Clouds, Servers, and Desktops
> Enterprise Condor Support and Management Tools
> http://www.cyclecomputing.com
> http://www.cyclecloud.com
> http://twitter.com/cyclecomputing
>
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/