Re: [Condor-users] Condor - condor_schedd daemon per pool or what?



An exact description of the negotiation process can be found at:
http://www.cs.wisc.edu/condor/manual/v7.6/3_4User_Priorities.html#SECTION00445000000000000000

I believe your questions will be answered after reading this part of the
manual.
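
As a rough illustration of what that section describes: as I understand it, user priority is tracked per submitting user rather than per schedd, so jobs in queue A and queue B compete under the same rules. The manual's example is that a user with priority 10 gets about twice as many machines as a user with priority 20 (numerically lower is better). Below is a minimal Python sketch of that inverse-proportional split. It is a simplified model only, not Condor's negotiator code: the function name, user names and slot count are made up for illustration, and it ignores RANK, requirements and preemption.

```python
def fair_shares(priorities, total_slots):
    """Split total_slots in inverse proportion to each user's priority value.

    priorities: dict mapping a user name to a positive priority value
                (numerically lower means a larger share).
    Returns a dict mapping each user to a (possibly fractional) slot count.
    """
    # Weight each user by the reciprocal of the priority value.
    weights = {user: 1.0 / prio for user, prio in priorities.items()}
    total_weight = sum(weights.values())
    # Normalise the weights so the shares add up to total_slots.
    return {user: total_slots * w / total_weight for user, w in weights.items()}


# Hypothetical users and pool size, chosen only for illustration.
shares = fair_shares({"alice": 10.0, "bob": 20.0}, 300)
for user, slots in sorted(shares.items()):
    print(user, round(slots))
```

In this toy model it does not matter which submit machine a user's jobs sit on; only the user's priority value changes the split.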

Lukas

On Tue, Jul 12, 2011 at 07:17:22AM -0600, Michael O'Donnell wrote:
> Sassy,
> 
> I believe Condor determines which jobs to run based on user priority. The
> submitter with the numerically lower (that is, better) user priority value
> has more resources available to them.
> You can use the following to list the priorities of all users:
> condor_userprio -all -allusers
> 
> You can also change a user's priority using:
> condor_userprio -setprio <user> <value>
> 
> 
> So if, for example, a single user submitted two sets of jobs from two
> different submit machines, the jobs would execute based on which jobs are
> matched first. If two different users submitted jobs from two different
> machines (or a single machine), then the order in which jobs run is
> influenced by user priority.
> 
> mike
> 
> From: Sassy Natan <sassyn@xxxxxxxxx>
> To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>, ichesal@xxxxxxxxxxxxxxxxxx
> Date: 07/12/2011 04:45 AM
> Subject: Re: [Condor-users] Condor - condor_schedd daemon per pool or what?
> Sent by: condor-users-bounces@xxxxxxxxxxx
> 
> Dear Ian, first, thanks for the answer :-)
> 
> Regarding your question, "Can I ask: what is it about multiple queues
> that makes "management" hard? What exactly are you trying to manage?"
> Well, I do understand that having two condor_schedd machines
> improves the scalability and resilience of the system. This is quite
> obvious.
> I just don't understand, from a system administrator's point of view, how
> I make sure that jobs will run based on their priority and rank.
> 
> For example: if I send 100 jobs to one of the condor_schedd machines
> (we'll call it queue A) and 200 jobs to the second condor_schedd
> machine (queue B), which jobs will run first on the execute machines
> (startd)? Jobs from queue A or B?
> I understand that RANK is based on the machine configuration, so if a
> job from queue B gets a higher RANK it can take over from jobs from queue A.
> But then isn't it better to have a global queue, where you can manage
> priority and other related configuration?
> 
> What am I missing here?
> 
> Thanks
> Sassy
> 
> 
> 
> > Can I ask: what is it about multiple queues that makes "management" hard?
> > What exactly are you trying to manage?
> 
> 
> On Thu, Jul 7, 2011 at 11:14 PM, Ian Chesal <ichesal@xxxxxxxxxxxxxxxxxx> wrote:
> > On Wednesday, July 6, 2011 at 10:01 AM, Sassy Natan wrote:
> >
> > My question starts here: If I have two Regular machines, what is the
> > use of having two different queues in one pool?
> >
> > Michael O'Donnell answered this one already, but I'll repeat it:
> > scalability. One scheduler can only handle so much load in the form of
> > queued and running jobs before it starts to fall down, unable to fill all
> > the available slots in your pool with jobs. Where that limit is depends on
> > the job startup rate and the OS you happen to be running on (traditionally
> > Linux-based condor_schedd machines scaled much larger than Windows-based
> > schedulers, but that gap has been closing and is really much closer in the
> > 7.6.x series).
> > There might also be administrative reasons to separate jobs on to multiple
> > schedulers. For example: you may wish to enforce scheduling and matchmaking
> > policies on some class of jobs via configuration file options.
> > You may also wish to take advantage of scheduler technologies like dedicated
> > schedulers to make MPI-type jobs in your pool easier to run.
> > See: http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#sec:Configure-Dedicated-Resource
> > for more details on this.
> >
> > So when some users log in to Regular01 and others to Regular02,
> > submitting their jobs to the Condor pool, I don't understand how I can
> > control my queue. I don't want to manage two different queues, but a
> > global one.
> >
> > This is only possible to some extent with Condor. Even with a really big
> > machine for a single scheduler, at some point your execute slots and queued
> > jobs will exceed the performance capabilities of a single-scheduler
> > approach. Where that limit is depends on your jobs, your hardware and your
> > execute node count.
> > Distributed queues are one of the things that make Condor robust and highly
> > scalable. You don't have a single queue point of failure or bottleneck.
> > Can I ask: what is it about multiple queues that makes "management" hard?
> > What exactly are you trying to manage?
> >
> >
> > If I take out the schedd from one of the Regular machines, say
> > Regular01, I can't submit jobs to the pool.
> > Well, this is almost true, since I can submit a remote job using
> > the -n switch, but then I don't get what the use is
> > of having two schedd daemons running on two different machines in the
> > same pool (unless of course you want to
> > have some load balancing for the schedd daemons, but then again the
> > whole point of having a load-balancing schedd is to save the status of
> > the co-existing queue).
> >
> > You have a few options here:
> > 1. you can have users use -n to do remote submissions;
> > 2. you can give all users a log in to your one scheduler machine;
> > 3. you can look into Condor's SOAP interface and write a custom submission
> > tool that uses SOAP;
> > 4. you can use a meta-scheduler that acts as the single queue for all your
> > users that then load balances these jobs to Condor schedulers on their
> > behalf (CycleServer and MRG are examples).
> >
> > I saw there is an option for configuring SCHEDD_NAME and
> > SCHEDD_ADDRESS_FILE.
> > But I'm not sure I got it right. When I point the name and the file to
> > my schedd (which in my example is on
> > Regular02) I still get an error and must point manually to the
> > Regular02 host name. (And I did put @ at the end of the SCHEDD_NAME.)
> >
> > I can't think of a good reason to mess with SCHEDD_NAME and
> > SCHEDD_ADDRESS_FILE in your case.
> > You may want to look at SCHEDD_HOST -- it lets you name a scheduler to
> > contact when you run commands that contact the scheduler like 'condor_q' or
> > 'condor_submit' and you don't supply the -name option to these commands to
> > name a scheduler. It defaults to the local machine, but you may want to set
> > it to some other value if you're not running a scheduler on the local
> > machine. If your scheduler was on myhost1.mydomain you could set
> > SCHEDD_HOST = myhost1.mydomain on every other machine in your pool and then
> > condor_q/rm/submit would work without having to use the -name option.
> > See: http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#15346
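
To make the difference concrete, here is a small Python sketch of what a submission wrapper might do: add -name only when the user wants a scheduler other than the default one. It is just an illustration under assumptions, not a Condor tool. The helper name schedd_command and the submit file job.sub are hypothetical; the -name option and the host myhost1.mydomain come from Ian's message above.

```python
import shlex


def schedd_command(tool, args=(), schedd_name=None):
    """Build the argument list for a Condor tool such as condor_q or condor_submit.

    When schedd_name is given, "-name <schedd_name>" is added so the tool
    contacts that scheduler. When it is None, nothing is added and the tool
    falls back to its default (the local schedd, or SCHEDD_HOST if that is
    set in the configuration).
    """
    argv = [tool]
    if schedd_name:
        argv += ["-name", schedd_name]
    argv += list(args)
    return argv


# Explicitly naming the remote scheduler on every call.
remote = schedd_command("condor_submit", ["job.sub"], schedd_name="myhost1.mydomain")
# Relying on the default scheduler instead.
default = schedd_command("condor_submit", ["job.sub"])

# Print the commands rather than running them, so the sketch has no side effects.
print(shlex.join(remote))
print(shlex.join(default))
```

With SCHEDD_HOST set on the submit machines, the second, shorter form is all users would need to type.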
> > Regards,
> > - Ian
> > ---
> > Ian Chesal
> > Cycle Computing, LLC
> > Leader in Open Compute Solutions for Clouds, Servers, and Desktops
> > Enterprise Condor Support and Management Tools
> > http://www.cyclecomputing.com
> > http://www.cyclecloud.com
> > http://twitter.com/cyclecomputing
> >
> >