
Re: [Condor-users] Condor - condor_schedd daemon per pool or what?



Sassy,

I am not an expert, but one main advantage of having multiple schedds is that a single machine can submit only so many jobs. The limit varies by OS/platform, of course; on Windows, at least, it is related to the desktop heap size (this can be increased, but only up to a point). I am not as familiar with Mac/Linux/Unix, but on Windows only so many people can be logged on to a machine at once, so if users submit by logging on to a machine remotely, you may run into limits there as well. The other thing to consider is that multiple schedds also let you set up high availability (failover): if one of the schedds goes down, its queue can be transferred to a different schedd. This has to be configured explicitly, because Condor does not do it by default.

I have set up our pool so that each of our primary users has their own submit machine, and we also have two VMs that act as remote schedds. This seems to work well for us: it keeps the number of schedds down while still allowing high availability and increasing the number of jobs that can be submitted to the pool. One thing I have been trying to work out is how to minimize problems when submitting to a remote schedd. We have developed GUIs for a couple of applications, and the user can select a remote machine to submit jobs to. The problem is: what if someone has already submitted 500 jobs to that machine? How can this be detected?
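One possible way to approach the 500-jobs question is to compare the job counts each schedd advertises and submit to the least loaded one. This is only an illustrative sketch: it assumes the schedd ads have already been fetched (for example from the output of `condor_status -schedd`) and turned into Python dicts. It uses the schedd ClassAd attributes TotalIdleJobs and TotalRunningJobs; the host names and job counts are made up.

```python
from typing import Any, Dict, List


def pick_least_loaded_schedd(ads: List[Dict[str, Any]]) -> str:
    """Return the Name of the schedd with the fewest idle + running jobs.

    Assumption: each ad is a dict built from a schedd ClassAd, carrying
    "Name", "TotalIdleJobs" and "TotalRunningJobs".
    """
    if not ads:
        raise ValueError("no schedd ads supplied")

    def load(ad: Dict[str, Any]) -> int:
        # Missing attributes are treated as zero jobs.
        return int(ad.get("TotalIdleJobs", 0)) + int(ad.get("TotalRunningJobs", 0))

    # min() keeps the first ad on ties, so list order acts as a tie-breaker.
    return str(min(ads, key=load)["Name"])


if __name__ == "__main__":
    # Hypothetical ads for the two remote-schedd VMs.
    schedd_ads = [
        {"Name": "vm-schedd1.example.com", "TotalIdleJobs": 480, "TotalRunningJobs": 20},
        {"Name": "vm-schedd2.example.com", "TotalIdleJobs": 35, "TotalRunningJobs": 12},
    ]
    print(pick_least_loaded_schedd(schedd_ads))  # vm-schedd2.example.com
```

A GUI could call something like this just before submission and pre-select the returned machine, while still letting the user override it.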

There are likely other reasons that the Condor community can elaborate on, but these are the main ones I can think of. If your pool is only 4 machines (as in your example), I would not use two schedds; however, if the users want to submit jobs from their own machines, you may want to.

I have not used SCHEDD_NAME, so I cannot address that part of your question.

I hope this helps to some degree.
mike

From: Sassy Natan <sassyn@xxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 07/06/2011 08:17 AM
Subject: [Condor-users] Condor - condor_schedd daemon per pool or what?
Sent by: condor-users-bounces@xxxxxxxxxxx

Hi All,

I need to know whether I have this right:
In the Condor examples and documentation, a basic Condor pool is laid out as follows:
1. A machine known as the Central Manager, running the master, collector and negotiator daemons. Host name: Master
2. Execute Machines, running the master and startd daemons. Host names: Execute01 and Execute02
3. Regular Machines, running the master, startd and schedd daemons. Host names: Regular01 and Regular02

My question starts here: if I have two Regular Machines, what is the point of having two different queues in one pool?
To make myself clearer: we have two schedd daemons, one on Regular01 and one on Regular02. Based on the quote "Each machine running condor_schedd maintains its own independent queue", we have two queues in the pool.

So when some users log in to Regular01 and others to Regular02 to submit their jobs to the Condor pool, I don't understand how I can control my queue. I don't want to manage two different queues, but a single global one.
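Because each schedd keeps its own independent queue, a "global" queue is really a view stitched together from every schedd (which is why `condor_q -global` queries all the schedds in the pool). A minimal sketch of that merge, assuming the per-schedd job lists have already been retrieved; the job records and their fields are hypothetical simplifications, not real ClassAds.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical minimal job record: {"id": ..., "owner": ..., "status": ...}
Job = Dict[str, str]


def merge_queues(queues: Dict[str, List[Job]]) -> List[Tuple[str, Job]]:
    """Flatten per-schedd queues into one list tagged with the owning schedd.

    Job IDs (cluster.proc) are only unique within a single schedd, so the
    schedd name is kept next to each job to identify it pool-wide.
    """
    merged: List[Tuple[str, Job]] = []
    for schedd in sorted(queues):  # deterministic order by schedd name
        for job in queues[schedd]:
            merged.append((schedd, job))
    return merged


def status_summary(merged: List[Tuple[str, Job]]) -> Counter:
    """Count jobs by status across every schedd."""
    return Counter(job["status"] for _, job in merged)


if __name__ == "__main__":
    queues = {
        "Regular01": [{"id": "12.0", "owner": "alice", "status": "Idle"},
                      {"id": "12.1", "owner": "alice", "status": "Running"}],
        # Same job ID as on Regular01: harmless, because it is a different queue.
        "Regular02": [{"id": "12.0", "owner": "bob", "status": "Idle"}],
    }
    merged = merge_queues(queues)
    for schedd, job in merged:
        print(schedd, job["id"], job["owner"], job["status"])
    print(status_summary(merged))
```

The point of keeping the schedd name with each job is that any action on a job (removing it, for instance) still has to go to the schedd that owns it.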

If I take the schedd off one of the Regular Machines, say Regular01, I can't submit jobs to the pool from it.
Well, this is almost true, since I can submit remotely using the -n switch, but then I don't see the point of having two schedd daemons running on two different machines in the same pool (unless of course you want some load balancing for the schedd daemons, but then again the whole point of a load-balanced schedd would be to preserve the state of the co-existing queues).

And what if I don't want to specify the -n switch for every relevant Condor command (condor_q, condor_rm, etc.)?
Besides, from a user's perspective rather than an admin's, I have no idea what the schedd hostname is.
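For the -n question, one workaround is a small wrapper that adds the schedd name to each command from a single setting, so users never need to know the host name. This is a sketch only: MY_CONDOR_SCHEDD is a hypothetical environment variable (not a Condor setting), and it assumes the tools accept `-name <schedd>`, as condor_q and condor_rm do. It only builds the argument list.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical environment variable an admin could set once per login
# environment; it is NOT a Condor configuration setting.
SCHEDD_ENV_VAR = "MY_CONDOR_SCHEDD"


def condor_command(tool: str, args: List[str],
                   env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build an argv list for a Condor tool, targeting a remote schedd if configured."""
    env = os.environ if env is None else env
    schedd = env.get(SCHEDD_ENV_VAR)
    argv = [tool]
    if schedd:
        # Only add -name when a schedd is configured; otherwise the tool
        # falls back to its normal behaviour (the local schedd).
        argv += ["-name", schedd]
    return argv + list(args)


if __name__ == "__main__":
    print(condor_command("condor_rm", ["42.0"], {SCHEDD_ENV_VAR: "Regular02"}))
    # ['condor_rm', '-name', 'Regular02', '42.0']
```

Running the command would then just be `subprocess.run(condor_command("condor_q", []))`; the design keeps the schedd choice in one place instead of in every user's muscle memory.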

I saw there are configuration options called SCHEDD_NAME and SCHEDD_ADDRESS_FILE, but I'm not sure I understood them correctly. When I point the name and the file at my schedd (on Regular02, in my example), I still get an error and must manually specify the Regular02 host name. (And I did put @ at the end of SCHEDD_NAME.)

If someone can help, that would be great :)
Thanks
Sassy
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/