Re: [Condor-users] Problems getting Condor Daemons to run on OS X



On Thu, 10 Mar 2005 22:44:27 +0000, Stuart Bowness <stuart@xxxxxxxxx> wrote:
> I edited to my config files as you suggested, and the service now runs.
> The master, collector, and negotiator are the only services running.
> However when I go to condor_status it just lists a blank line. Am I not
> supposed to see the central manager in the status list?
> 
> Also I was wondering if you could elaborate on why you do not recommend
> running either startd or schedd on the negotiation machine? Does this
> mean that this machine cannot participate in activity on the grid?

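condor_status shows startd's - i.e. machines capable of running jobs.
condor_q -global shows schedd's - i.e. machines capable of supplying jobs to run.
With only the master, collector and negotiator running there are no
startd's to list, so the empty output is expected.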

There is nothing stopping you having schedd's and startd's on the same
machines (in many cases it is a good idea since you will likely only
allow your own jobs to run on that machine for testing etc.).
Again there is nothing stopping you from having schedd or startd on
the same machine as the central manager. However there are some good
reasons not to.

The negotiator/collector is special in that you need one and (high
availability aside) only one of them, and they don't supply their
own classad's (which are fundamentally what condor_status and condor_q
are telling you about).
The negotiator and collector are most often on the same machine for
convenience, though there is no reason for that to be the case.

The reason for not running a startd on the same machine is that this
is very likely to consume a significant amount of CPU time. This is
time which, every so often on a negotiation cycle, will be required by
the negotiator to work out where to place jobs. If the cycle takes too
long then you end up getting a great many claim timeouts and wasted
time on your pool.

The reason for not running a schedd is similar though potentially less
of a big deal (since it is less likely that a submit machine will be
CPU loaded) but if you end up running a sizable queue from this
machine with a significant number of startd's then the file transfer
could well swamp your network interface causing issues.

Fundamentally the design of Condor is to attempt to distribute across
multiple machines as much as possible to prevent choke points. The
negotiator/collector are required for some performance / central
control of resource allocation (you could design a similar system
without them but it would be much less responsive and harder to
control - though even more fault tolerant). So unless you fully
understand their load and resource requirements, I would suggest that
isolating them from the rest of the system until you do is the best
default behaviour.
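
As a rough sketch of that split, assuming the usual DAEMON_LIST setting
in each machine's local config file (check the manual for your version;
the grouping into three roles is only an illustration, and each setting
goes in a different machine's file, not all in one):

```
# Central manager only: no startd or schedd
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# Execute machines: a startd to run jobs
DAEMON_LIST = MASTER, STARTD

# Submit machines: a schedd to hold the job queue
DAEMON_LIST = MASTER, SCHEDD
```

A machine that both submits and runs jobs (the testing case above)
would list both: MASTER, SCHEDD, STARTD.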

There are some very good reasons / uses where having a schedd on the
same machine as the central manager is very important (a central
submission point via a webserver, for example, where you want to avoid
the network overhead of talking between negotiator and queue). This,
however, is something that should only be done by someone who has taken
the time to understand the performance considerations inherent in this
approach and benchmarked things to see that it is sufficient.

Matt