
Re: [Condor-users] Communication problem



On Thu, 2 Dec 2004 11:04:52 +0100, dautret <dautret@xxxxxxxx> wrote:
> Hi,
> We're using Condor to execute jobs which take a lot of time on 15
> macintosh G5.
> Our "vanilla" configuration:
> - Central manager: xserve G4 username=condor
> - Submit machine:  same xserve G4 with another username= submit
> - Execution machines: G5
> We have 2 condor_master on the same machine (to manage and to submit)
> with 2 different username. Can this configuration lead pbs ?

There is no reason to run two masters on the same machine unless you
have two pools or REALLY want your negotiator/collector to run as a
different user from the submitter (I see no reason for this that
outweighs the hassle of two process trees).

There seems to be some confusion between the 'master' daemon and the
'pool master controller'. The former is a daemon which basically does
nothing but start / stop / reconfigure the other daemons on the same
machine.

The 'pool master controller' is what people often think of when they
hear 'master'. In actual fact, the tasks of managing job assignment and
managing the info supplied by the various machines in the pool are
performed by two other daemons, the negotiator and the collector
respectively.

The submitter is the schedd (schedule daemon).

One master daemon is quite capable of starting all three of the
daemons you require. Simply merge the two config files and change the
two daemon lists

DAEMON_LIST = MASTER,  SCHEDD
DAEMON_LIST = MASTER, NEGOTIATOR, COLLECTOR

to be

DAEMON_LIST = MASTER,  NEGOTIATOR, COLLECTOR, SCHEDD
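If you would rather not merge the lists by hand, something like the
following rough Python sketch shows the idea. It is only an
illustration: the function names are made up for this example, and it
assumes each config holds a plain comma-separated DAEMON_LIST line with
no macros or line continuations.

```python
def parse_daemon_list(config_text):
    """Return the daemons named by the last DAEMON_LIST line in config_text.

    Assumption: a simple 'DAEMON_LIST = A, B, C' line, comma-separated,
    with no macro expansion or continuation lines.
    """
    daemons = []
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().upper() == "DAEMON_LIST":
            daemons = [d.strip().upper() for d in value.split(",") if d.strip()]
    return daemons


def merge_daemon_lists(*lists):
    """Combine several daemon lists, keeping first-seen order and dropping duplicates."""
    merged = []
    for daemons in lists:
        for daemon in daemons:
            if daemon not in merged:
                merged.append(daemon)
    return merged


# The two lists from the two existing config files.
manager_conf = "DAEMON_LIST = MASTER, NEGOTIATOR, COLLECTOR"
submit_conf = "DAEMON_LIST = MASTER,  SCHEDD"

merged = merge_daemon_lists(
    parse_daemon_list(manager_conf),
    parse_daemon_list(submit_conf),
)
print("DAEMON_LIST = " + ", ".join(merged))
```

The manager's list goes first so that the merged order matches the line
above; MASTER appears in both inputs but only once in the result.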

Note that running all of these daemons on one machine has some
potential issues if you are using it as a central submission point:
negotiation can end up taking a long time, and you may find it
interferes with quickly starting multiple jobs at once, causing
erroneous match timeouts. (There is a reason Condor is designed with
all the separate daemons, despite the annoyance this sometimes causes
us users: to reduce the chance of a central choke point slowing things
down.)

Are you really sure you have to keep one single submit point?

This may well be entirely irrelevant to your issues, but you might find
removing one of the Condor trees helps...

Matt