Re: [HTCondor-users] Job Scheduling



On Tue, May 21, 2013 at 11:47:32AM +0000, Muak rules wrote:
>    Hello
>    I'm going to explain all that I'd done.
>    I did configurations in /etc/condor/condor_config
> 
>    In client machine I did following configurations
>    CONDOR_HOST = pucitServer.CentOSWorld.com(name of a server machine)
>    ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
>    COLLECTOR_HOST = 10.0.0.1 (IP Address of server)
>    DAEMON_LIST = master,startd

You are using a mix of names and IP addresses.  Is
pucitServer.CentOSWorld.com the machine with IP address 10.0.0.1?  Do you
have

10.0.0.1   pucitServer.CentOSWorld.com

in your /etc/hosts file?

I can describe a simple config where one machine is the "master" (it holds
the job queue and is where you submit jobs) and the others are "workers"
(where the jobs actually execute).

If pucitserver.centosworld.com is the 'master', then on a 'worker' machine I
would make condor_config.local something like this:

---- 8< ----
##  What machine is your central manager?

CONDOR_HOST = pucitserver.centosworld.com

## Other global settings

UID_DOMAIN = centosworld.com
CONDOR_ADMIN = yourmail@xxxxxxxxxxxxxx
MAIL = /usr/bin/mail

## Pool's short description

COLLECTOR_NAME = My org condor pool

##  When is this machine willing to start a job? 

#START = TRUE
BackgroundLoad = 0.5
START = $(CPUIdle) || (State != "Unclaimed" && State != "Owner")

##  When to suspend a job?

SUSPEND = FALSE

##  When to nicely stop a job?
##  (as opposed to killing it instantaneously)

PREEMPT = FALSE

##  When to instantaneously kill a preempting job
##  (e.g. if a job is in the pre-empting stage for too long)

KILL = FALSE

##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names

DAEMON_LIST = MASTER, STARTD
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)

##  Optional: dynamic slots

SLOT_TYPE_1 = cpus=100%, ram=75%, swap=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1 = 1
---- 8< ----

And on the 'master' node I would use the same file but change the bit from
DAEMON_LIST onwards like this:

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST), 10.0.0.*
# Optional if you are using dagman
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 200
DAGMAN_SUBMIT_DELAY = 0

Run condor_restart on every machine. Then log in to the master node, check
that "condor_status" shows the worker node(s), and then submit some jobs.
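
For a first test, a minimal submit description file could look something
like the sketch below. The file name test.sub, the output file names and the
choice of /bin/hostname as the executable are only placeholders for
illustration; substitute your own program.

---- 8< ----
## test.sub: run /bin/hostname three times (illustrative placeholder job)

universe   = vanilla
executable = /bin/hostname

## One output/error file per job; $(Cluster) and $(Process) are filled in
## by condor_submit

output     = hostname.$(Cluster).$(Process).out
error      = hostname.$(Cluster).$(Process).err
log        = hostname.log

queue 3
---- 8< ----

Submit it on the master node with "condor_submit test.sub" and watch the
queue with "condor_q". If the jobs stay idle, "condor_q -analyze" may give a
hint as to why they are not matching any slot.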

If you want to make the master node run jobs as well, then I believe it
should just be a question of adding STARTD to DAEMON_LIST.
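
As a sketch of that change (I have not tested it here), the line on the
master node would become:

---- 8< ----
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
---- 8< ----

If you also want dynamic slots on the master, you would presumably keep the
optional SLOT_TYPE_1 lines from the worker config there as well.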

Regards,

Brian.