
Re: [HTCondor-users] Job Scheduling



Hello,

I did what you suggested. Only one worker node is showing, and the worker node was not showing a queue, so I added "SCHEDD" to DAEMON_LIST in "condor_config.local".
Now when I submit a job from the master, it only appears in the master node's queue; the job does not show up on the worker node.
"STARTD" is still not in DAEMON_LIST on the master node.
So can you tell me whether the job is actually running on the master node or the worker node?
And how can I migrate my jobs to the other worker nodes?

Greetings

> Date: Tue, 21 May 2013 13:18:36 +0100
> From: B.Candler@xxxxxxxxx
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Job Scheduling
>
> On Tue, May 21, 2013 at 11:47:32AM +0000, Muak rules wrote:
> > Hello
> > I'm going to explain all that I'd done.
> > I did configurations in /etc/condor/condor_config
> >
> > In client machine I did following configurations
> > CONDOR_HOST = pucitServer.CentOSWorld.com(name of a server machine)
> > ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
> > COLLECTOR_HOST = 10.0.0.1 (IP Address of server)
> > DAEMON_LIST = master,startd
>
> You are using a mix of names and IP addresses. Is
> pucitServer.CentOSWorld.com the machine with IP address 10.0.0.1? Do you
> have
>
> 10.0.0.1 pucitServer.CentOSWorld.com
>
> in your /etc/hosts file?
>
> I can describe a simple config where one node is the "master" (it holds the
> job queue and is where you submit jobs) and the others are "workers" (where
> the jobs actually execute).
>
> If pucitserver.centosworld.com is the 'master', then on a 'worker' machine I
> would make condor_config.local something like this:
>
> ---- 8< ----
> ## What machine is your central manager?
>
> CONDOR_HOST = pucitserver.centosworld.com
>
> ## Other global settings
>
> UID_DOMAIN = centosworld.com
> CONDOR_ADMIN = yourmail@xxxxxxxxxxxxxx
> MAIL = /usr/bin/mail
>
> ## Pool's short description
>
> COLLECTOR_NAME = My org condor pool
>
> ## When is this machine willing to start a job?
>
> #START = TRUE
> BackgroundLoad = 0.5
> START = $(CPUIdle) || (State != "Unclaimed" && State != "Owner")
>
> ## When to suspend a job?
>
> SUSPEND = FALSE
>
> ## When to nicely stop a job?
> ## (as opposed to killing it instantaneously)
>
> PREEMPT = FALSE
>
> ## When to instantaneously kill a preempting job
> ## (e.g. if a job is in the pre-empting stage for too long)
>
> KILL = FALSE
>
> ## This macro determines what daemons the condor_master will start and keep its watchful eyes on.
> ## The list is a comma or space separated list of subsystem names
>
> DAEMON_LIST = MASTER, STARTD
> ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)
>
> ## Optional: dynamic slots
>
> SLOT_TYPE_1 = cpus=100%, ram=75%, swap=100%, disk=100%
> SLOT_TYPE_1_PARTITIONABLE = True
> NUM_SLOTS_TYPE_1 = 1
> ---- 8< ----
>
> And on the 'master' node I would use the same file but change the bit from
> DAEMON_LIST onwards like this:
>
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
> ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST), 10.0.0.*
> # Optional if you are using dagman
> DAGMAN_MAX_SUBMITS_PER_INTERVAL = 200
> DAGMAN_SUBMIT_DELAY = 0
>
> condor_restart everywhere. Then login to the master node, check that
> "condor_status" shows the worker node(s), and then submit some jobs.
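>
> A minimal submit description to test with might look like this (the
> executable path and file names here are just placeholders, not anything
> specific to your setup):
>
> ---- 8< ----
> executable = /bin/sleep
> arguments  = 60
> universe   = vanilla
> log        = test.log
> output     = test.out
> error      = test.err
> queue
> ---- 8< ----
>
> After condor_submit, "condor_q -run" lists the host each running job was
> matched to, so you can see whether a job landed on the master or on a
> worker node.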
>
> If you want to make the master node run jobs as well, then I believe it
> should just be a question of adding STARTD to DAEMON_LIST.
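>
> For example, the master's DAEMON_LIST would then become:
>
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD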
>
> Regards,
>
> Brian.
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/