
[Condor-users] MPI dedicated schedulers



Hi all,

I'm trying to set up Condor to submit MPI jobs. If I understood correctly, I first need to set up a dedicated scheduler.
I then checked the example "condor_config.local.dedicated.submit" file, but everything in it is commented out, so effectively there is nothing in this file (see attachment).

I found this page (http://www.openems.org/display/CONDOR/Ask+Mike ---> Optena), which says I should add something like:
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxx" 
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

in the local condor_config file.

So which of these solutions is the right one? A mix of both? And if so, why is the example file empty?

For now, I haven't changed anything in the dedicated scheduler's config file. I only added to (and modified) the config file of the one machine I want to use as a dedicated resource:
 
#################################
# Start only as EXECUTE machine
DAEMON_LIST = MASTER, STARTD
##### Changes so that we don't care about KeyboardIdle
START                   = ( $(CPUIdle) || (State != "Unclaimed" && \
                                State !="Owner") )
WANT_SUSPEND            = ( $(SmallJob) || $(IsPVM) || $(IsVanilla) )

SUSPEND                 = ( (CpuBusyTime > 2 * $(MINUTE)) \
                                && $(ActivationTimer) > 90 )

CONTINUE                = ( $(CPUIdle) && ($(ActivityTimer) > 10) )

##  condor_config.local.dedicated.resource
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxx"
## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
##    run on an opportunistic basis.
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
#RANK_FACTOR    = 1000000
RANK_FACTOR     = 100
RANK    = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
START   = (Scheduler =?= $(DedicatedScheduler)) || ($(START))

MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

#################

If I run "ps ax|grep condor" on this dedicated resource, I don't see any startd running (which I usually do see on execute machines):
$ ps ax|grep cond
24677 ?		Ss 	0:00 /nfs/opt/condor_x86_64/sbin/condor_master 
24843 pts/0	S+	0:00 grep cond

This "dedicated resource" has also disappeared from my "condor_status" list.

Any idea how to solve this?
I'm using Condor 6.8.3.

Thanks for your help

Nicolas
----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------

Attachment: condor_config.local.dedicated.submit
Description: Binary data