
Re: [Condor-users] MPI dedicated schedulers



Hi Nicolas,

I had this problem a while ago and resolved it with the following configuration.

In the condor_config.local of your central manager (the dedicated scheduler), add the following:

######################################################################
# DEDICATED SCHEDULER
######################################################################

######################################################################
######################################################################
##  Settings you MUST customize!
######################################################################
######################################################################

##  What is the name of the dedicated scheduler for this resource?
##  You MUST fill in the correct full hostname where you're running
##  the dedicated scheduler, and where users will submit their
##  dedicated jobs.  The "DedicatedScheduler@" part should not be
##  changed, ONLY the hostname.
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

######################################################################
######################################################################
##  Settings you should leave alone, but that must be defined
######################################################################
######################################################################

##  Path to the special version of rsh that's required to spawn MPI
##  jobs under Condor.  WARNING: This is not a replacement for rsh,
##  and does NOT work for interactive use.  Do not use it directly!
MPI_CONDOR_RSH_PATH = $(LIBEXEC)

##  Path to OpenSSH server binary
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
CONDOR_SSHD = /usr/sbin/sshd

##  Path to OpenSSH keypair generator.
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/bin, but may be in /usr/local/bin
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen

##  This setting puts the DedicatedScheduler attribute, defined above,
##  into your machine's classad.  This way, the dedicated scheduler
##  (and you) can identify which machines are configured as dedicated
##  resources.
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler


On the execute nodes (dedicated resources), add the following to condor_config.local:

######################################################################
# DEDICATED RESOURCE
######################################################################

######################################################################
######################################################################
##  Settings you MUST customize!
######################################################################
######################################################################

##  What is the name of the dedicated scheduler for this resource?
##  You MUST fill in the correct full hostname where you're running
##  the dedicated scheduler, and where users will submit their
##  dedicated jobs.  The "DedicatedScheduler@" part should not be
##  changed, ONLY the hostname.
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

######################################################################
######################################################################
##  Settings you should leave alone, but that must be defined
######################################################################
######################################################################

##  Path to the special version of rsh that's required to spawn MPI
##  jobs under Condor.  WARNING: This is not a replacement for rsh,
##  and does NOT work for interactive use.  Do not use it directly!
MPI_CONDOR_RSH_PATH = $(LIBEXEC)

##  Path to OpenSSH server binary
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
CONDOR_SSHD = /usr/sbin/sshd

##  Path to OpenSSH keypair generator.
##  Condor uses this to establish a private SSH connection between execute
##  machines. It is usually in /usr/bin, but may be in /usr/local/bin
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen

##  This setting puts the DedicatedScheduler attribute, defined above,
##  into your machine's classad.  This way, the dedicated scheduler
##  (and you) can identify which machines are configured as dedicated
##  resources.
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

##--------------------------------------------------------------------
## 2) Always run jobs, but prefer dedicated ones
##--------------------------------------------------------------------
START           = True
SUSPEND = False
CONTINUE        = True
PREEMPT = False
KILL            = False
WANT_SUSPEND    = False
WANT_VACATE     = False
RANK            = Scheduler =?= $(DedicatedScheduler)
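
To illustrate how the RANK line above prefers dedicated jobs, here is a minimal Python sketch of the comparison, not Condor code. It assumes that the job's request carries a Scheduler attribute naming the dedicated scheduler, that "=?=" always yields True or False (never UNDEFINED), and that the startd counts a True rank as 1.0 and a False rank as 0.0. The hostname and the function names are hypothetical.

```python
# Stand-in for the ClassAd UNDEFINED value (an assumption for this sketch).
UNDEFINED = object()


def is_identical(left, right):
    """Mimic the ClassAd '=?=' operator: the result is always True or False."""
    if left is UNDEFINED or right is UNDEFINED:
        return left is right
    return type(left) is type(right) and left == right


def machine_rank(job_ad, dedicated_scheduler):
    """Evaluate RANK = Scheduler =?= $(DedicatedScheduler) for one job ad."""
    scheduler = job_ad.get("Scheduler", UNDEFINED)
    return 1.0 if is_identical(scheduler, dedicated_scheduler) else 0.0


# Hypothetical hostname, used only for illustration.
dedicated = "DedicatedScheduler@submit.example.org"

print(machine_rank({"Scheduler": dedicated}, dedicated))  # dedicated job
print(machine_rank({}, dedicated))                        # ordinary job
```

Under these assumptions a dedicated job ranks higher than any ordinary job, while START = True still lets ordinary jobs run when no dedicated job wants the machine.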


Next, restart the master daemon on all nodes with this command: condor_restart -master

Also, the daemon list on the execute nodes must include SCHEDD:

DAEMON_LIST = MASTER, STARTD, SCHEDD
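
Before restarting, it can help to check a node's configuration for the settings listed above. The following is a rough Python sketch, not a Condor tool: it uses a simplified parser that only understands "NAME = value" lines, comments, backslash continuations and self-references such as $(STARTD_EXPRS), and it assumes DAEMON_LIST is set in the same text being checked. The function names and the hostname are hypothetical.

```python
import re


def parse_condor_config(text):
    """Simplified parser: later definitions override earlier ones."""
    settings = {}
    pending = ""
    for raw in text.splitlines():
        stripped = raw.strip()
        if not pending and (not stripped or stripped.startswith("#")):
            continue  # skip blank lines and comments
        line = pending + stripped
        if line.endswith("\\"):
            pending = line[:-1].rstrip() + " "  # join continuation lines
            continue
        pending = ""
        name, sep, value = line.partition("=")
        if not sep:
            continue
        name = name.strip()
        key = name.upper()  # treat setting names as case-insensitive
        previous = settings.get(key, "")
        # Expand only a reference to the setting's own earlier value.
        pattern = r"\$\(" + re.escape(name) + r"\)"
        settings[key] = re.sub(pattern, lambda m: previous, value.strip(),
                               flags=re.IGNORECASE)
    return settings


def check_dedicated_node(text, required=("MASTER", "STARTD", "SCHEDD")):
    """Return a list of problems found in the configuration text."""
    cfg = parse_condor_config(text)
    problems = []
    if not cfg.get("DEDICATEDSCHEDULER", "").startswith('"DedicatedScheduler@'):
        problems.append("DedicatedScheduler is missing or malformed")
    exprs = [e.strip().lower() for e in cfg.get("STARTD_EXPRS", "").split(",")]
    if "dedicatedscheduler" not in exprs:
        problems.append("DedicatedScheduler is not listed in STARTD_EXPRS")
    daemons = [d.strip().upper() for d in cfg.get("DAEMON_LIST", "").split(",")]
    for daemon in required:
        if daemon not in daemons:
            problems.append("%s is missing from DAEMON_LIST" % daemon)
    return problems


# Hypothetical node configuration, used only for illustration.
sample = '''
DedicatedScheduler = "DedicatedScheduler@submit.example.org"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
DAEMON_LIST = MASTER, STARTD
'''
print(check_dedicated_node(sample))
```

For the sample text the only reported problem is the missing SCHEDD entry. An empty list means the settings discussed in this message are all present, but it does not prove that the daemons will actually start.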


I hope this helps.

P.S. Sorry for my English.


Nicolas GUIOT wrote:
Hi all,

I'm trying to set up Condor to submit MPI jobs. If I understood correctly, I first need to set up a dedicated scheduler.
I checked the example "condor_config.local.dedicated.submit" file, but everything in it is commented out, so effectively the file contains nothing (see attachment).
I found this page: (http://www.openems.org/display/CONDOR/Ask+Mike ---> Optena), which says I should add something like:
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

in the local condor_config file.

So which of these solutions is the right one? A mix of both? And why is the example file empty?

For now, I haven't changed anything in the dedicated scheduler's config file. I added the following to (and modified) the config file of one machine I want to use as a dedicated resource:
#################################
# Start only as EXECUTE machine
DAEMON_LIST = MASTER, STARTD
##### Changes so that we don't care about KeyboardIdle
START                   = ( $(CPUIdle) || (State != "Unclaimed" && \
                                State !="Owner") )
WANT_SUSPEND            = ( $(SmallJob) || $(IsPVM) || $(IsVanilla) )

SUSPEND                 = ( (CpuBusyTime > 2 * $(MINUTE)) \
                                && $(ActivationTimer) > 90 )

CONTINUE                = ( $(CPUIdle) && ($(ActivityTimer) > 10) )

##  condor_config.local.dedicated.resource
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxx"
## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
##    run on an opportunistic basis.
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
#RANK_FACTOR    = 1000000
RANK_FACTOR     = 100
RANK    = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
START   = (Scheduler =?= $(DedicatedScheduler)) || ($(START))

MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

#################

If I run "ps ax|grep cond" on this dedicated resource, I don't see any startd running (which I usually see on execute machines):
$ ps ax|grep cond
24677 ?        Ss     0:00 /nfs/opt/condor_x86_64/sbin/condor_master
24843 pts/0    S+     0:00 grep cond

This "dedicated resource" has also disappeared from my "condor_status" list.

Any idea how to solve this?
I'm using Condor 6.8.3.

Thanks for your help,

Nicolas
----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------


--
Ana Silva Gallego
Sistemas
Centro Informático Científico de Andalucía (CICA)
Avda. Reina Mercedes s/n - 41012 - Sevilla (Spain)
Tfno.: +34 955 056 600 / +34 955 056 632 / FAX: +34 955 056 650
Consejería de Innovación, Ciencia y Empresa
Junta de Andalucía

---------------------------------------------------
This message is digitally signed. To verify the
signature in your mail client, you must have the
root certificate of the CICA CA installed. You can
download it from:

http://pki.cica.es/cacert/
---------------------------------------------------
