
Re: [Condor-users] MPI dedicated schedulers --> to condor admins



Hi Ana,

Thanks for your help, but after googling a bit I found the solution (at least to one of my problems).

It was described in this message a long time ago:
https://lists.cs.wisc.edu/archive/condor-users/pre-2004-June/msg01195.shtml

In my condor_config.local.dedicated.resource, under "Option 3", I needed to add:
RANK = 0
just before:
RANK    = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)

Although this seems to be an old error, it is still present in the example files shipped with newer versions. Could the Condor admins correct this in a future release? Or am I totally wrong?
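Why the extra `RANK = 0` matters can be sketched with a toy model of macro substitution. The sketch assumes, as a simplification, that condor_config lines are processed in file order and that a `$(NAME)` reference to a macro not yet defined expands to an empty string; it is not Condor's real parser, and `host.example.org` is a hypothetical placeholder hostname.

```python
import re

# Toy model of condor_config macro substitution (a simplifying assumption,
# not Condor's real parser): lines are handled in file order, each $(NAME)
# is replaced by the value NAME holds at that point, and a macro that has
# not been defined yet expands to the empty string.
MACRO_REF = re.compile(r"\$\((\w+)\)")


def expand(lines):
    """Return the macro table after processing 'NAME = VALUE' lines in order."""
    macros = {}
    for line in lines:
        # Split on the first '=' only, so '=?=' in the value is left alone.
        name, _, value = (part.strip() for part in line.partition("="))
        macros[name] = MACRO_REF.sub(lambda m: macros.get(m.group(1), ""), value)
    return macros


# Hypothetical hostname; the RANK line is copied from the "Option 3" example.
OPTION_3 = [
    'DedicatedScheduler = "DedicatedScheduler@host.example.org"',
    "RANK_FACTOR = 100",
    "RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)",
]

broken = expand(OPTION_3)["RANK"]
fixed = expand(OPTION_3[:2] + ["RANK = 0"] + OPTION_3[2:])["RANK"]
exprs = expand(["STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler"])["STARTD_EXPRS"]

print(repr(broken))  # no previous RANK: the expression ends in a dangling "+"
print(repr(fixed))   # with RANK = 0 defined first, the "+" has an operand
print(repr(exprs))   # ', DedicatedScheduler': only a leading comma
```

Under these assumptions the Option 3 line leaves a dangling `+` when no earlier RANK exists, which may explain why the startd would not start. The `STARTD_EXPRS` self-reference only yields a leading comma, which a comma-separated list appears to tolerate.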

Nicolas

----------------
On Thu, 01 Feb 2007 12:22:03 +0100
Ana Silva <asilva@xxxxxxx> wrote:

> Hi Nicolas,
> 
> I had this problem too, and I resolved it with the following
> configuration.
> 
> In the condor_config.local of your central manager (dedicated
> scheduler), add the following:
> 
> ######################################################################
> # DEDICATED SCHEDULER
> ######################################################################
> 
> ######################################################################
> ######################################################################
> ##  Settings you MUST customize!
> ######################################################################
> ######################################################################
> 
> ##  What is the name of the dedicated scheduler for this resource?
> ##  You MUST fill in the correct full hostname where you're running
> ##  the dedicated scheduler, and where users will submit their
> ##  dedicated jobs.  The "DedicatedScheduler@" part should not be
> ##  changed, ONLY the hostname.
> DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> 
> ######################################################################
> ######################################################################
> ##  Settings you should leave alone, but that must be defined
> ######################################################################
> ######################################################################
> 
> ##  Path to the special version of rsh that's required to spawn MPI
> ##  jobs under Condor.  WARNING: This is not a replacement for rsh,
> ##  and does NOT work for interactive use.  Do not use it directly!
> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
> 
> ##  Path to OpenSSH server binary
> ##  Condor uses this to establish a private SSH connection between execute
> ##  machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
> CONDOR_SSHD = /usr/sbin/sshd
> 
> ##  Path to OpenSSH keypair generator.
> ##  Condor uses this to establish a private SSH connection between execute
> ##  machines. It is usually in /usr/bin, but may be in /usr/local/bin
> CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
> 
> ##  This setting puts the DedicatedScheduler attribute, defined above,
> ##  into your machine's classad.  This way, the dedicated scheduler
> ##  (and you) can identify which machines are configured as dedicated
> ##  resources.
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
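The comment above says the published attribute lets you identify which machines are dedicated resources. A minimal Python sketch of that check, assuming `condor_status` is on the PATH, can reach the collector, and accepts its `-constraint` and `-format` options; the helper function names are hypothetical:

```python
import subprocess

# Sketch: list machines whose ClassAd carries the DedicatedScheduler
# attribute published via STARTD_EXPRS. Assumes condor_status is on PATH
# and can reach the collector.
CONSTRAINT = "DedicatedScheduler =!= UNDEFINED"


def dedicated_status_cmd():
    """Build the condor_status command line (one 'Name scheduler' pair per line)."""
    return [
        "condor_status",
        "-constraint", CONSTRAINT,
        "-format", "%s ", "Name",
        "-format", "%s\n", "DedicatedScheduler",
    ]


def parse_status(output):
    """Map each machine Name to the DedicatedScheduler value it advertises."""
    machines = {}
    for line in output.splitlines():
        if line.strip():
            name, scheduler = line.split(None, 1)
            machines[name] = scheduler.strip()
    return machines


def list_dedicated_machines():
    """Run condor_status and return {Name: DedicatedScheduler}."""
    result = subprocess.run(dedicated_status_cmd(), capture_output=True,
                            text=True, check=True)
    return parse_status(result.stdout)
```

Calling `list_dedicated_machines()` should list each configured dedicated resource; a machine whose startd is not running would be absent, since it no longer advertises a machine ad.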
> 
> 
> And on the execute nodes (dedicated resources), add the following to
> condor_config.local:
> 
> ######################################################################
> # DEDICATED RESOURCE
> ######################################################################
> 
> ######################################################################
> ######################################################################
> ##  Settings you MUST customize!
> ######################################################################
> ######################################################################
> 
> ##  What is the name of the dedicated scheduler for this resource?
> ##  You MUST fill in the correct full hostname where you're running
> ##  the dedicated scheduler, and where users will submit their
> ##  dedicated jobs.  The "DedicatedScheduler@" part should not be
> ##  changed, ONLY the hostname.
> DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxx"
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> 
> ######################################################################
> ######################################################################
> ##  Settings you should leave alone, but that must be defined
> ######################################################################
> ######################################################################
> 
> ##  Path to the special version of rsh that's required to spawn MPI
> ##  jobs under Condor.  WARNING: This is not a replacement for rsh,
> ##  and does NOT work for interactive use.  Do not use it directly!
> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
> 
> ##  Path to OpenSSH server binary
> ##  Condor uses this to establish a private SSH connection between execute
> ##  machines. It is usually in /usr/sbin, but may be in /usr/local/sbin
> CONDOR_SSHD = /usr/sbin/sshd
> 
> ##  Path to OpenSSH keypair generator.
> ##  Condor uses this to establish a private SSH connection between execute
> ##  machines. It is usually in /usr/bin, but may be in /usr/local/bin
> CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
> 
> ##  This setting puts the DedicatedScheduler attribute, defined above,
> ##  into your machine's classad.  This way, the dedicated scheduler
> ##  (and you) can identify which machines are configured as dedicated
> ##  resources.
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> 
> ##--------------------------------------------------------------------
> ## 2) Always run jobs, but prefer dedicated ones
> ##--------------------------------------------------------------------
> START           = True
> SUSPEND = False
> CONTINUE        = True
> PREEMPT = False
> KILL            = False
> WANT_SUSPEND    = False
> WANT_VACATE     = False
> RANK            = Scheduler =?= $(DedicatedScheduler)
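This RANK uses the meta-equality operator `=?=` rather than `==`. In ClassAds, `==` yields UNDEFINED when an operand is undefined, whereas `=?=` always yields TRUE or FALSE, so a job ad that lacks a Scheduler attribute simply ranks 0. A toy Python model of that difference (an illustration only: the names and hostname are hypothetical, and other ClassAd details are ignored):

```python
# Toy model of two ClassAd comparison operators (an illustration only).
UNDEFINED = object()  # stand-in for the ClassAd UNDEFINED value

DEDICATED = "DedicatedScheduler@host.example.org"  # hypothetical hostname


def plain_equals(a, b):
    """ClassAd '==': any UNDEFINED operand makes the result UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def meta_equals(a, b):
    """ClassAd '=?=': never UNDEFINED; true only for same type and value."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def option2_rank(job_ad):
    """RANK = Scheduler =?= $(DedicatedScheduler), with TRUE/FALSE as 1/0."""
    return int(meta_equals(job_ad.get("Scheduler", UNDEFINED), DEDICATED))


print(option2_rank({"Scheduler": DEDICATED}))           # 1: dedicated job
print(option2_rank({"Owner": "nicolas"}))               # 0: ordinary job
print(plain_equals(UNDEFINED, DEDICATED) is UNDEFINED)  # True
```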
> 
> 
> Next, you must restart the master daemon on all nodes with this
> command: condor_restart -master
> 
> One more thing: the daemon list on your execute nodes must include the SCHEDD:
> 
> DAEMON_LIST = MASTER, STARTD, SCHEDD
> 
> 
> I hope this helps.
> 
> PS: sorry for my English
> 
> 
> Nicolas GUIOT wrote:
> > Hi all,
> >
> > I'm trying to set up Condor to submit MPI jobs. If I understood correctly, I first need to set up a dedicated scheduler.
> > I then checked the example "condor_config.local.dedicated.submit" file, but everything in it is commented out, so in effect I have "nothing" in this file (see attachment).
> >
> > I found this page (http://www.openems.org/display/CONDOR/Ask+Mike ---> Optena), which says I should add something like:
> > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxx" 
> > STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> >
> > in the local condor_config file.
> >
> > So which of these solutions is the right one? A mix of both? And why is the example file empty?
> >
> > For now, I haven't changed anything in the dedicated scheduler's config file; I added (and modified) the config file for one machine that I want to use as a dedicated resource:
> >  
> > #################################
> > # Start only as EXECUTE machine
> > DAEMON_LIST = MASTER, STARTD
> > ##### Changes so that we don't care about KeyboardIdle
> > START                   = ( $(CPUIdle) || (State != "Unclaimed" && \
> >                                 State !="Owner") )
> > WANT_SUSPEND            = ( $(SmallJob) || $(IsPVM) || $(IsVanilla) )
> >
> > SUSPEND                 = ( (CpuBusyTime > 2 * $(MINUTE)) \
> >                                 && $(ActivationTimer) > 90 )
> >
> > CONTINUE                = ( $(CPUIdle) && ($(ActivityTimer) > 10) )
> >
> > ##  condor_config.local.dedicated.resource
> > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxx"
> > ## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
> > ##    run on an opportunistic basis.
> > SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
> > PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
> > #RANK_FACTOR    = 1000000
> > RANK_FACTOR     = 100
> > RANK    = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
> > START   = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
> >
> > MPI_CONDOR_RSH_PATH = $(LIBEXEC)
> > CONDOR_SSHD = /usr/sbin/sshd
> > CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
> > STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
> >
> > #################
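The Option 3 block wraps the machine's existing START, SUSPEND and PREEMPT expressions so that jobs from the dedicated scheduler bypass the opportunistic policy. A toy Python model of that composition, with the base policy reduced to hypothetical boolean inputs (in the real config these are the earlier definitions of the same macros, so presumably they must already exist when the Option 3 lines are read, just like RANK):

```python
def option3_policy(is_dedicated, base):
    """Toy model of the Option 3 wrappers.

    START   = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
    SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
    PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))

    `base` holds the values of the previous START/SUSPEND/PREEMPT.
    """
    return {
        "START": is_dedicated or base["START"],
        "SUSPEND": (not is_dedicated) and base["SUSPEND"],
        "PREEMPT": (not is_dedicated) and base["PREEMPT"],
    }


# A busy desktop: the opportunistic policy would refuse new jobs and
# would suspend/preempt running ones (hypothetical values).
busy = {"START": False, "SUSPEND": True, "PREEMPT": True}

print(option3_policy(True, busy))   # dedicated job: starts, never suspended
print(option3_policy(False, busy))  # other jobs: follow the base policy
```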
> >
> > If I run "ps ax | grep condor" on this dedicated resource, I don't see any startd running (which I usually see on execute machines):
> > $ ps ax|grep cond
> > 24677 ?		Ss 	0:00 /nfs/opt/condor_x86_64/sbin/condor_master 
> > 24843 pts/0	S+	0:00 grep cond
> >
> > And this "dedicated resource" has simply disappeared from my condor_status list.
> >
> > Any idea how to solve this?
> > I'm using Condor 6.8.3.
> >
> > Thanks for your help
> >
> > Nicolas

----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------