Hello,
I'm trying to run an MPI job on my Windows grid. The head node is now a Windows machine, and all the nodes are Windows machines. The node I'm submitting from is also a "Dedicated Scheduler" as defined in:
http://www.cs.wisc.edu/condor/manual/v6.6.5/3_10Setting_Up.html#sec:Config-Dedicated-Jobs

Everything works fine up to a point: the job gets out of the job queue and onto one of the nodes for execution. However, it just stays there and never leaves; the node just stays "Busy":

vm2@xxxxxxxxx WINNT51 INTEL Claimed Busy 1.060 255 0+00:03:41

There's nothing in the log, error log, or output. Here is what I do...

PLEASE HELP!!

Jon

> qsub mpi.sub

====== mpi.sub ======
universe = MPI
executable = runMPIHello.bat
log = logfile
output = outfile
error = errfile
machine_count = 2
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
getenv = true
queue

===== runMPIHello.bat =====
"C:\Program Files\MPICH\mpd\bin\mpirun" -np 2 -machinefile "C:\mpiJava\examples\simple\machinefile" "C:\mpiJava\examples\simple\runHello.bat"

===== runHello.bat =====
java -Djava.library.path=C:\WINDOWS\SYSTEM32 -cp .;c:/mpiJava/lib/classes Hello

================
addition to condor_config
================
######################################################################
######################################################################
## Settings you MUST customize!
######################################################################
######################################################################

## What is the name of the dedicated scheduler for this resource?
## You MUST fill in the correct full hostname where you're running
## the dedicated scheduler, and where users will submit their
## dedicated jobs. The "DedicateScheduler@" part should not be
## changed, ONLY the hostname.
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx"

######################################################################
######################################################################
## Policy Settings (You MUST choose a policy and uncomment it)
######################################################################
######################################################################

## There are three basic options for the policy on dedicated
## resources:
## 1) Only run dedicated jobs
## 2) Always run jobs, but prefer dedicated ones
## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
## run on an opportunistic basis.
## You MUST uncomment the set of policy expressions you want to use
## at your site.

##--------------------------------------------------------------------
## 1) Only run dedicated jobs
##--------------------------------------------------------------------
#START = Scheduler =?= $(DedicatedScheduler)
#SUSPEND = False
#CONTINUE = True
#PREEMPT = False
#KILL = False
#WANT_SUSPEND = False
#WANT_VACATE = False
#RANK = Scheduler =?= $(DedicatedScheduler)

##--------------------------------------------------------------------
## 2) Always run jobs, but prefer dedicated ones
##--------------------------------------------------------------------
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = 200000

##--------------------------------------------------------------------
## 3) Always run dedicated jobs, but only allow non-dedicated jobs to
## run on an opportunistic basis.
##--------------------------------------------------------------------
## Allowing both dedicated and opportunistic jobs on your resources
## requires that you have an opportunistic policy already defined.
## These are the only settings that need to be modified from your
## existing policy expressions to allow dedicated jobs to always run
## without suspending, or ever being preempted (either from activity
## on the machine, or other jobs in the system).
#SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
#PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
#RANK_FACTOR = 1000000
#RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR)) + $(RANK)
#START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
## Note: For everything to work, you MUST set RANK_FACTOR to be a
## larger value than the maximum value your existing rank expression
## could possibly evaluate to. RANK is just a floating point value,
## so there's no harm in having a value that's very large.

######################################################################
######################################################################
## Settings you should leave alone, but that must be defined
######################################################################
######################################################################

## Path to the special version of rsh that's required to spawn MPI
## jobs under Condor. WARNING: This is not a replacement for rsh,
## and does NOT work for interactive use. Do not use it directly!
MPI_CONDOR_RSH_PATH = $(SBIN)

## This setting puts the DedicatedScheduler attribute, defined above,
## into your machine's classad. This way, the dedicated scheduler
## (and you) can identify which machines are configured as dedicated
## resources.
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
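
For reference, the arithmetic behind the RANK_FACTOR note in option 3 can be sketched in Python. This is only an illustration and not Condor code: the function name effective_rank and the sample rank values are made up, and it assumes the intent of the expression is "1 if the job comes from the dedicated scheduler, else 0, times RANK_FACTOR, plus the existing rank".

```python
# Illustrative only: models the assumed intent of the option 3 RANK expression,
#   RANK = (job is from the dedicated scheduler) * RANK_FACTOR + existing RANK
# effective_rank and the sample numbers are hypothetical, not Condor APIs.

RANK_FACTOR = 1000000


def effective_rank(is_dedicated, existing_rank, rank_factor=RANK_FACTOR):
    """Return the rank a machine would give a job under the option 3 policy."""
    # A true comparison counts as 1 and a false one as 0.
    return (1 if is_dedicated else 0) * rank_factor + existing_rank


# With a factor larger than any existing rank, a dedicated job outranks
# even a highly ranked opportunistic job.
dedicated_low = effective_rank(True, 0.0)
opportunistic_high = effective_rank(False, 5000.0)
print(dedicated_low > opportunistic_high)

# With a factor that is too small, the opportunistic job can win instead.
too_small = effective_rank(True, 0.0, rank_factor=1000)
print(too_small > opportunistic_high)
```

The second comparison shows why the note insists that RANK_FACTOR exceed the largest value the existing rank expression can produce.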