
Re: [Condor-users] Condor Using Parallel Universe



Hi Sara,
 Can you run condor_q -better-analyze?
 Did you add these directives to the Manager's condor_config.local?

-- PARALLEL DIRECTIVES FOR EXECUTE CENTRAL MANAGER WITH SUBMIT --
 UNUSED_CLAIM_TIMEOUT = 0
 MPI_CONDOR_RSH_PATH = $(LIBEXEC)
 ALTERNATE_STARTER_2 = $(SBIN)/condor_starter
 STARTER_2_IS_DC = TRUE
 SHADOW_MPI = $(SBIN)/condor_shadow

And this to the Execute node's condor_config.local?
-- PARALLEL DIRECTIVES FOR EXECUTE NODE --
 DedicatedScheduler = "DedicatedScheduler@YOUR_SCHEDULER'S_NAME"
 STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
 SUSPEND = False
 CONTINUE = True
 PREEMPT = False
 KILL = False
 WANT_SUSPEND = False
 WANT_VACATE = False
 RANK = Scheduler =?= $(DedicatedScheduler)
 MPI_CONDOR_RSH_PATH = $(LIBEXEC)
 CONDOR_SSHD = /usr/sbin/sshd
 CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
 STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
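With those directives in place, a minimal parallel-universe submit file for the two-machine sleep test could look something like this (the file names and machine_count here are just illustrative):

```
universe      = parallel
executable    = /bin/sleep
arguments     = 30
machine_count = 2
log           = sleep.log
output        = sleep.out.$(Node)
error         = sleep.err.$(Node)
queue
```

Also remember to run condor_reconfig (or restart the Condor daemons) on the Manager and the execute nodes after editing condor_config.local, otherwise the new settings won't take effect.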

Hope this helps.
 Bye


On 9/22/11, Sara Rolfe <smrolfe@xxxxxxxxxxxxxxxx> wrote:
> Hello,
>
> I'm trying to get a program to run using the parallel universe.  I've
> had no problems using the vanilla universe.  When I submit my parallel
> job, it hangs in idle.
>
> I've tried the "Sleep 30" example using two machines from the manual,
> but this isn't working either.  When I get the run analysis summary it
> says:
>
> 2067.000:  Run analysis summary.  Of 208 machines,
>        0 are rejected by your job's requirements
>        2 reject your job because of their own requirements
>        0 match but are serving users with a better priority in the pool
>      206 match but reject the job for unknown reasons
>        0 match but will not currently preempt their existing job
>        0 match but are currently offline
>        0 are available to run your job
>
> Does anyone have ideas on how to debug this?
>
> Thanks,
> Sara

-- 
Edier Alberto Zapata Hernández
Systems Engineer
Universidad de Valle