
Re: [Condor-users] Condor Using Parallel Universe



Hi Sara,
 Can you give me some details about your job?
 I need the submit file and the submit node's details (operating system, 32
or 64 bits, memory).
 Generally, when I get "reject the job for unknown reasons", it means that
something is wrong in the submit file, or the job is asking for something
special that the nodes can't provide at the moment.
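
 For comparison, the "Sleep 30" example from the manual is usually written
along the lines of the sketch below. This is only an assumption about what
the submit file might look like, not the actual file from this thread: the
log file name and the machine count of two are illustrative, and /bin/sleep
is assumed to exist on every execute node.

```text
# Sketch of a parallel universe submit description file (illustrative values)
universe = parallel
executable = /bin/sleep
arguments = 30
# Number of machines the job needs at the same time (assumed: two)
machine_count = 2
# Hypothetical log file name
log = sleep.log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
```

 If a file like this still stays idle while every machine matches "for
unknown reasons", the submit file itself is probably fine, and the dedicated
scheduler settings on the execute nodes are the next thing to check.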

 I'll wait for your answer.

Bye

On 9/23/11, Sara Rolfe <smrolfe@xxxxxxxxxxxxxxxx> wrote:
> Hi Edier,
>
> Thanks for your input.  I ran condor_q -better-analyze and got:
>
> 2067.000:  Run analysis summary.  Of 208 machines,
>        0 are rejected by your job's requirements
>        1 reject your job because of their own requirements
>        0 match but are serving users with a better priority in the pool
>      207 match but reject the job for unknown reasons
>        0 match but will not currently preempt their existing job
>        0 match but are currently offline
>        0 are available to run your job
>
> The following attributes are missing from the job ClassAd:
>
> CheckpointPlatform
>
> I will check with the system administrator about the directives you
> mentioned.  Let me know if the condor_q -better-analyze output gives
> you any insight.
>
> Thanks,
> Sara
>
> On Sep 22, 2011, at 5:26 PM, Edier Zapata wrote:
>
>> Hi Sara,
>> Can you run condor_q -better-analyze?
>> Did you add these directives to the manager's condor_config.local?
>>
>> -- PARALLEL DIRECTIVES FOR EXECUTE CENTRAL MANAGER WITH SUBMIT --
>> UNUSED_CLAIM_TIMEOUT = 0
>> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
>> ALTERNATE_STARTER_2 = $(SBIN)/condor_starter
>> STARTER_2_IS_DC = TRUE
>> SHADOW_MPI = $(SBIN)/condor_shadow
>>
>> And these to the execute node's condor_config.local?
>> -- PARALLEL DIRECTIVES FOR EXECUTE NODE --
>> DedicatedScheduler = "DedicatedScheduler@YOUR_SCHEDULER'S_NAME"
>> STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
>> SUSPEND = False
>> CONTINUE = True
>> PREEMPT = False
>> KILL = False
>> WANT_SUSPEND = False
>> WANT_VACATE = False
>> RANK = Scheduler =?= $(DedicatedScheduler)
>> MPI_CONDOR_RSH_PATH = $(LIBEXEC)
>> CONDOR_SSHD = /usr/sbin/sshd
>> CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
>> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>>
>> Hope this helps.
>> Bye
>>
>>
>> On 9/22/11, Sara Rolfe <smrolfe@xxxxxxxxxxxxxxxx> wrote:
>>> Hello,
>>>
>>> I'm trying to get a program to run using the parallel universe.  I've
>>> had no problems using the vanilla universe.  When I submit my parallel
>>> job, it hangs in idle.
>>>
>>> I've tried the "Sleep 30" example using two machines from the manual,
>>> but this isn't working either.  When I get the run analysis summary it
>>> says:
>>>
>>> 2067.000:  Run analysis summary.  Of 208 machines,
>>>       0 are rejected by your job's requirements
>>>       2 reject your job because of their own requirements
>>>       0 match but are serving users with a better priority in the pool
>>>     206 match but reject the job for unknown reasons
>>>       0 match but will not currently preempt their existing job
>>>       0 match but are currently offline
>>>       0 are available to run your job
>>>
>>> Does anyone have ideas on how to debug this?
>>>
>>> Thanks,
>>> Sara
>>
>> --
>> Edier Alberto Zapata Hernández
>> Ingeniero de Sistemas
>> Universidad de Valle
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/
>


-- 
Edier Alberto Zapata Hernández
Ingeniero de Sistemas
Universidad de Valle