
Re: [Condor-users] MPI jobs can not be run on Condor 6.8



Dear tracy_luofengji,

	You can modify mp1script by adding some "echo ..." lines to get more information in the output file.
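
	A rough sketch of what such lines might look like, to paste near the top of mp1script. This is not taken from the shipped script: debug_echo is a hypothetical helper name, and it assumes that _CONDOR_PROCNO and _CONDOR_NPROCS are set in the environment of a parallel universe job (mp1script itself reads them) and that the program to run is the first argument, as in the submit file.

```shell
#!/bin/sh
# Debugging lines for mp1script (a sketch under the assumptions stated above).
# debug_echo is a hypothetical helper; it only prefixes each message so the
# lines are easy to find in the job's output file.
debug_echo() {
    echo "mp1script debug: $*"
}

# Where is this node running, and from which directory?
debug_echo "host=$(uname -n) cwd=$(pwd)"

# Which node of the parallel job is this? The ":-unset" defaults only keep
# the lines harmless if the variables are missing.
debug_echo "node ${_CONDOR_PROCNO:-unset} of ${_CONDOR_NPROCS:-unset}"

# What was the script asked to run, and is it executable on this machine?
debug_echo "arguments: $*"
if [ ! -x "$1" ]; then
    debug_echo "cannot execute '$1' on this machine"
fi
```

	The messages go to standard output, so they should show up in the file named by Output in the submit file once the job writes any output.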


			

	Thanks.
      	 Zhaokun
			   Beijing Hotsim Technology Co.,Ltd
			   zhaokun@xxxxxxxxxxxxx
          2009-03-11
=======From 2009-03-11 16:52:11 =======

>Dear Zhao Kun,
>Thanks for your help. I forgot to uncomment some lines in condor_config.local, and now the MPI jobs can be executed. But now the problem is that all the MPI jobs stay in the "running" state all the time, and never finish or return. I have spent several hours on it, but I cannot find the reason.
> 
>Any help will be appreciated.
>Regards,
>Tracy
>
>
>
>
>On 2009-03-11 12:33:43, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>>Dear tracy_luofengji,
>>
>>
>>	In the submit file, remove the following 3 lines. Make sure mp1script and helloworld can be found on your worker machine.
>>
>>>>>should_transfer_files = yes
>>>>>when_to_transfer_output = on_exit
>>>>>transfer_input_files = /usr/local/helloworld
>>
>>	
>>
>>			
>>
>>	Thanks.
>>      	 Zhaokun
>>			   Beijing Hotsim Technology Co.,Ltd
>>			   zhaokun@xxxxxxxxxxxxx
>>          2009-03-11
>>=======From 2009-03-11 11:45:17 =======
>>
>>>Dear Zhao Kun,
>>>
>>>Hello, thanks for your help. I have tested it again following your suggestion. The log file contains only one line, like "job has been submitted...", and there is no information in the output file or the error file.
>>>
>>>I have already used condor_status and condor_q -analyze before. The status of my worker node stayed "Unclaimed", and the result of "condor_q -analyze" just told me: "1 job is rejected by unknown reasons".
>>>
>>>Do you have any suggestions? Any help will be appreciated.
>>>
>>>Thanks!
>>>Tracy
>>>
>>>
>>>
>>>
>>>On 2009-03-11 11:37:22, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>>>>Dear tracy_luofengji,
>>>>
>>>>	Please add the following lines to your submit file:
>>>>
>>>>	Log = test.log
>>>>	Output = test.out
>>>>	Error = test.err
>>>>
>>>>	You may find out what happens from the log file and output file after the job is submitted.
>>>>
>>>>	"condor_status" and "condor_q -ana" will help you to get more info.
>>>>
>>>>
>>>>	
>>>>
>>>>
>>>>			
>>>>
>>>>	Thanks.
>>>>      	 Zhaokun
>>>>			   Beijing Hotsim Technology Co.,Ltd
>>>>			   zhaokun@xxxxxxxxxxxxx
>>>>          2009-03-11
>>>>=======From 2009-03-11 11:29:21 =======
>>>>
>>>>>Dear all,
>>>>>I used Condor 6.8 and mpich-1.2.7. I tested it on 2 nodes: one acts as master and the other acts as worker. On the worker node, I copied the content of the file $CONDOR_HOME/etc/examples/condor_config.local.dedicated.resource to the file /home/condor/condor_config.local, and changed the dedicated scheduler to:
>>>>>
>>>>>DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx"
>>>>>
>>>>>Then on the master node, I created a submit file as follows:
>>>>>
>>>>>universe = parallel
>>>>>executable = /usr/local/mp1script
>>>>>arguments = /usr/local/helloworld
>>>>>machine_count = 1
>>>>>should_transfer_files = yes
>>>>>when_to_transfer_output = on_exit
>>>>>transfer_input_files = /usr/local/helloworld
>>>>>queue
>>>>>
>>>>>When I submitted the job to Condor, the job always stayed idle, so I want to know the reason for it.
>>>>>
>>>>>Thanks!
>>>>>Regards,
>>>>>Tracy
>>>>>_______________________________________________
>>>>>Condor-users mailing list
>>>>>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>>subject: Unsubscribe
>>>>>You can also unsubscribe by visiting
>>>>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>>
>>>>>The archives can be found at:
>>>>>https://lists.cs.wisc.edu/archive/condor-users/
>>>>>
>>>>
>>>
>>
>
