
Re: [Condor-users] MPI jobs can not be run on Condor 6.8



Dear Zhao Kun,
Thanks for your help. I forgot to uncomment some lines in condor_config.local, and now the MPI jobs can be executed. But now the problem is that all the MPI jobs stay in the "running" state indefinitely and never finish or return. I have spent several hours on it, but I cannot find the reason.
 
Any help will be appreciated.
Regards,
Tracy
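
For reference, a sketch of the submit file after the changes suggested in the quoted messages below: the three file-transfer lines removed, and the Log, Output, and Error lines added. It is reconstructed from the thread and is not a verified copy of the actual file. It assumes that mp1script and helloworld are reachable at the same paths on the worker node.

```
# Sketch reconstructed from the thread; not a verified copy of the actual file.
universe      = parallel
executable    = /usr/local/mp1script
arguments     = /usr/local/helloworld
machine_count = 1

# Log, output, and error files as suggested earlier in the thread.
log    = test.log
output = test.out
error  = test.err

# The three file-transfer lines (should_transfer_files,
# when_to_transfer_output, transfer_input_files) are removed, so
# /usr/local/mp1script and /usr/local/helloworld are assumed to be
# reachable at the same paths on the worker node.
queue
```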


On 2009-03-11 12:33:43, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>Dear tracy_luofengji,
>
>	
>    In the submit file, remove the following three lines. Make sure mp1script and helloworld can be found on your worker machine.
>
>>>>should_transfer_files = yes
>>>>when_to_transfer_output = on_exit
>>>>transfer_input_files = /usr/local/helloworld
>
>	
>
>			
>
>	Thanks.
>      	 Zhaokun
>			   Beijing Hotsim Technology Co.,Ltd
>			   zhaokun@xxxxxxxxxxxxx
>          2009-03-11
>=======From 2009-03-11 11:45:17 =======
>
>>Dear Zhao Kun,
>> 
>>Hello, thanks for your help. I have tested it again following your suggestion. The log file contains only one line, something like "job has been submitted...", and there is nothing in the output file or the error file.
>> 
>>I have already used condor_status and condor_q -analyze before. The status of my worker node stays "Unclaimed", and the result of "condor_q -analyze" just told me: "1 job is rejected by unknown reasons".
>> 
>>Do you have any suggestions? Any help will be appreciated.
>> 
>>Thanks!
>>Tracy
>>
>>
>>
>>
>>On 2009-03-11 11:37:22, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>>>Dear tracy_luofengji,
>>>
>>>	Please add the following lines to your submit file:
>>>
>>>	Log = test.log
>>>	Output = test.out
>>>	Error = test.err
>>>
>>>    You can see what happens in the log file and output file after the job is submitted.
>>>
>>>	"condor_status" and "condor_q -ana" will help you get more info.
>>>
>>>
>>>	
>>>
>>>
>>>			
>>>
>>>	Thanks.
>>>      	 Zhaokun
>>>			   Beijing Hotsim Technology Co.,Ltd
>>>			   zhaokun@xxxxxxxxxxxxx
>>>          2009-03-11
>>>=======From 2009-03-11 11:29:21 =======
>>>
>>>>Dear all,
>>>>I am using Condor 6.8 and mpich-1.2.7. I tested it on 2 nodes: one acts as master and the other acts as worker. On the worker node, I copied the content of the file $CONDOR_HOME/etc/examples/condor_config.local.dedicated.resource to the file /home/condor/condor_config.local, and changed the dedicated scheduler to:
>>>>
>>>>DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxx"
>>>>
>>>>Then on the master node, I created a submit file as follows:
>>>>
>>>>universe = parallel
>>>>executable = /usr/local/mp1script
>>>>arguments = /usr/local/helloworld
>>>>machine_count = 1
>>>>should_transfer_files = yes
>>>>when_to_transfer_output = on_exit
>>>>transfer_input_files = /usr/local/helloworld
>>>>queue
>>>>
>>>>When I submitted the job to Condor, the job always stayed idle, so I want to know the reason for it.
>>>>
>>>>Thanks!
>>>>Regards,
>>>>Tracy
>>>>_______________________________________________
>>>>Condor-users mailing list
>>>>To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>>>subject: Unsubscribe
>>>>You can also unsubscribe by visiting
>>>>https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>>
>>>>The archives can be found at:
>>>>https://lists.cs.wisc.edu/archive/condor-users/
>>>>


