
Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"



Hello zhaokun,
You gave me three pieces of advice, but I still have some questions:
1. MPI runs fine without Condor.
2. How do I add "echo ..." statements to trace errors? Can you explain in detail?
3. The log files show the following:
7/8 10:41:34 ******************************************************
7/8 10:41:34 ** condor_shadow (CONDOR_SHADOW) STARTING UP
7/8 10:41:34 ** /usr/local/src/condor/sbin/condor_shadow
7/8 10:41:34 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
7/8 10:41:34 ** $CondorPlatform: I386-LINUX_RH9 $
7/8 10:41:34 ** PID = 6554
7/8 10:41:34 ** Log last touched 7/8 10:33:26
7/8 10:41:34 ******************************************************
7/8 10:41:34 Using config source: /usr/local/src/condor/etc/condor_config
7/8 10:41:34 Using local config sources:
7/8 10:41:34 /usr/local/src/condor/local.node1/condor_config.local
7/8 10:41:34 DaemonCore: Command Socket at <192.168.0.101:33644>
7/8 10:41:34 Initializing a PARALLEL shadow for job 44.0
7/8 10:41:35 (44.0) (6554): Request to run on <192.168.0.116:33302> was ACCEPTED
7/8 10:41:35 (44.0) (6554): Request to run on <192.168.0.101:32793> was ACCEPTED

7/8 10:41:35 ******************************************************
7/8 10:41:35 ** condor_starter (CONDOR_STARTER) STARTING UP
7/8 10:41:35 ** /usr/local/src/condor/sbin/condor_starter
7/8 10:41:35 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
7/8 10:41:35 ** $CondorPlatform: I386-LINUX_RH9 $
7/8 10:41:35 ** PID = 6555
7/8 10:41:35 ** Log last touched 7/8 10:32:56
7/8 10:41:35 ******************************************************
7/8 10:41:35 Using config source: /usr/local/src/condor/etc/condor_config
7/8 10:41:35 Using local config sources:
7/8 10:41:35 /usr/local/src/condor/local.node1/condor_config.local
7/8 10:41:35 DaemonCore: Command Socket at <192.168.0.101:33651>
7/8 10:41:35 Done setting resource limits
7/8 10:41:36 Communicating with shadow <192.168.0.101:33644>
7/8 10:41:36 Submitting machine is "node1.localdomain"
7/8 10:41:36 setting the orig job name in starter
7/8 10:41:36 setting the orig job iwd in starter
7/8 10:41:36 Job has WantIOProxy=true
7/8 10:41:36 Initialized IO Proxy.
7/8 10:41:36 File transfer completed successfully.
7/8 10:41:37 Job 44.0 set to execute immediately
7/8 10:41:37 Starting a PARALLEL universe job with ID: 44.0
7/8 10:41:37 IWD: /usr/local/src/condor/local.node1/execute/dir_6555
7/8 10:41:37 Output file: /usr/local/src/condor/local.node1/execute/dir_6555/hello.out
7/8 10:41:37 Error file: /usr/local/src/condor/local.node1/execute/dir_6555/hello.err
7/8 10:41:37 About to exec /usr/local/src/condor/local.node1/execute/dir_6555/condor_exec.exe hello 2
7/8 10:41:37 Create_Process succeeded, pid=6557
7/8 10:41:37 IOProxy: accepting connection from 192.168.0.101
7/8 10:41:37 IOProxyHandler: closing connection to 192.168.0.101

What is wrong with it?
I really need help!
Any help will be appreciated.
Regards,
Han
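
For point 2 above, one way to add "echo ..." tracing is to wrap each step of the job script in a small helper that writes to stderr, so the messages end up in the file named by error= (hello.err here). The following is only a minimal sketch: the trace and run_step helpers are hypothetical names, not part of the mp1script shipped with Condor, and it assumes that _CONDOR_PROCNO and _CONDOR_NPROCS are set in the job environment for parallel universe jobs.

```shell
#!/bin/sh
# Hypothetical tracing helpers; not taken from the mp1script shipped with Condor.
# Every trace line goes to stderr, which the submit file maps to hello.err.

trace() {
    # Prefix each message with a timestamp and the node name.
    echo "[$(date '+%H:%M:%S')] [$(uname -n)] $*" >&2
}

run_step() {
    # Announce a command, run it, and report its exit status.
    trace "starting: $*"
    if "$@"; then status=0; else status=$?; fi
    trace "finished: $* (exit status $status)"
    return $status
}

trace "script started with arguments: $*"
# Assumption: Condor sets these two variables for parallel universe jobs.
trace "node ${_CONDOR_PROCNO:-unset} of ${_CONDOR_NPROCS:-unset}"
# Show which files were actually transferred into the execute directory.
run_step ls -l
```

Putting run_step in front of each command in the job script shows in hello.err which step was the last one to start, and therefore where the job hangs.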

--- On Wed, 8 Jul 2009, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:

> From: zhaokun <zhaokun@xxxxxxxxxxxxx>
> Subject: Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
> To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
> Date: Wednesday, 8 July 2009, 10:55 AM
> Hi Condor-Users Mail List,
>
>    Sorry for the late reply.
>    
>    1. Check your MPI settings.
>    2. Add some "echo ..." statements to trace errors.
>    3. View the log files to get more info: SchedLog, StartLog, StarterLog ...
> ------------------       
>          
>            
>    zhaokun
>            
>     2009-07-08
>
> -------------------------------------------------------------
> From:Hehe cmesunoom@xxxxxxxx
> Date:2009-07-07 09:36:01
> To:Condor-Users Mail List condor-users@xxxxxxxxxxx
> cc:
> Title:Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
>
> Hello zhaokun,
> My MPI job submit description file is as follows:
> universe=parallel
> executable=/usr/local/condor/etc/examples/mp1script
> arguments=hello
> log=hello.log
> output=hello.out
> error=hello.err
> machine_count=2
> should_transfer_files=yes
> when_to_transfer_output=on_exit
> transfer_input_files=hello
> queue
>  
> That is all. Does it have any problems?
> Thanks in advance.
> Han. (You are Chinese, right? If it is convenient, could we communicate directly in Chinese? My English is very poor.)
>
> --- On Tue, 7 Jul 2009, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>
>
> From: zhaokun <zhaokun@xxxxxxxxxxxxx>
> Subject: Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
> To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxxx>
> Date: Tuesday, 7 July 2009, 9:15 AM
>
>
> Hi Condor-Users Mail List,
>
>     Please attach your job script file so that we can find the reason.
> ------------------                 
>                zhaokun
>                 2009-07-07
>
> -------------------------------------------------------------
> From:Hehe cmesunoom@xxxxxxxx
> Date:2009-07-06 18:47:50
> To:condor-users condor-users@xxxxxxxxxxx
> cc:
> Title:[Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
>
> Hello all,
> When I submit an MPI job to Condor, the job stays in the "running" state all the time.
>  
> ************hello_log  file***************
> Job submitted from host:<.......>
> Node 0 executing on host:<........>
> Job executing on host:MPI_job
>  
> So I want to know the reason for this.
>
> Any help will be appreciated.
> Regards,
> Han
>
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/