
[HTCondor-users] Any simple way to configure HTCondor under Windows 7 64-bit + Intel MPI?



Dear all,

I have a cluster of ten nodes with local IP addresses 192.168.1.1 through 192.168.1.10 (node names N01 through N10); every node runs 64-bit Windows 7. My program is built with VS2010 (C++), Intel Fortran, and Intel MPI. Currently I launch it through Intel MPI with the following command:
mpiexec -wdir Z:\ -hosts 10 n01 12 n02 12 n03 12 n04 12 n05 12 n06 12 n07 12 n08 12 n09 12 n10 12 -mapall Z:\test
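
To spell out the layout: as I read it, the -hosts argument gives the number of hosts followed by name/process-count pairs, so the command above requests 12 processes on each of the 10 nodes, 120 in total. A minimal Python sketch that assembles the same command line is below. The helper name build_mpiexec_command is hypothetical and only for illustration; it builds the argument list and does not launch anything.

```python
def build_mpiexec_command(nodes, procs_per_node, workdir, program):
    """Assemble an mpiexec argument list shaped like the command above."""
    # "-hosts" is followed by the host count, then name/count pairs.
    args = ["mpiexec", "-wdir", workdir, "-hosts", str(len(nodes))]
    for node in nodes:
        args += [node, str(procs_per_node)]
    # "-mapall" and the program path close the command, as in the original.
    args += ["-mapall", program]
    return args


# Node names n01..n10, 12 processes per node, as in the command above.
nodes = [f"n{i:02d}" for i in range(1, 11)]
procs_per_node = 12

command = build_mpiexec_command(nodes, procs_per_node, "Z:\\", "Z:\\test")
total_processes = len(nodes) * procs_per_node

print(" ".join(command))
print("total processes:", total_processes)
```

Generating the list this way only makes it easier to change the node set or the per-node count without retyping the pairs by hand.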

The problem is that, with the same parameters passed to the program 'test', the run sometimes succeeds and sometimes fails with the following error message:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

or the following error message:
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
*********** Warning ************
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N09' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N07' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N02' failed, error 2 - The system cannot find the file specified.
*********** Warning ************
Unable to map \\n01\Debug. (error 71)

I don't know what could lead to these problems.

Could I solve this problem by launching the program 'test' through HTCondor under Windows 7 64-bit with Intel MPI? Is there a simple way to quickly set up HTCondor so that 'test' runs on my cluster with 120 processes?

Thanks,
Zhanghong Tang