
[Condor-users] Problem when machine_count > 1 in MPI Universe

Hi All,

I am trying to run the CPI example executable that comes with the MPI installation on a Condor pool of 4 machines. I have set up all the prerequisites for running MPI jobs, including the dedicated scheduler. To my surprise, the job runs fine as long as machine_count is 1. As soon as machine_count is increased, it fails with errors like:
rm_3948: (-) net_recv failed for fd = 3
rm_3948:  p4_error: net_recv read, errno = : 104
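
For reference, on Linux errno 104 is ECONNRESET ("Connection reset by peer"), which would mean the other end of the socket closed the connection. The snippet below is only a minimal sketch for looking up the symbolic name of an errno value on whatever machine it is run on. The helper name describe_errno is made up for illustration, and it assumes the nodes in the pool are running Linux, since errno numbering differs between platforms.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as "ECONNRESET".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 104 is the value reported in the p4_error line above; what it means
    # depends on the platform the failing process ran on.
    name, message = describe_errno(104)
    print(f"errno 104 -> {name}: {message}")
```

Running it on one of the execute machines shows how that machine interprets the number reported by p4_error.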

I have gone through all the previous mails on this particular issue, but I am still facing the same error.

Please help me out.