Re: [Condor-users] Condor and MPICH on RH9



On Mon, Mar 21, 2005 at 08:09:32AM +0100, François Bachmann wrote:
> Hi all
> 
> while trying to set up an MPI universe (MPICH 1.2.4) on our RH9-based
> cluster, we've come across the following problem:
> 
> Condor seems to rely on sending mails between the Master and machines
> in its pool. Now when we try to connect to a pool machine from the
> dedicated submitter (which is also the Master) by rsh, we get a
> standard RH9 message "You have mail" which throws MPICH off (a message
> along the lines of "Weird 0xaxa..."). MPI's tstmachines therefore
> gives us errors.
> 
> I'm trying to find out how to:
> a) turn the "You have mail" message off for non-interactive RH9 sessions or
> b) use Condor/MPI without mail
> 
> Am I going down the right road here?

No, you're not. The 'rsh' in Condor 6.6.x is not a real rsh. The
'You have mail' message never gets sent to MPICH. If you're having
trouble with MPI universe jobs, tstmachines is not the way to debug
them; we don't use it.

-Erik