Re: [Condor-users] New to Condor, Need to RUN MPI



Hi Todd,

As you suggested, I changed only the MPDIR setting in the sample script:

---------------------------
# Set this to the bin directory of MPICH installation
MPDIR=/opt/mpich/gnu/bin
PATH=$MPDIR:.:$PATH
export PATH

export P4_RSHCOMMAND=$CONDOR_SSH

CONDOR_CONTACT_FILE=$_CONDOR_SCRATCH_DIR/contact
export CONDOR_CONTACT_FILE

# The second field in the contact file is the machine name
# that condor_ssh knows how to use
sort -n +0 < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines

## run the actual mpijob
mpirun -v -np $_CONDOR_NPROCS -machinefile machines $EXECUTABLE "$@"

--------------------
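For reference, here is a standalone sketch of what the sort/awk line above does. It writes a made-up contact file (the node numbers and host names are hypothetical, and a real Condor contact file has more columns, which awk simply ignores), sorts it numerically on the first field, and keeps the second field. It uses `sort -k1,1n`, the POSIX key syntax corresponding to the obsolete `sort -n +0` form.

```shell
#!/bin/sh
# Sketch: reproduce the machine-file step of the script above in a
# scratch directory. The contact data below is HYPOTHETICAL.
set -e

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical contact file: node number in field 1, host name in field 2.
# Node 10 is included to show why a numeric (not lexical) sort is needed.
cat > contact <<'EOF'
10 compute-0-10
2 compute-0-2
0 compute-0-0
1 compute-0-1
EOF

# Numeric sort on field 1 (POSIX form of "sort -n +0"), then keep field 2.
sort -k1,1n < contact | awk '{print $2}' > machines

cat machines
```

With this sample data the machine file lists compute-0-0, compute-0-1, compute-0-2 and compute-0-10, one per line, in node order.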

That strange message has gone away, but I still get the following:

--------------------------
running /var/opt/condor/execute/dir_6084/bones on 2 LINUX ch_p4 processors
Cannot read machines.
Looked for files with extension LINUX in
directory /opt/mpich/gnu/share .
---------------------------
I checked, and there is a file called machines.LINUX in that directory.
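The "Cannot read machines." line appears to refer to the relative file name passed with `-machinefile machines`, not to machines.LINUX. One way to narrow this down is a small check, run just before the mpirun line, that the file is readable and non-empty from the current directory, and that turns the name into an absolute path. `check_machinefile` is a hypothetical helper, not part of Condor or MPICH, and the demo at the end uses made-up host names.

```shell
#!/bin/sh
# HYPOTHETICAL helper (not part of Condor or MPICH): verify a machine file
# before handing it to mpirun, and print its absolute path on success.
check_machinefile() {
    f=$1
    if [ ! -r "$f" ]; then
        echo "machine file '$f' is not readable from $(pwd)" >&2
        return 1
    fi
    if [ ! -s "$f" ]; then
        echo "machine file '$f' is empty" >&2
        return 1
    fi
    # Absolute path, so the result does not depend on mpirun's working directory.
    case $f in
        /*) printf '%s\n' "$f" ;;
        *)  printf '%s\n' "$(pwd)/$f" ;;
    esac
}

# Demo in a scratch directory with made-up host names.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
printf 'compute-0-0\ncompute-0-1\n' > machines
: > empty_machines

check_machinefile machines
```

Inside the job script, the idea would be something like `MACHINEFILE=$(check_machinefile machines) || exit 1`, then passing `"$MACHINEFILE"` to `-machinefile`; this has not been tried against a real Condor execute directory.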

Thanks

Samir Khanal
CS Grad Student
Hayes 226
Bowling Green State University
Bowling Green, OH 43402
skhanal@xxxxxxxx

________________________________________
From: condor-users-bounces@xxxxxxxxxxx [condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum [tannenba@xxxxxxxxxxx]
Sent: Friday, January 30, 2009 3:03 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] New to Condor, Need to RUN MPI

Samir Khanal wrote:
> I tried the Parallel Universe too; here is what I get:
[snip]
> running /home/skhanal/condor/bones on 2 LINUX ch_p4 processors
> Created /var/opt/condor/execute/dir_5352/PILxVizf5531
> Host compute-0-0 is not in contact file /var/opt/condor/execute/dir_5352/contact
> p0_5556:  p4_error: Child process exited while making connection to remote process on compute-0-0: 0
> p0_5556: (2.003906) net_send: could not write to fd=4, errno = 32
>
>
> The job does not complete successfully with above messages.
>
> Help! Help!
>

Why did you feel compelled to hack the sample mp1script included with
Condor?  Are you trying to use MPICH?  If so, just set MPDIR to the
correct path in the sample script, where the comment says so; no
other changes should be needed.
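The advice above amounts to editing a single line. As a quick sanity check (a sketch, not something Condor provides), one can confirm that the chosen directory contains an executable mpirun and that, once the script prepends it with `PATH=$MPDIR:...`, `command -v mpirun` resolves to that copy. `check_mpdir` is hypothetical, and the demo uses a fake mpirun in a scratch directory instead of /opt/mpich/gnu/bin.

```shell
#!/bin/sh
# HYPOTHETICAL check (not part of Condor): does MPDIR hold the mpirun
# that the script's PATH=$MPDIR:... line will actually pick up?
check_mpdir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "MPDIR '$dir' is not a directory" >&2
        return 1
    fi
    if [ ! -x "$dir/mpirun" ]; then
        echo "no executable mpirun in '$dir'" >&2
        return 1
    fi
    # Resolve mpirun in a subshell with MPDIR prepended, as the script does.
    resolved=$(PATH="$dir:$PATH"; command -v mpirun)
    if [ "$resolved" != "$dir/mpirun" ]; then
        echo "mpirun resolves to '$resolved', not '$dir/mpirun'" >&2
        return 1
    fi
}

# Demo: a fake mpirun in a scratch directory stands in for /opt/mpich/gnu/bin.
fake=$(mktemp -d)
trap 'rm -rf "$fake"' EXIT
printf '#!/bin/sh\nexit 0\n' > "$fake/mpirun"
chmod +x "$fake/mpirun"

check_mpdir "$fake" && echo "MPDIR looks usable: $fake"
```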

Your customizations to the sample mp1script look very suspect to me.

regards,
Todd


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/