
Re: [Condor-users] MPICH2 wrapper script (mpich2script) for parallel universe



That's perfect. I had also modified your first script and it was working, but it took a very long time (>240 sec) to get the mpd ring up and running on all the machines in my cluster. My solution was also a little clumsy: I started mpd on the master first (with a hard-coded port number), and then started mpd on the other machines (also hard-coding the port number) so that they joined the master's mpd ring. Your solution looks much cleaner; I will try it tonight. I had earlier tried to use mpdboot to fire up the ring; I guess it wouldn't have worked because I didn't know about the bug in MPICH2.

Thanks a lot, Mark. I will give you feedback tomorrow morning.
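
For reference, the manual ring start-up described above (mpd on the master with a fixed port, the other nodes joining it) could be sketched roughly as follows. This is only an illustration and not the contents of mp2script: the helper name mpd_command, the hostname node0.example.org and port 55555 are made up, and --daemon, --listenport, --host and --port are assumed to be the options accepted by MPICH2's mpd, so verify them with "mpd --help" on the installation in use.

```python
def mpd_command(procno, master_host, port):
    """Return the argv list for starting mpd on one node (hypothetical helper).

    procno      -- node rank, e.g. the value of $_CONDOR_PROCNO
    master_host -- hostname of the rank-0 node (assumed known in advance)
    port        -- hard-coded port the master's mpd listens on (assumption)
    """
    if procno == 0:
        # The master opens the ring on a fixed, known port.
        return ["mpd", "--daemon", "--listenport=%d" % port]
    # Every other node joins the ring through the master's host and port.
    return ["mpd", "--daemon", "--host=%s" % master_host, "--port=%d" % port]


if __name__ == "__main__":
    # Print the command each of three example nodes would run.
    for rank in range(3):
        print(rank, " ".join(mpd_command(rank, "node0.example.org", 55555)))
```

The function only builds the command lines; actually launching them, and making sure the master's mpd is listening before the others try to join, is left out of the sketch.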

On 3/28/07, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
Hi Nkwebi,

I've modified my version of mp2script, which now works with the "cpi"
example code that MPICH2 builds when run over multiple SMP machines.
Fancy testing it and reporting any feedback? One thing to note: there
seems to be a bug in the current version of MPICH2 (v1.0.5p3), at
least when using the ch3:nemesis device, and maybe also with
ch3:ssm. The fix requires changing the following line in mpiexec.py (at
around line 789):

Change this line:
    msgToMPD['ifhns'][loRange] = ifhn
to this:
    if ifhn:  msgToMPD['ifhns'][loRange] = ifhn

Thanks to Ralph Butler at MTSU for pointing this out, without which I
couldn't get mp2script to work.
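
To see what the added guard changes, here is a toy illustration, not the real mpiexec.py. It assumes that msgToMPD['ifhns'] behaves like a plain dict keyed by loRange and that ifhn can be an empty string when no interface hostname was given; the function name record_ifhn and its guarded flag are hypothetical.

```python
def record_ifhn(msg_to_mpd, lo_range, ifhn, guarded=True):
    """Toy stand-in for the patched line; not the real mpiexec.py code."""
    if guarded:
        # Patched behaviour: store the hostname only when one was given.
        if ifhn:
            msg_to_mpd['ifhns'][lo_range] = ifhn
    else:
        # Original behaviour: always store, even an empty value.
        msg_to_mpd['ifhns'][lo_range] = ifhn
    return msg_to_mpd


# With an empty ifhn the unguarded version records an empty entry,
# while the guarded version leaves the dict untouched.
unpatched = record_ifhn({'ifhns': {}}, 0, '', guarded=False)
patched = record_ifhn({'ifhns': {}}, 0, '', guarded=True)
print(unpatched['ifhns'], patched['ifhns'])
```

When ifhn is a real hostname, both versions store it, so the guard only matters for the empty case.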

Cheers,
Mark


Nkwebi Peace Motlogelwa wrote:
> Thanks Mark for the script. I just tried it and it works fine if the
> parallel program is executed on a single dedicated node. The script
> starts mpd on the master node (rank==0 or $_CONDOR_PROCNO==0) only,
> and if one has many dedicated execute nodes, the script does not start
> mpd on the other nodes. I will try to modify it to get it to start mpd
> on all dedicated execute nodes. So far I have tried using mpdboot, but it
> seems not so straightforward to get the ring of mpds working.
>
> regards..
>
> On 3/16/07, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
>
>     Hi Nkwebi,
>
>     I don't know if you still need this, but you can get my copy of
>     mp2script at:
>
>     http://www.escience.cam.ac.uk/~mcal00/condor/mp2script.asc
>
>     Copy and paste it, and rename it as mp2script. A couple of points you
>     should bear in mind: I had to put a .mpd.conf file in the home
>     directory of the user running condor (I use dedicated condor user
>     accounts), but I also had to set the env var MPD_CONF_FILE in the
>     script, otherwise mpd failed to find the file. I also load
>     LD_LIBRARY_PATH with the compiler libs I used to build mpich2 (I used
>     ifort/icc 9.1). This script works fine with the "cpi" example that
>     gets built by mpich2 in /path/to/mpich2/distro/examples.
>
>     Cheers,
>     Mark
>
>     Nkwebi Peace Motlogelwa wrote:
>     > Hi all. I need a working MPICH2 wrapper script for condor's
>     > parallel universe. I use condor-6.8.4, but it comes with wrapper
>     > scripts for LAM and MPICH1 only. I tried to modify the
>     > mpich1script, but have had no success so far. Is anybody using condor
>     > and mpich2 and willing to share their wrapper scripts? Please help.
>     _______________________________________________
>     Condor-users mailing list
>     To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>     <mailto: condor-users-request@xxxxxxxxxxx> with a
>     subject: Unsubscribe
>     You can also unsubscribe by visiting
>     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
>     The archives can be found at either
>     https://lists.cs.wisc.edu/archive/condor-users/
>     http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
>
>