
Re: [Condor-users] MPICH2 wrapper script (mpich2script) for parallel universe



Hi guys

I'm not working on this subject right now, as I'm busy with something else, but I'm quietly following your conversation.
Please keep posting the config files you're using.

Thank you.

Nicolas

----------------
On Wed, 04 Apr 2007 07:41:56 +0100
Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:

> Hi Senthil,
> 
> I'll forward you my copy of mpd.py offline.
> 
> Regards,
> Mark
> 
> Natarajan, Senthil wrote:
> > Hi Mark,
> > Thanks for letting me know about the fix; I appreciate it.
> > I added those three lines and updated the mpd.conf file
> > (MPD_PORT_RANGE=50001:59999). Please see the modified mpd.py. It seems
> > I didn't add the lines properly, because I am getting this error:
> >
> > mpdboot_machine2 (handle_mpd_output 388): from mpd on machine1, invalid port info:
> >
> >
> > mpdboot error : 255
> >
> > Could you please let me know whether mpd.py has the correct fix for the port range?
> > Thanks,
> > Senthil
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
> > Sent: Tuesday, April 03, 2007 3:21 AM
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] MPICH2 wrapper script (mpich2script) for
> > parallel universe
> >
> > Hi Senthil,
> >
> > The fix for this was provided by Ralph Butler at MTSU. It involves 
> > editing mpd.py and adding three lines, so a diff between the new and 
> > original files gives (using v1.0.5p3 of MPICH2):
> >
> > 141d140
> > <                                  'MPD_PORT_RANGE'       :  0,
> > 150,151d148
> > <         if self.parmdb['MPD_PORT_RANGE']:
> > <             os.environ['MPICH_PORT_RANGE'] = self.parmdb['MPD_PORT_RANGE']
> >
> > After making this change, you will want to add the following in your 
> > ~/.mpd.conf file on all hosts:
> >
> > MPD_PORT_RANGE=50001:59999
> >
> > This works in my tests.
> >
> > Regards,
> > Mark
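
For readers following along, here is a minimal standalone sketch of what the three added lines appear to do: a port range read from the conf file is re-exported as MPICH_PORT_RANGE so that anything mpd starts inherits it. This is not the actual mpd.py code. The function names `read_mpd_conf` and `export_port_range` and the plain dict standing in for `self.parmdb` are assumptions made for illustration only.

```python
def read_mpd_conf(text):
    """Parse KEY=VALUE lines from mpd.conf-style text into a dict (hypothetical helper)."""
    # A default of 0 mirrors the patch: a falsy value means "no range configured".
    parms = {'MPD_PORT_RANGE': 0}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        parms[key.strip()] = value.strip()
    return parms


def export_port_range(parms, environ):
    """Copy MPD_PORT_RANGE into MPICH_PORT_RANGE when a range is configured."""
    if parms['MPD_PORT_RANGE']:
        environ['MPICH_PORT_RANGE'] = parms['MPD_PORT_RANGE']
    return environ


# Illustrative conf contents; a real ~/.mpd.conf holds other entries too.
conf_text = "# illustrative file contents\nMPD_PORT_RANGE=50001:59999\n"
env = export_port_range(read_mpd_conf(conf_text), {})
print(env)
```

When the conf text contains no MPD_PORT_RANGE entry, the default of 0 leaves the environment untouched, which matches the `if` guard in the diff.
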
> >
> > Natarajan, Senthil wrote:
> >   
> >> Hi Mark,
> >> Thanks for your mp2script.
> >>
> >> I was wondering whether you know how to set the port range for the mpds
> >> that are started on other machines.
> >>
> >> In your script I added this:
> >> export MPICH_PORT_RANGE=50001:59999
> >>
> >> so the local mpd starts in the specified port range, but the mpds started
> >> through mpdboot on remote machines use random ports. How can I start
> >> the remote mpds in the same port range? Because of the random port
> >> numbers, the firewall blocks the connection.
> >>
> >> Thanks,
> >> Senthil
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: condor-users-bounces@xxxxxxxxxxx
> >> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
> >> Sent: Wednesday, March 28, 2007 4:49 AM
> >> To: Condor-Users Mail List
> >> Subject: Re: [Condor-users] MPICH2 wrapper script (mpich2script) for
> >> parallel universe
> >>
> >> Hi Nkwebi,
> >>
> >> I've modified my version of mp2script which now works with the "cpi" 
> >> example code that MPICH2 builds when run over multiple SMP machines. 
> >> Fancy testing it and reporting any feedback? One thing to note: there 
> >> seems to be a bug with the current version of MPICH2 (v1.0.5p3), at 
> >> least certainly when using the ch3:nemesis device, maybe even the 
> >> ch3:ssm. The fix requires changing the following line in mpiexec.py
> >> (at around line 789):
> >>
> >> Change this line:
> >>     msgToMPD['ifhns'][loRange] = ifhn
> >> to this:
> >>     if ifhn:  msgToMPD['ifhns'][loRange] = ifhn
> >>
> >> Thanks to Ralph Butler at MTSU for pointing this out, without which I 
> >> couldn't get mp2script to work.
> >>
> >> Cheers,
> >> Mark
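
As a rough illustration of the one-line mpiexec.py change quoted above: the added `if ifhn:` guard means an empty interface hostname is simply not stored, instead of being recorded as an empty entry. The sketch below is standalone and is not the real mpiexec.py. The function `collect_ifhns`, the `(lo_range, ifhn)` pairs, and the sample address are hypothetical stand-ins chosen only to show the effect of the guard.

```python
def collect_ifhns(entries):
    """Build a message dict, storing only non-empty interface hostnames."""
    msg_to_mpd = {'ifhns': {}}
    for lo_range, ifhn in entries:
        # Same guard as in the fix: skip empty or missing values.
        if ifhn:
            msg_to_mpd['ifhns'][lo_range] = ifhn
    return msg_to_mpd


# First entry has no interface hostname, second one does (made-up address).
msg = collect_ifhns([(0, ''), (1, '192.168.0.2')])
print(msg)
```
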
> >>
> >>
> >> Nkwebi Peace Motlogelwa wrote:
> >>> Thanks, Mark, for the script. I just tried it and it works fine if the
> >>> parallel program is executed on a single dedicated node. The script
> >>> starts mpd on the master node (rank == 0, or $_CONDOR_PROCNO == 0) only,
> >>> and if one has many dedicated execute nodes, the script does not start
> >>> mpd on the other nodes. I will try to modify it to start mpd on all
> >>> dedicated execute nodes. So far I have tried using mpdboot, but it does
> >>> not seem straightforward to get the ring of mpds working.
> >>>
> >>> regards..
> >>>
> >>> On 3/16/07, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>>     Hi Nkwebi,
> >>>
> >>>     I don't know if you still need this, but you can get my copy of
> >>>     mp2script at:
> >>>
> >>>     http://www.escience.cam.ac.uk/~mcal00/condor/mp2script.asc
> >>>
> >>>     Copy and paste it, and rename it as mp2script. A couple of points you
> >>>     should bear in mind: I had to put a .mpd.conf file in the home directory
> >>>     of the user running condor (I use dedicated condor user accounts), but I
> >>>     also had to set the env var MPD_CONF_FILE in the script, otherwise mpd
> >>>     failed to find the file. I also load LD_LIBRARY_PATH with the compiler
> >>>     libs I used to build mpich2 (I used ifort/icc 9.1). This script works
> >>>     fine with the "cpi" example that gets built by mpich2 in
> >>>     /path/to/mpich2/distro/examples.
> >>>
> >>>     Cheers,
> >>>     Mark
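
The environment setup described above (pointing MPD_CONF_FILE at the conf file and putting the compiler libraries on LD_LIBRARY_PATH) could be sketched as follows. This is only an illustration in Python of what the wrapper script is described as doing; it is not mp2script itself. The function name `mpd_environment` and both example paths are assumptions, not values from the script.

```python
import os


def mpd_environment(base_env, conf_file, compiler_lib_dir):
    """Return a copy of base_env prepared for starting mpd (hypothetical helper)."""
    env = dict(base_env)
    # Tell mpd explicitly where its conf file lives.
    env['MPD_CONF_FILE'] = conf_file
    # Prepend the compiler runtime libraries, keeping any existing entries.
    existing = env.get('LD_LIBRARY_PATH', '')
    if existing:
        env['LD_LIBRARY_PATH'] = compiler_lib_dir + os.pathsep + existing
    else:
        env['LD_LIBRARY_PATH'] = compiler_lib_dir
    return env


# Both paths below are placeholders for illustration.
env = mpd_environment(
    {'LD_LIBRARY_PATH': '/usr/lib'},
    '/home/condor_user/.mpd.conf',
    '/opt/compiler/lib',
)
print(env['MPD_CONF_FILE'])
print(env['LD_LIBRARY_PATH'])
```

A dict built this way can be passed as the `env` argument of `subprocess.Popen` if mpd is launched from Python rather than from a shell script.
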
> >>>
> >>>     Nkwebi Peace Motlogelwa wrote:
> >>>     > Hi all. I need a working MPICH2 wrapper script for Condor's
> >>>     > parallel universe. I use condor-6.8.4, but it comes with wrapper
> >>>     > scripts for LAM and MPICH1 only. I tried to modify the
> >>>     > mpich1script, but have had no success so far. Is anybody using
> >>>     > Condor and mpich2 and willing to share their wrapper scripts?
> >>>     > Please help.
> >>>     _______________________________________________
> >>>     Condor-users mailing list
> >>>     To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> >>>     with a subject: Unsubscribe
> >>>     You can also unsubscribe by visiting
> >>>     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>>
> >>>     The archives can be found at either
> >>>     https://lists.cs.wisc.edu/archive/condor-users/
> >>>     http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
> 
> 
> -- 
> Dr Mark Calleja
> Cambridge eScience Centre, University of Cambridge
> Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA
> Tel. (+44/0) 1223 765317, Fax  (+44/0) 1223 765900
> http://www.escience.cam.ac.uk/~mcal00
> 

----------


----------------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique

Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE

Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
----------------------------------------------------