
Re: [Condor-users] MPICH2 wrapper script (mpich2script) for parallel universe



Hi Mark,
Thanks for letting me know about the fix; I appreciate it.
I added those three lines and updated the mpd.conf file
(MPD_PORT_RANGE=50001:59999). Please find the modified mpd.py attached. It
seems I didn't add them properly, because I am getting this error:

mpdboot_machine2 (handle_mpd_output 388): from mpd on machine1, invalid port info:


mpdboot error : 255

Could you please let me know whether my mpd.py has the correct fix for the port range?
Thanks,
Senthil
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
Sent: Tuesday, April 03, 2007 3:21 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] MPICH2 wrapper script (mpich2script) for parallel universe

Hi Senthil,

The fix for this was provided by Ralph Butler at MTSU. It involves 
editing mpd.py to add three lines; a diff between the new and 
original files (using v1.0.5p3 of MPICH2) gives:

141d140
<                                  'MPD_PORT_RANGE'       :  0,
150,151d148
<         if self.parmdb['MPD_PORT_RANGE']:
<             os.environ['MPICH_PORT_RANGE'] = self.parmdb['MPD_PORT_RANGE']
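In plain terms, the patch registers MPD_PORT_RANGE as a known mpd parameter (default 0, meaning unset) and, when it has a value, copies it into MPICH_PORT_RANGE in mpd's own environment, which processes that mpd starts afterwards will inherit. A simplified sketch of that logic follows; `parmdb` is modelled as a plain dict and `apply_port_range` is a hypothetical name, not the actual mpd.py structure.

```python
import os

# Simplified stand-in for the mpd.py defaults table; the patch adds
# 'MPD_PORT_RANGE' with a default of 0, meaning "no range configured".
DEFAULTS = {'MPD_PORT_RANGE': 0}


def apply_port_range(parmdb, environ=os.environ):
    """Mirror the two added lines: when MPD_PORT_RANGE is set, copy it
    into MPICH_PORT_RANGE in the given environment mapping."""
    if parmdb['MPD_PORT_RANGE']:
        environ['MPICH_PORT_RANGE'] = parmdb['MPD_PORT_RANGE']
    return environ


if __name__ == '__main__':
    parmdb = dict(DEFAULTS)
    print(apply_port_range(parmdb, environ={}))  # {}
    parmdb['MPD_PORT_RANGE'] = '50001:59999'      # e.g. read from ~/.mpd.conf
    print(apply_port_range(parmdb, environ={}))  # {'MPICH_PORT_RANGE': '50001:59999'}
```

Passing a throwaway dict as `environ` keeps the demonstration from touching the real process environment.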

After making this change, you will want to add the following in your 
~/.mpd.conf file on all hosts:

MPD_PORT_RANGE=50001:59999
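Since the same line has to be present on every host, a quick local check of the value can rule out a typo before looking elsewhere. A minimal sketch, assuming ~/.mpd.conf holds simple key=value lines; `parse_mpd_conf` and `parse_port_range`, and the bounds they enforce (two integers, low not above high, both within 1 to 65535), are illustrative assumptions and not MPICH2's own parser.

```python
def parse_mpd_conf(text):
    """Parse key=value lines (sketch); blank lines and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, sep, value = line.partition('=')
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def parse_port_range(value):
    """Split 'low:high' into two ints and sanity-check them."""
    low_text, sep, high_text = value.partition(':')
    if not sep:
        raise ValueError(f"expected 'low:high', got {value!r}")
    low, high = int(low_text), int(high_text)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"port range out of bounds: {value!r}")
    return low, high


if __name__ == '__main__':
    # 'secretword=changeme' is a placeholder line, not a real setting.
    conf = "secretword=changeme\nMPD_PORT_RANGE=50001:59999\n"
    settings = parse_mpd_conf(conf)
    print(parse_port_range(settings['MPD_PORT_RANGE']))  # (50001, 59999)
```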

This works in my tests.

Regards,
Mark

Natarajan, Senthil wrote:
> Hi Mark,
> Thanks for your mp2script.
>
> I was wondering whether you know how to set the port range for mpd
> when it is started on other machines.
>
> In your script I added this,
> export MPICH_PORT_RANGE=50001:59999
>
> so the local mpd starts in the specified port range, but the mpds
> started through mpdboot on remote machines are using random ports. How
> can I start the remote mpds in the same port range as well? Because of
> the random port numbers, the firewall blocks the connections.
>
> Thanks,
> Senthil
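Senthil's observation is consistent with ordinary environment inheritance: a variable exported in the wrapper script reaches processes started from that script on the same machine, but an mpd that mpdboot launches on another host typically gets its environment from the remote login rather than from the local shell. The sketch below demonstrates only the local half with `subprocess`; `child_sees` is a hypothetical helper, and the remote case is described, not executed.

```python
import os
import subprocess
import sys


def child_sees(var, env):
    """Start a child Python process with `env` as its environment and
    return the value it reports for `var`, or None if it is unset."""
    code = "import os, sys; sys.stdout.write(os.environ.get(%r, ''))" % var
    result = subprocess.run([sys.executable, '-c', code], env=env,
                            capture_output=True, text=True, check=True)
    return result.stdout or None


if __name__ == '__main__':
    env = dict(os.environ)
    env['MPICH_PORT_RANGE'] = '50001:59999'    # like `export` in the script
    print(child_sees('MPICH_PORT_RANGE', env))   # 50001:59999
    env.pop('MPICH_PORT_RANGE')                  # a process that never got it
    print(child_sees('MPICH_PORT_RANGE', env))   # None
```

This is presumably why a per-host setting (such as an entry in ~/.mpd.conf on every machine) is needed for the remote mpds.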
>
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Mark Calleja
> Sent: Wednesday, March 28, 2007 4:49 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] MPICH2 wrapper script (mpich2script) for parallel universe
>
> Hi Nkwebi,
>
> I've modified my version of mp2script, which now works with the "cpi"
> example code that MPICH2 builds when run over multiple SMP machines.
> Fancy testing it and reporting any feedback? One thing to note: there
> seems to be a bug in the current version of MPICH2 (v1.0.5p3), certainly
> when using the ch3:nemesis device and maybe even with ch3:ssm. The fix
> requires changing the following line in mpiexec.py (at around line 789):
>
> Change this line:
>     msgToMPD['ifhns'][loRange] = ifhn
> to this:
>     if ifhn:  msgToMPD['ifhns'][loRange] = ifhn
>
> Thanks to Ralph Butler at MTSU for pointing this out, without which I 
> couldn't get mp2script to work.
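The difference between the two versions of that line can be seen in isolation. The names `msgToMPD`, `loRange` and `ifhn` come from the quoted line; the wrapper functions, the dict layout and the hostname 'node1' are hypothetical simplifications, not the real mpiexec.py code. The only point shown is that the patched version no longer writes an empty interface hostname into the message.

```python
def record_ifhn_original(msgToMPD, loRange, ifhn):
    # Original line: always stores ifhn, even when it is empty or None.
    msgToMPD['ifhns'][loRange] = ifhn
    return msgToMPD


def record_ifhn_patched(msgToMPD, loRange, ifhn):
    # Patched line: only stores ifhn when it is non-empty.
    if ifhn:
        msgToMPD['ifhns'][loRange] = ifhn
    return msgToMPD


if __name__ == '__main__':
    print(record_ifhn_original({'ifhns': {}}, 0, ''))      # {'ifhns': {0: ''}}
    print(record_ifhn_patched({'ifhns': {}}, 0, ''))       # {'ifhns': {}}
    print(record_ifhn_patched({'ifhns': {}}, 0, 'node1'))  # {'ifhns': {0: 'node1'}}
```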
>
> Cheers,
> Mark
>
>
> Nkwebi Peace Motlogelwa wrote:
>   
>> Thanks, Mark, for the script. I just tried it and it works fine if the
>> parallel program is executed on a single dedicated node. The script
>> starts mpd on the master node (rank 0, i.e. $_CONDOR_PROCNO == 0) only,
>> so if one has many dedicated execute nodes, the script does not start
>> mpd on the other nodes. I will try to modify it to start mpd on all
>> dedicated execute nodes. So far I have tried using mpdboot, but it does
>> not seem as straightforward to get the ring of mpds working.
>>
>> regards..
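The behaviour described here hinges on the per-node environment Condor provides. A sketch of how a wrapper might branch on it, assuming the `_CONDOR_PROCNO` and `_CONDOR_NPROCS` variables that the parallel universe sets on each node; `node_role` and `mpdboot_command` are hypothetical helpers, the mpdboot command is only built and never run, and how the master would learn the other nodes' hostnames for an mpd.hosts file is left out.

```python
import os


def node_role(environ=os.environ):
    """Classify this node of a Condor parallel universe job using the
    _CONDOR_PROCNO / _CONDOR_NPROCS variables (assumed to be set)."""
    procno = int(environ['_CONDOR_PROCNO'])
    nprocs = int(environ.get('_CONDOR_NPROCS', '1'))
    role = 'master' if procno == 0 else 'worker'
    return role, procno, nprocs


def mpdboot_command(hosts_file, total_mpds):
    """Build (but do not run) an mpdboot invocation: -n is the total
    number of mpds to start and -f names the hosts file."""
    return ['mpdboot', '-n', str(total_mpds), '-f', hosts_file]


if __name__ == '__main__':
    fake_env = {'_CONDOR_PROCNO': '0', '_CONDOR_NPROCS': '4'}
    role, procno, nprocs = node_role(fake_env)
    if role == 'master':
        # Only the master boots the ring; sizing it by _CONDOR_NPROCS
        # is an assumption of this sketch.
        print(mpdboot_command('mpd.hosts', nprocs))
```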
>>
>> On 3/16/07, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
>>
>>     Hi Nkwebi,
>>
>>     I don't know if you still need this, but you can get my copy of
>>     mp2script at:
>>
>>     http://www.escience.cam.ac.uk/~mcal00/condor/mp2script.asc
>>
>>     Copy and paste it, and rename it as mp2script. A couple of points you
>>     should bear in mind: I had to put a .mpd.conf file in the home
>>     directory of the user running condor (I use dedicated condor user
>>     accounts), but I also had to set the env var MPD_CONF_FILE in the
>>     script, otherwise mpd failed to find the file. I also load
>>     LD_LIBRARY_PATH with the compiler libs I used to build mpich2 (I used
>>     ifort/icc 9.1). This script works fine with the "cpi" example that
>>     gets built by mpich2 in /path/to/mpich2/distro/examples.
>>
>>     Cheers,
>>     Mark
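The two environment tweaks mentioned in that message can be sketched as a function that prepares the environment for launching mpd. The library directory '/opt/compiler/lib' is a hypothetical placeholder (the actual compiler paths are not given), and `build_mpd_env` is an illustrative name, not part of mp2script.

```python
import os


def build_mpd_env(base=None, home=None, compiler_lib_dir='/opt/compiler/lib'):
    """Return a copy of `base` (default: os.environ) with MPD_CONF_FILE
    pointing at <home>/.mpd.conf and compiler_lib_dir prepended to
    LD_LIBRARY_PATH. The default compiler_lib_dir is a placeholder."""
    env = dict(os.environ if base is None else base)
    if home is None:
        home = env.get('HOME') or os.path.expanduser('~')
    env['MPD_CONF_FILE'] = os.path.join(home, '.mpd.conf')
    existing = env.get('LD_LIBRARY_PATH')
    env['LD_LIBRARY_PATH'] = (compiler_lib_dir + os.pathsep + existing
                              if existing else compiler_lib_dir)
    return env


if __name__ == '__main__':
    env = build_mpd_env(base={'LD_LIBRARY_PATH': '/usr/lib'},
                        home='/home/condoruser')
    print(env['MPD_CONF_FILE'])    # /home/condoruser/.mpd.conf on Linux
    print(env['LD_LIBRARY_PATH'])  # /opt/compiler/lib:/usr/lib on Linux
```

The resulting dict could be passed as `env=` to `subprocess` when starting mpd; that launch step is not shown.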
>>
>>     Nkwebi Peace Motlogelwa wrote:
>>     > Hi all. I need a working MPICH2 wrapper script for condor's
>>     > parallel universe. I use condor-6.8.4, but it only comes with
>>     > wrapper scripts for LAM and MPICH1. I tried to modify the
>>     > mpich1script, but I'm not winning so far. Is anybody using condor
>>     > and mpich2 and willing to share their wrapper scripts? Please help.
>>     _______________________________________________
>>     Condor-users mailing list
>>     To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
>>     with a subject: Unsubscribe
>>     You can also unsubscribe by visiting
>>     https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>>     The archives can be found at either
>>     https://lists.cs.wisc.edu/archive/condor-users/
>>     http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR

Attachment: mpd.py
Description: mpd.py