
Re: [Condor-users] Parallel Universe - Machine count



Hi Sarnath:

I wrote a wrapper a while ago that lets us run MPI/MPICH-2 (MVAPICH2)
jobs fairly easily on our cluster here (all RHEL 64-bit). I figured it
was time to share it with the Condor community.

I have tested the wrapper with the Condor 7.6-7.8 series on RHEL 64-bit.
If you are running on 32-bit, you will need the chirp libraries from
your Condor installation to compile the wrapper (just replace them in
the Makefile).

Basically, on clusters with a non-shared (e.g. non-NFS) file system it
sets up a "fake" file system in /tmp with the same name across all of
the nodes so that the underlying MPI executable assumes it is shared. On
clusters with underlying shared file systems it still sets a number of
important environment variables that can be used with mpiexec.
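
A minimal Python sketch of the "same name on every node" idea, not the
wrapper's actual code. The function names, the "parallel_job_" prefix,
and the job identifier are hypothetical, chosen only for illustration:

```python
import os

def scratch_path(job_id, base="/tmp"):
    # Hypothetical naming scheme: the path depends only on the job id,
    # so every node computes the same name without any coordination.
    return os.path.join(base, "parallel_job_" + str(job_id))

def make_scratch(job_id, base="/tmp"):
    path = scratch_path(job_id, base)
    # exist_ok avoids a failure if two processes on one node race here.
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path
```

Each node would call make_scratch() with the same job id, so the MPI
executable sees an identical working directory path everywhere even
though the directories are physically separate.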

It also handles all of the management of host files, etc. At this point,
however, you will need to have SSH keys set up on all of the nodes so
that you can ssh in without a password, since the wrapper does not do any
ssh-keygen management.
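
For illustration only, host-file management can be sketched in Python as
below. This is an assumption about the general shape of the task, not
the wrapper's implementation: the function name is hypothetical, and the
"host:count" line format is the one accepted by MPICH2's Hydra launcher,
so check what your own MPI expects:

```python
from collections import Counter

def hostfile_lines(hosts):
    # hosts holds one entry per allocated slot, e.g. ["n1", "n2", "n1"].
    # Counter keeps first-seen order (Python 3.7+), so each host appears
    # once, followed by the number of slots it was given.
    counts = Counter(hosts)
    return ["%s:%d" % (host, n) for host, n in counts.items()]

def write_hostfile(hosts, path):
    # Write one "host:count" line per distinct host.
    with open(path, "w") as f:
        f.write("\n".join(hostfile_lines(hosts)) + "\n")
```

The resulting file can then be handed to the MPI launcher through
whatever host-file option it provides.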

All of the source code as well as an example (in the README) can be
found at <https://github.com/herzfeldd/parallel_wrapper>.

I would be happy to re-release this under the Condor License agreement
should the developers wish to include any portion of it in future
releases.

Cheers,
DJH

On Sun, 2012-07-15 at 09:48 -0400, Michael Grauer wrote:
> Hi Sarnath,
> 
> 
> We had some success with Condor + mpich2, but only for Windows.  Here
> are some notes from our work, I hope this helps you get started.
> 
> 
> http://www.itk.org/Wiki/Proposals:Condor#MPICH2_on_Windows
> 
> 
> -Mike
> 
> On Sun, Jul 15, 2012 at 9:09 AM, Sarnath K - ERS, HCLTech
> <k_sarnath@xxxxxxx> wrote:
>         Hi,
>         
>         I am re-sending the same e-mail in plain-text format. Maybe
>         HTML is the reason why my e-mails usually go unanswered.
>         
>         1.    How do I figure out the “machine count” for a Parallel
>         Universe job so that the Job runs without waiting for
>         resources?
>         
>         2.    Say I have 2 machines with 1 Core Each
>                 a.    I want to run 2 MPI Jobs on both cores
>         simultaneously.
>                 b.    Can “Dynamic Slots” help?
>         
>         Thanks,
>         Best Regards,
>         Sarnath
>         
>         
>         _______________________________________________
>         Condor-users mailing list
>         To unsubscribe, send a message to
>         condor-users-request@xxxxxxxxxxx with a
>         subject: Unsubscribe
>         You can also unsubscribe by visiting
>         https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>         
>         The archives can be found at:
>         https://lists.cs.wisc.edu/archive/condor-users/
> 
> 
> 
> 
> -- 
> Thanks,
> Michael Grauer
> R & D Engineer
> Kitware, Inc.
> 919 969 6990 x322
> 