Re: [condor-users] LAM-MPI Support?

On Tue, 23 Sep 2003, Jess Cannata wrote:

> Does anyone know why Condor lacks support for LAM-MPI, or what is the
> main impediment to getting LAM working with Condor?

Based on our tests earlier this year, Condor appears to contain some
MPICH-specific code that prevents LAM from being used.  So far, we haven't
been able to see the source code to verify or fix this.

We've had some conversations with the Condor folks about this over the
past 9 months, but never really got anywhere -- my last few e-mails over
the past few weeks have gone unanswered.  :-(

We're eager to see the Condor source code when it finally becomes
available; we'd be happy to help make Condor's MPI-starting code more
general so that LAM can be used.

Indeed, with LAM's new ability to checkpoint and restart parallel MPI
applications, this could be very useful.  LAM doesn't yet support
Condor-native checkpointing, but that's mainly because we couldn't get
LAM/MPI jobs to run *at all* (because of the seemingly MPICH-specific code
in Condor), and therefore had no good platform to develop and test with.
So there's probably still a fair amount of work to be done before parallel
LAM/MPI jobs can migrate around in a Condor environment, but we'd like to
get started on it.  :-)

{+} Jeff Squyres
{+} jsquyres@xxxxxxxxxxx
{+} http://www.lam-mpi.org/
