
Re: [condor-users] synchronously starting multiple jobs in Condor



We've considered doing that. Obviously, we'd prefer to be able to have Condor launch our programs directly rather than wrapping MPI around them, but we may not have a choice.

I found a presentation on one of the Condor Week webpages that mentioned future work that would allow synchronous launch of generic jobs. I also agree (and believe many others probably do, as well) that this is an essential feature.

--
Hahn Kim
MIT Lincoln Laboratory    Phone: (781) 981-0940
244 Wood Street, S2-252     Fax: (781) 981-5255
Lexington, MA 02420      E-mail: hgk@xxxxxxxxxx


Mark Silberstein wrote:
The only way (that I am aware of) to force jobs to start at once on
all machines is the MPI universe. For our programs, which have to be
started this way, we wrap them in MPI_Init and MPI_Finalize and run them
as if they were MPI applications in the MPI universe. This start-barrier
functionality is implemented in the dedicated schedd, which runs for the
MPI universe only, and I don't think there is any way to hack it to do
the same for non-MPI jobs.
I remember that there was talk of a generic parallel universe. Where
does it stand now, and are there any future plans for it? Developers,
please answer our call!
I think that this 'start barrier' feature is rather necessary.
Mark
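To make the 'start barrier' idea concrete, here is a minimal sketch in Python. It is not how the dedicated schedd implements it; it only illustrates the behavior using threads in one process, and the function name run_with_start_barrier and its parameters are made up for this example. Every worker announces its arrival, then blocks until all workers have arrived, and only then starts its real work.

```python
import threading
import time


def run_with_start_barrier(n_workers, startup_delays):
    """Run n_workers threads that all wait at a barrier before 'starting'.

    startup_delays simulates workers being launched at different times.
    Returns the ordered list of ("arrived" | "started", rank) events.
    """
    barrier = threading.Barrier(n_workers)
    events = []
    lock = threading.Lock()

    def worker(rank):
        time.sleep(startup_delays[rank])      # uneven launch times
        with lock:
            events.append(("arrived", rank))
        barrier.wait()                        # block until all workers arrive
        with lock:
            events.append(("started", rank))  # real work would begin here

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

No worker records "started" until the last one has recorded "arrived", which is the guarantee that synchronously launched jobs need.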
On Tue, 2003-11-04 at 14:57, Hahn Kim wrote:

My group has developed a Matlab library, called MatlabMPI, which implements a subset of the MPI library. Currently, it launches Matlab on multiple machines by sending commands via rsh. We are now trying to integrate MatlabMPI with Condor.

Like MPI, all processes in a MatlabMPI program must start executing at the same time. Otherwise, any process that needs to communicate with an idle process will cause the MatlabMPI program to hang.

We have been trying to figure out if there is a way to force Condor to synchronously start executing a set of Matlab processes distributed across a cluster. Does anyone have any ideas? Is this functionality built into Condor, or will this require a hack?
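One possible hack, sketched here in Python only to show the logic, is a rendezvous on a shared directory. It assumes all machines can see the same filesystem, and the names wait_for_peers, rendezvous_dir and the ready.<rank> marker files are hypothetical. Each process writes a marker file and polls until every peer's marker exists before doing any communication. This does not make Condor start the jobs together; it only keeps already-started processes from talking to peers that are not running yet, and the timeout keeps them from waiting forever if some jobs stay idle.

```python
import os
import time


def wait_for_peers(rendezvous_dir, rank, n_procs, timeout=60.0, poll=0.1):
    """Announce this process, then wait until all n_procs have announced.

    Returns True once every ready.<rank> marker exists, or False if the
    timeout expires first. rendezvous_dir is assumed to be on a filesystem
    shared by all machines.
    """
    os.makedirs(rendezvous_dir, exist_ok=True)
    marker = os.path.join(rendezvous_dir, f"ready.{rank}")
    with open(marker, "w") as f:
        f.write(str(os.getpid()))

    deadline = time.monotonic() + timeout
    while True:
        present = sum(
            os.path.exists(os.path.join(rendezvous_dir, f"ready.{r}"))
            for r in range(n_procs)
        )
        if present == n_procs:
            return True           # every peer has checked in
        if time.monotonic() >= deadline:
            return False          # give up; caller should exit cleanly
        time.sleep(poll)
```

A process that gets False back should exit rather than start communicating, so that the machine is released and the whole set can be resubmitted. The marker files would also need to be cleaned up between runs.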


Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>


