
Re: [condor-users] synchronously starting multiple jobs in Condor



The only way (that I am aware of) to force jobs to start at once on
all machines is the MPI universe. For our programs, which have to be
started this way, we wrap them in MPI_Init and MPI_Finalize and run
them as if they were MPI applications in the MPI universe. This
start-barrier functionality is implemented in the dedicated schedd,
which runs for the MPI universe only, and I don't think there is any
way to hack it into doing the same for non-MPI jobs.
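For what it's worth, submitting such a wrapped program looks roughly like the sketch below. The executable name, machine_count, and file names are placeholders, not anything from the original post:

```
# Hypothetical submit description file for Condor's MPI universe.
# "matlab_wrapper" stands in for a program wrapped in MPI_Init/MPI_Finalize.
universe      = MPI
executable    = matlab_wrapper
machine_count = 4
log           = matlabmpi.log
output        = matlabmpi.out.$(NODE)
error         = matlabmpi.err.$(NODE)
queue
```

The dedicated schedd will not start any of the machine_count nodes until it has claimed all of them, which is where the start barrier comes from.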
I remember there was talk of a generic parallel universe. Where does
it stand now, and are there any future plans for it? Developers,
answer our call! I think this 'start barrier' feature is quite
necessary.
Mark
On Tue, 2003-11-04 at 14:57, Hahn Kim wrote:
> My group has developed a Matlab library, called MatlabMPI, which 
> implements a subset of the MPI library.  Currently, it launches Matlab 
> on multiple machines by sending commands via rsh.  Now we are 
> trying to integrate MatlabMPI with Condor.
> 
> Like MPI, all processes in a MatlabMPI program must start executing at 
> the same time.  Otherwise, any process that needs to communicate with an 
> idle process will cause the MatlabMPI program to hang.
> 
> We have been trying to figure out if there is a way to force Condor to 
> synchronously start executing a set of Matlab processes distributed 
> across a cluster.  Does anyone have any ideas?  Is this functionality 
> built into Condor, or will it require a hack?

Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>