
[Condor-users] Keeping Parallel Universe job alive even after node0 is done


I am trying to run a simple MPICH2 example (Condor 7.0.5, MPICH2 1.0.8): an MPI program that calculates the value of pi.
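For reference, a Parallel universe submit description for a 3-node run looks roughly like the sketch below. The script name `mp2script` and the program name `pi_mpi` are assumptions (my local wrapper and binary names), not taken from the original post; Condor ships example wrapper scripts for MPI implementations that you would adapt.

```
# Sketch of a Condor Parallel universe submit file (names are placeholders)
universe      = parallel
executable    = mp2script      # wrapper script that launches MPICH2
arguments     = pi_mpi         # the actual MPI binary
machine_count = 3              # node0, node1, node2
should_transfer_files = yes
when_to_transfer_output = on_exit
log           = pi.log
output        = pi.out.$(NODE)
error         = pi.err.$(NODE)
queue
```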


I am testing this with 3 nodes. As soon as node0 finishes, Condor shuts down node1 and node2 even though the processes on them have not finished.

I know this is the way Condor is supposed to work, but is there any workaround to keep node0 alive until all the nodes are done?


Because the individual nodes are geographically distributed and subject to network latency, node0 finishes first, which causes the other nodes to die and hence kills the whole Parallel universe job.
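One possible workaround at the application level (a sketch, not something from the Condor docs): have every rank, including rank 0, block on MPI_Barrier before MPI_Finalize, so rank 0 on node0 cannot exit until the slower ranks reach the end. The pi code below is my own minimal reconstruction of a standard MPI pi example, not the poster's actual program.

```c
/* Minimal MPI pi sketch: rank 0 waits at a barrier so it does not
 * exit (and let Condor tear down the job) before the other ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, i, n = 1000000;
    double h, x, sum = 0.0, mypi, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each rank integrates its slice of 4/(1+x^2) over [0,1] */
    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {
        x = h * ((double)i + 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* keep rank 0 alive until every rank has arrived here */
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}
```

Whether this helps depends on where the time is lost: if node0's rank truly finishes all communication first, the barrier delays its exit until the others catch up; it does not change how Condor itself treats node0's exit.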