[Condor-users] What happens when an MPI job hangs?
- Date: Thu, 16 Feb 2006 16:52:44 -0600
- From: Matt Baker <bakerspage@xxxxxxx>
- Subject: [Condor-users] What happens when an MPI job hangs?
We are looking into using the latest Condor to manage MPI jobs in a
Concurrent Computing class. We have trouble killing MPI jobs launched
with plain "mpirun", because killing one process does not kill the
other processes that mpirun spawned.
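To make the problem concrete, here is a minimal sketch of the kind of single-host workaround we could script ourselves. It assumes a POSIX system and Python 3; the helper name run_with_group_cleanup and the timeout value are made up for illustration. It starts the command in its own session and, on timeout, signals the whole process group. It only reaches processes on the local machine that stay in that group, so it would not clean up ranks started on other nodes.

```python
import os
import signal
import subprocess


def run_with_group_cleanup(cmd, timeout):
    """Run cmd in a new session; if it exceeds timeout seconds,
    kill every process in its process group.

    Returns the exit code, or None if the group had to be killed.
    (Hypothetical helper, not part of Condor or any MPI launcher.)
    """
    # start_new_session=True makes the child a session and group leader,
    # so its process group ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Signal the whole group, not just the launcher process.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group vanished between the timeout and the kill
        proc.wait()  # reap the launcher so it does not linger as a zombie
        return None
```

For example, run_with_group_cleanup(["mpirun", "-np", "4", "./a.out"], 600) would give a job ten minutes before killing the local group; the mpirun arguments here are placeholders.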
We've read that both PBS and SGE have the ability to "sense" that the
head node (process 0) has died and can clean up (kill and clear
sockets) the other processes that block waiting for communication
with process 0.
Is there similar functionality in Condor? If I submit an unsafe MPI
job and it hangs, will condor_rm take care of cleaning up its processes?
University of Arkansas