[Condor-users] mpi jobs not dying properly
- Date: Thu, 20 Aug 2009 12:54:47 -0400
- From: Peter Doherty <doherty@xxxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] mpi jobs not dying properly
I finally got an MPI job to run on a couple of nodes in the cluster from
a Condor job.
When I do a condor_rm on the job, it only dies on the node that
is running the master process.
I've got these errors in my StartLog:
8/20 12:52:05 Can't read ClaimId
8/20 12:52:05 condor_write(): Socket closed when trying to write 13 bytes to <10.0.10.43:37598>, fd is 6
8/20 12:52:05 Buf::write(): condor_write() failed
10.0.10.43 is the node that still has the child process running, and I
have to manually kill it.