Re: [HTCondor-users] condor_q shows jobs still running which have completed
- Date: Tue, 21 Jan 2014 11:18:28 -0600
- From: "John (TJ) Knoeller" <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] condor_q shows jobs still running which have completed
When this happens, is there still a condor_starter process running on
the execute node?
Does it have any child processes?
Are you using a standard bioinformatics job (i.e., one that we might also
be running here at UW)?
What version of HTCondor are you using?
Is the execute node the same version as the submit node for Linux and/or
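
One way to answer the first two questions on the execute node is to look at
the process table. The following is only a rough sketch, assuming a Linux
node where `ps -eo pid,ppid,comm` is available; the helper names parse_ps and
starters_with_children are made up for illustration and are not part of
HTCondor.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,ppid,comm` output into (pid, ppid, comm) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # The header line and any malformed lines fail the isdigit() checks.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return rows


def starters_with_children(rows):
    """Map each condor_starter PID to a list of its direct children."""
    starters = {pid: [] for pid, _, comm in rows if comm == "condor_starter"}
    for pid, ppid, comm in rows:
        if ppid in starters:
            starters[ppid].append((pid, comm))
    return starters


if __name__ == "__main__":
    # Run this on the execute node that condor_q claims is still busy.
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    found = starters_with_children(parse_ps(out))
    if not found:
        print("no condor_starter process found")
    for pid, children in found.items():
        print(f"condor_starter {pid}: children = {children or 'none'}")
```

A condor_starter with no children, or no condor_starter at all while condor_q
still reports the job as running, would be worth including in a reply
together with the output of condor_version from both machines.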
On 1/19/2014 10:05 PM, Joe Knapka wrote:
I am running a large number of long-running jobs on a 56-node
Linux-based HTCondor cluster, using the "vanilla" universe (because
the programs depend on both fork() and mmap()). I have found that
occasionally condor_q shows a job as running, when that job has
actually completed hours earlier. The job has produced its expected
output file, and no job is running on the node it was scheduled on.
When this happens, Condor no longer schedules jobs on the compute node
it thinks the completed job is running on. I must manually condor_rm
the job in order to get Condor to schedule further jobs on the
affected node. I have not found references to any similar symptom in
the FAQ or via Google. Any ideas why this might be happening?
Bioinformatics / University of Texas / El Paso
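
For the symptom described in the quoted message, jobs that condor_q still
lists as running long after they finished, a submit-side check might look
roughly like the sketch below. It assumes condor_q supports -constraint and
-autoformat and that the job ads carry the usual ClusterId, ProcId, RemoteHost
and JobCurrentStartDate attributes. The 48-hour threshold and the function
names are hypothetical; pick a limit that fits the real job lengths.

```python
import subprocess
import time


def parse_running(text):
    """Parse 'ClusterId ProcId RemoteHost JobCurrentStartDate' lines."""
    jobs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        cluster, proc, host, start = parts
        # Missing attributes are printed as "undefined"; skip those lines.
        if not (cluster.isdigit() and proc.isdigit() and start.isdigit()):
            continue
        jobs.append((f"{cluster}.{proc}", host, int(start)))
    return jobs


def suspects(jobs, now, max_hours):
    """Return (job id, host) pairs that have been running longer than max_hours."""
    limit = max_hours * 3600
    return [(jid, host) for jid, host, start in jobs if now - start > limit]


if __name__ == "__main__":
    out = subprocess.run(
        ["condor_q", "-constraint", "JobStatus == 2",  # 2 means running
         "-autoformat", "ClusterId", "ProcId", "RemoteHost",
         "JobCurrentStartDate"],
        capture_output=True, text=True, check=True,
    ).stdout
    # 48 hours is an assumed threshold, not a value from this thread.
    for jid, host in suspects(parse_running(out), time.time(), max_hours=48):
        print(f"{jid} has been 'running' on {host} for more than 48 hours")
```

This only flags candidates. Whether a flagged job has really finished still
has to be confirmed on the execute node before running condor_rm on it.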