Re: [HTCondor-users] condor_q shows jobs still running which have completed



Hi Joe,

Please make sure you add a quit or exit command at the end of your program.
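
As a rough illustration only: the original post does not say what language the jobs are written in, so this sketch assumes a Python program that uses fork(). The names run_workers and main are hypothetical. The idea is that the parent waits for every forked child and then returns an explicit exit status, so no process is left behind on the node.

```python
import os
import sys


def run_workers(n_workers):
    """Fork n_workers children, wait for each one, return their exit codes."""
    pids = []
    for _ in range(n_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: the real work would go here (placeholder),
            # then exit immediately with an explicit status.
            os._exit(0)
        pids.append(pid)

    exit_codes = []
    for pid in pids:
        # Reap every child so none is left running or as a zombie.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status):
            exit_codes.append(os.WEXITSTATUS(status))
        else:
            # Child was killed by a signal rather than exiting normally.
            exit_codes.append(-1)
    return exit_codes


def main():
    codes = run_workers(3)
    # Return non-zero if any worker failed.
    return 0 if all(code == 0 for code in codes) else 1
```

The last line of the actual job script would then be sys.exit(main()), so the program always ends with an explicit exit once all children have been reaped.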

Thank you,
Dennis. 

On Monday, January 20, 2014, Joe Knapka <jaknapka@xxxxxxxxxxxxxxx> wrote:
Hello everyone,

I am running a large number of long-running jobs on a 56-node
Linux-based HTCondor cluster, using the "vanilla" universe (because
the programs depend on both fork() and mmap()).  I have found that
occasionally condor_q shows a job as running, when that job has
actually completed hours earlier.  The job has produced its expected
output file, and no job is running on the node it was scheduled on.
When this happens, Condor no longer schedules jobs on the compute node
it thinks the completed job is running on.  I must manually condor_rm
the job in order to get Condor to schedule further jobs on the
affected node.  I have not found references to any similar symptom in
the FAQ or via Google. Any ideas why this might be happening?

Thank you,

Joe Knapka
Bioinformatics / University of Texas / El Paso

--
"I want them to understand that there is a playground in their minds
and that that is where mathematics happens." - Paul Lockhart