Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[HTCondor-users] Job causes condor_schedd to crash

Date: Fri, 6 Dec 2013 18:43:07 -0500
From: Ben Cotton <ben.cotton@xxxxxxxxxxxxxxxxxx>
Subject: [HTCondor-users] Job causes condor_schedd to crash

One of our clients has seen an issue with HTCondor 7.6.3 where a job
will cause the schedd to crash. It's a single job within a
many-thousand job cluster. Nothing about that job stands out as the
cause, and the crash is not reproducible on demand.

What seems to be happening is that the job starts about 1800 times
within a 30-second period, and the job_history.log file ends up with
approximately 30 million lines containing that specific job.process
ID. The schedd dies repeatedly until the job is removed from the
history file.
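
For anyone trying to check a log for the same symptom, here is a rough
sketch of how the runaway ID could be spotted by counting the lines that
mention each ID. It is not an HTCondor tool. The function names are made
up for illustration, and the regular expression assumes IDs appear as
two dot-separated integers (for example 1234.56), which may not match the
actual layout of job_history.log; adjust it to the real file.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: IDs look like two dot-separated integers, e.g. "1234.56".
# The lookarounds skip longer dotted runs such as IP addresses.
# Other decimal numbers in the file would also match this pattern.
JOB_ID_RE = re.compile(r"(?<![\d.])(\d+\.\d+)(?![\d.])")


def count_job_ids(lines: Iterable[str]) -> Counter:
    """Count how many lines mention each ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # set() so a line that repeats the same ID is counted once
        for job_id in set(JOB_ID_RE.findall(line)):
            counts[job_id] += 1
    return counts


def top_offenders(path: str, n: int = 5):
    """Stream a log file and return the n most frequently mentioned IDs."""
    # Iterating over the handle reads one line at a time, so a file with
    # tens of millions of lines is never loaded into memory at once.
    with open(path, errors="replace") as handle:
        return count_job_ids(handle).most_common(n)
```

Calling top_offenders on the log path would list the most-mentioned IDs
first; an ID with a count in the millions would be the one to look at.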

I don't see any mentions of fixes for this in subsequent release
notes, so I was just wondering if anyone else has seen it.

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Leader in Utility HPC Software

http://www.cyclecomputing.com
twitter: @cyclecomputing

Follow-Ups:
- Re: [HTCondor-users] Job causes condor_schedd to crash
  - From: Greg Thain
