[HTCondor-users] condor_q resets after repeated calls
- Date: Fri, 9 Aug 2013 01:20:44 -0400
- From: Gary Kaganas <kaganasg@xxxxxxxxx>
- Subject: [HTCondor-users] condor_q resets after repeated calls
Hello all. I'm running a script that detects the number of jobs running on my submitter. Basically, the script works by calling condor_q and parsing the totals line at the end of the condor_q output.
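A minimal sketch of this kind of parsing (not the actual script) is below. It assumes the totals line has the form "5 jobs; 0 completed, 0 removed, 3 idle, 2 running, 0 held, 0 suspended"; the names parse_totals and running_jobs are hypothetical, and the exact line format may differ between HTCondor versions.

```python
import re
import subprocess

# Assumed totals-line format (may vary by HTCondor version):
#   "5 jobs; 0 completed, 0 removed, 3 idle, 2 running, 0 held, 0 suspended"
TOTALS_RE = re.compile(r"^\s*(\d+) jobs;(.*)$")
COUNT_RE = re.compile(r"(\d+) (\w+)")


def parse_totals(output):
    """Return per-status job counts from the last totals line, or None if absent."""
    for line in reversed(output.splitlines()):
        match = TOTALS_RE.match(line)
        if match:
            counts = {status: int(n) for n, status in COUNT_RE.findall(match.group(2))}
            counts["total"] = int(match.group(1))
            return counts
    return None  # no totals line found at all


def running_jobs():
    """Hypothetical helper: run condor_q once and return the running-job count."""
    result = subprocess.run(["condor_q"], capture_output=True, text=True, check=True)
    totals = parse_totals(result.stdout)
    return None if totals is None else totals.get("running", 0)
```

Returning None when no totals line is found keeps "condor_q printed nothing usable" distinct from "zero jobs running", which matters for the behavior described next.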
I originally called condor_q every 3 seconds. After several days of running the script, condor_q suddenly started misbehaving: on the 3rd or 4th invocation, it would no longer display any jobs, in any status. When I tried to remove these ghost jobs, condor_rm exited with an error saying that the user's jobs could not be found.
Once condor_q breaks, it won't report newly created jobs normally either. It will often drop the jobs after a single invocation. Restarting all daemons sometimes restores condor_q. Sometimes simply waiting half a day works.
Has anyone run into this?