
[HTCondor-users] condor_q resets after repeated calls



Hello all. I'm running a script that counts the jobs running on my submitter. The script calls condor_q and parses the totals line at the end of its output.
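
For reference, a minimal sketch of that kind of parsing might look like the following. It is not the actual script. It assumes the totals line has the shape "5 jobs; 0 completed, 0 removed, 3 idle, 2 running, 0 held, 0 suspended" (possibly with a prefix such as "Total for query:"), and the names parse_totals and running_jobs are hypothetical.

```python
import re
import subprocess

# Assumed shape of the totals line (may differ between HTCondor versions):
#   "5 jobs; 0 completed, 0 removed, 3 idle, 2 running, 0 held, 0 suspended"
TOTALS_RE = re.compile(r"(\d+) jobs; (.*)$")


def parse_totals(output):
    """Return job counts from the last totals-like line, or None if none is found."""
    for line in reversed(output.splitlines()):
        match = TOTALS_RE.search(line.strip())
        if match:
            counts = {"total": int(match.group(1))}
            # Each remaining field looks like "<number> <status>".
            for number, status in re.findall(r"(\d+) (\w+)", match.group(2)):
                counts[status] = int(number)
            return counts
    return None


def running_jobs():
    """Call condor_q once and return the running-job count (hypothetical helper)."""
    result = subprocess.run(
        ["condor_q"], capture_output=True, text=True, check=True
    )
    counts = parse_totals(result.stdout)
    # None means "no totals line found", which is different from zero jobs.
    return counts["running"] if counts else None
```

Returning None when no totals line is present keeps "condor_q printed nothing usable" distinct from "zero jobs running", which matters for the failure described below.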

I originally called condor_q every 3 seconds. After several days of running the script, condor_q suddenly stopped displaying any jobs, in any status, on the 3rd or 4th invocation. When I attempted to remove the ghost jobs, condor_rm exited with an error message saying that the user's jobs could not be found.

Once condor_q breaks, it does not report newly created jobs normally either; it often drops them after a single invocation. Restarting all daemons sometimes restores condor_q. Sometimes simply waiting half a day works.

Has anyone run into this?

Best regards,
Gary