On 9/4/2017 11:06 AM, Carles Acosta wrote:
It seems that the issue was related to JOB_IS_FINISHED_INTERVAL. It was set to 10 seconds, but the jobs stayed much longer than that, as I mentioned in my previous email. After removing JOB_IS_FINISHED_INTERVAL from the Schedd config, everything seems to work correctly again. There have been no more "actOnJobs: didn't do any work, aborting" messages from the Schedd in the last 7 hours.
I don't know if I misunderstood the JOB_IS_FINISHED_INTERVAL macro. We're running HTCondor 8.6.5, and as far as we know there were no issues related to JOB_IS_FINISHED_INTERVAL with previous versions such as 8.5.8.
Thank you very much.
Thanks for the follow-up post above.
You stated you had problems when your config had:

  JOB_IS_FINISHED_INTERVAL = 10
What was JOB_IS_FINISHED_COUNT set to be?
If you change JOB_IS_FINISHED_INTERVAL to 10 and don't also set JOB_IS_FINISHED_COUNT, the result is that the schedd will only allow one job to leave the queue every 10 seconds!! I am guessing this is the situation you encountered. Basically, these two config knobs should always be changed together; see the excerpt from the manual below. Note that the default for JOB_IS_FINISHED_INTERVAL is 0, which is the same as not defining it, i.e. the default configuration works.

I am curious where/how you ended up with a setting of JOB_IS_FINISHED_INTERVAL = 10 without a corresponding JOB_IS_FINISHED_COUNT setting. I checked with OSG, and the configuration they ship for the HTCondor-CE does not change either of these knobs.
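One way to catch this pitfall is to check a config for an interval that is set without a matching count. The Python below is a minimal sketch, not an HTCondor tool: parse_simple_config and check_job_is_finished_knobs are hypothetical helpers, and they assume a flat file of "NAME = value" lines. Real HTCondor config (macro expansion, includes, conditionals) is not handled.

```python
# Hypothetical helpers, not part of HTCondor. They flag a config where
# JOB_IS_FINISHED_INTERVAL is raised but JOB_IS_FINISHED_COUNT is left at
# its default of 1. Assumes simple "NAME = value" lines only.


def parse_simple_config(text: str) -> dict[str, str]:
    """Parse 'NAME = value' lines, skipping blanks and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Assumption: macro names are treated case-insensitively,
        # so normalize them to upper case.
        settings[name.strip().upper()] = value.strip()
    return settings


def check_job_is_finished_knobs(text: str) -> list[str]:
    """Return warnings if the interval is set without a matching count."""
    settings = parse_simple_config(text)
    try:
        interval = int(settings.get("JOB_IS_FINISHED_INTERVAL", "0"))
    except ValueError:
        # Macro references such as $(SOMETHING) are not expanded here.
        return ["JOB_IS_FINISHED_INTERVAL is not a plain integer; check it by hand"]
    warnings: list[str] = []
    if interval > 0 and "JOB_IS_FINISHED_COUNT" not in settings:
        warnings.append(
            f"JOB_IS_FINISHED_INTERVAL = {interval} without JOB_IS_FINISHED_COUNT: "
            f"at most 1 job leaves the queue every {interval} seconds"
        )
    return warnings


if __name__ == "__main__":
    print(check_job_is_finished_knobs("JOB_IS_FINISHED_INTERVAL = 10\n"))
```

Running it on a config containing only JOB_IS_FINISHED_INTERVAL = 10 produces one warning; adding any JOB_IS_FINISHED_COUNT line silences it.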
Hope this helps
From the manual:
JOB_IS_FINISHED_COUNT
    An integer value representing the number of jobs that the condor_schedd will let permanently leave the job queue each time that it examines the jobs that are ready to do so. The default value is 1.

JOB_IS_FINISHED_INTERVAL
    The condor_schedd maintains a list of jobs that are ready to permanently leave the job queue, for example, when they have completed or been removed. This integer-valued macro specifies a delay in seconds between instances of taking jobs permanently out of the queue. The default value is 0, which tells the condor_schedd to not impose any delay.
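Read literally, the two manual descriptions put a ceiling on how fast finished jobs can leave the queue: at most JOB_IS_FINISHED_COUNT jobs per JOB_IS_FINISHED_INTERVAL seconds. The Python below is a rough sketch of that arithmetic only; the function names are hypothetical, and it ignores every other factor that affects schedd throughput.

```python
import math


def max_jobs_leaving_per_hour(interval_s: int, count: int = 1) -> float:
    """Upper bound on jobs that can permanently leave the queue per hour.

    Assumes the schedd releases at most `count` jobs per pass and waits
    `interval_s` seconds between passes, as the manual text describes.
    An interval of 0 (the default) imposes no delay, so no bound applies.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if interval_s <= 0:
        return math.inf
    return count * 3600 / interval_s


def hours_to_drain(finished_jobs: int, interval_s: int, count: int = 1) -> float:
    """Rough lower bound on hours needed for `finished_jobs` to leave the queue."""
    rate = max_jobs_leaving_per_hour(interval_s, count)
    return 0.0 if math.isinf(rate) else finished_jobs / rate


if __name__ == "__main__":
    # Interval of 10 with the count left at its default of 1.
    print(max_jobs_leaving_per_hour(10))  # 360.0
    # Same interval with a hypothetical count of 50.
    print(max_jobs_leaving_per_hour(10, 50))  # 18000.0
    print(hours_to_drain(5000, 10))
```

With the interval at 10 and the count at its default of 1, the ceiling is 360 jobs per hour, which matches the one-job-every-10-seconds behavior described above; raising the count alongside the interval lifts that ceiling proportionally.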