I’m working on submitting grid universe jobs with SGE as the resource. Our SGE scheduler has a per-user job limit of 4000. I see that there is a GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE setting, but since it only applies within a single schedd, one user can submit a large number of jobs through each of our two Condor schedulers and together exceed the 4000-job limit.
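To put numbers on the problem: assuming each schedd enforces GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE independently of the other, the worst case for one user is the per-schedd cap multiplied by the number of schedds. Here is a quick Python sketch of that arithmetic. The 4000 limit and the two schedds come from our setup; the cap values tried and the function names are hypothetical, just for illustration.

```python
# Assumption: each schedd applies its per-resource cap on its own, with no
# knowledge of how many jobs the other schedd has already put into SGE.
SGE_PER_USER_LIMIT = 4000  # our SGE per-user job limit
NUM_SCHEDDS = 2            # number of Condor schedulers submitting to SGE


def worst_case_total(per_schedd_cap: int, num_schedds: int = NUM_SCHEDDS) -> int:
    """Most jobs one user could have in SGE if every schedd fills its cap."""
    return per_schedd_cap * num_schedds


def safe_per_schedd_cap(limit: int = SGE_PER_USER_LIMIT,
                        num_schedds: int = NUM_SCHEDDS) -> int:
    """Largest per-schedd cap whose combined worst case stays within the limit."""
    return limit // num_schedds


if __name__ == "__main__":
    for cap in (4000, 2500, safe_per_schedd_cap()):
        total = worst_case_total(cap)
        # Prints the cap, the worst-case total, and whether it fits under 4000.
        print(cap, total, total <= SGE_PER_USER_LIMIT)
```

A static split like 2000 per schedd would keep the combined worst case at 4000, at the cost of never letting a single schedd use the full SGE allowance.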
When this happens, the jobs fail to submit and are put into the Held state, but I can’t get them to be released periodically.
I have defined a simple periodic_release expression in the submit file:
periodic_release = ((JobStatus==5) && (CurrentTime - EnteredCurrentStatus) > 30)
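For clarity, this is how I understand the expression is supposed to behave, written as a small Python model. It assumes ClassAd three-valued logic (an attribute that isn't defined evaluates to UNDEFINED, modeled here as None) and that the schedd releases a job only when the expression evaluates to TRUE. The helper names are my own, not HTCondor APIs, and the evaluation time is passed in as `now` in place of CurrentTime.

```python
from typing import Optional

HELD = 5  # JobStatus value for a held job


def gt(a: Optional[int], b: int) -> Optional[bool]:
    """ClassAd-style '>': UNDEFINED (None) if the left operand is UNDEFINED."""
    return None if a is None else a > b


def and3(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """ClassAd-style '&&': FALSE wins, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def periodic_release(ad: dict, now: int) -> Optional[bool]:
    """Model of ((JobStatus==5) && (CurrentTime - EnteredCurrentStatus) > 30)."""
    status = ad.get("JobStatus")
    is_held = None if status is None else status == HELD
    entered = ad.get("EnteredCurrentStatus")
    elapsed = None if entered is None else now - entered  # now stands in for CurrentTime
    return and3(is_held, gt(elapsed, 30))


def should_release(ad: dict, now: int) -> bool:
    # Assumption: only TRUE triggers a release; FALSE and UNDEFINED do nothing.
    return periodic_release(ad, now) is True


if __name__ == "__main__":
    held = {"JobStatus": HELD, "EnteredCurrentStatus": 1000}
    print(should_release(held, 1020), should_release(held, 1031))  # → False True
```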
And I have configured this setting on the schedd:
PERIODIC_EXPR_INTERVAL = 30
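With that interval and the 30-second threshold in the expression, I’d expect a held job to be released somewhere between roughly 30 and 60 seconds after it enters the Held state. A rough Python sketch of that timing, assuming the schedd evaluates periodic expressions exactly every PERIODIC_EXPR_INTERVAL seconds from some fixed starting offset (`phase`, a hypothetical parameter; real evaluation can lag behind this schedule):

```python
def release_time(held_at: int, interval: int = 30, threshold: int = 30,
                 phase: int = 0) -> int:
    """Time of the first periodic evaluation at which (t - held_at) > threshold.

    Assumes evaluations happen at phase, phase + interval, phase + 2*interval, ...
    """
    # Smallest k >= 0 with phase + k*interval strictly greater than held_at + threshold.
    k = max(0, (held_at + threshold - phase) // interval + 1)
    return phase + k * interval


if __name__ == "__main__":
    # Delay between entering Held and being released, for many hold times.
    delays = [release_time(h) - h for h in range(0, 300)]
    print(min(delays), max(delays))  # → 31 60
```

The strict `> 30` in the expression is why a job held exactly on an evaluation tick waits a full extra interval (60 seconds in this model).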
But the jobs are never released. If I run condor_release on them manually, they submit fine, but I thought that was exactly what periodic_release was for. According to the documentation, periodic_release is available in all universes. Can anyone help me understand what’s going on here?
We’re running Condor 8.0.2 on RHEL 6.