
Re: [HTCondor-users] How to prevent jobs to submit new jobs?

On 03/15/2013 06:53 AM, Chris Filo Gorgolewski wrote:
I have run into a peculiar problem. Users are submitting jobs that submit
more jobs. This is problematic because if such a job gets preempted and
restarted, all the jobs it had submitted will be submitted again, causing
general chaos. So how can I prevent jobs from submitting new jobs?


If the user's job runs with the user's identity/credentials there are no reasonable options.

If the user's job does not run with the user's identity/credentials, you can lock down the schedd so that it does not allow submissions from the identity/credentials that the jobs are using (possibly nobody or a slot user).

The root cause is users writing jobs that "misbehave" by your definition of misbehave (not cleaning up after themselves, or not checking whether they've already partially run). This is often where you have to step into policy and social engineering.
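To illustrate what "checking if they've already partially run" could look like, here is a minimal Python sketch of a job that guards its submission step with a marker file. Everything in it is an assumption for illustration, not something Condor provides: `submit_children_once`, the marker path, and the `submit` callable are hypothetical names, and the marker has to live somewhere that survives a preempt/restart (a shared filesystem or the submit directory, not the execute node's scratch space).

```python
import os
from pathlib import Path
from typing import Callable


def submit_children_once(marker: Path, submit: Callable[[], None]) -> bool:
    """Call submit() only if no earlier run of this job claimed the marker.

    Returns True if this call did the submission, False if it was skipped.
    Hypothetical helper, not part of Condor.
    """
    try:
        # O_CREAT | O_EXCL makes the create fail if the marker already
        # exists, so a restarted job cannot claim it a second time.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)

    try:
        submit()
    except Exception:
        # The submission failed inside this run: release the marker so a
        # later restart is allowed to try again.
        marker.unlink()
        raise
    return True
```

A job might call this with something like `lambda: subprocess.run(["condor_submit", "children.sub"], check=True)` as `submit`, where `children.sub` is a made-up submit file name. The marker is claimed before submitting, so a job killed in the middle of its submission step will skip resubmitting after a restart; that errs on the side of too few child jobs rather than duplicates, and may or may not be the trade-off you want.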

You could explore disabling preemption for the users who have jobs that submit more jobs. Condor provides a tool called DAGMan that is basically a well-written job that submits more jobs; maybe your users should be using it. Alternatively, you can educate the users about your definition of misbehaving and give them guidance on how to behave properly, then provide incentives by giving misbehaving users an overall lower priority in your pool (let the fair share algorithm have a memory).