Re: [HTCondor-users] Change JobUniverse from vanilla to local?

Hi Todd,

On Wed, 2017-09-20 at 12:56:41 -0500, Todd Tannenbaum wrote:
> On 9/19/2017 2:25 AM, Steffen Grunewald wrote:
> >Hi John,
> >
> >I understand - design decisions. So rewriting the DAG is the only way out of this misery...
> >
> >Thanks,
> >  Steffen
> >
> Some other brainstorm ideas -

Everything is welcome!

> 1. How about running a condor_startd on your dagman/submit machine, and the

That's what I actually did.

> START expression would be something like "only run jobs submitted by DAGMan
> that have been idle for over X amount of time" ?

Such a criterion would easily match thousands (literally) of jobs, given our
DAG structure. Adding a lower limit on RequestMemory may help. For now, I
decided to check for this specific user in the START expression.
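
Roughly along these lines (a sketch only, not the actual configuration: the
user name "someuser" and the 16384 MB threshold are placeholders, and the
DAGManJobId test assumes the jobs are submitted through DAGMan):

```
# Hypothetical START expression for the startd on the submit machine.
# "someuser" and 16384 (MB) are placeholders, not real values.
START = (TARGET.Owner == "someuser") && \
        (TARGET.DAGManJobId =!= UNDEFINED) && \
        (TARGET.RequestMemory >= 16384)
```

An "idle for over X" clause could presumably be added by comparing time()
against the job's QDate, but I have not tried that.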

> 2. Perhaps you could use the condor_jobrouter to transform a vanilla job
> into a local universe job?

Never heard of that before. When was it added? It doesn't seem to be in
my "handbook" yet...

> I personally like option #1 better, since then the job remains vanilla
> universe, and the management of jobs is better.

Agreed. We're planning to add memory to a few machines to resolve this issue.
This will need adjustment of START expressions, of course - or better preemption
than we have now.

> Hope the above helps

Me too :) Thanks, Steffen