
Re: [Condor-users] Jobs License Management



> Stuart Anderson wrote:
> > In the context of the new Concurrency Limit will it be possible for
> > a running job to drop a resource constraint when it is done with it,
> > or is it implicitly assumed that all jobs require their specified
> > resources for their entire lifetime?
> >
> > The motivation for this is managing I/O resources where a typical
> > work flow is to launch a large number of jobs that each read in a
> > large amount of data from a shared filesystem (or set of
> > filesystems), and then crunch on the data for a long time before
> > outputting a relatively small amount of results. It would be
> > interesting to be able to hand out tokens for filer access but then
> > be able to return them after the I/O intensive phase of each
> > individual job is done.
> >
> > Thanks.
> >
> > --
> > Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
> > http://www.ligo.caltech.edu/~anderson
>
> Right now the limits exist for the lifetime of the job. It is
> conceivable that jobs able to modify their ad, via chirp, would be
> able to update the limits they use. However, this is currently not
> part of the implementation.

We deal with this now in our own pre-Condor resource scheduler, and the
best answer we have come up with is: divide up the jobs. It is more work
for the job developer, but ultimately it lets you keep the simplest
resource request and partitioning scheme. Predictability wins out over
complexity time and time again for us.

We often see developers write flows that use limited, expensive Tool A,
then B, then C, and submit a single job that requires all three. That
job then blocks for an eternity, starved, trying to get all three at
once, while jobs that need only one of the three fly by it. The answer
is always: write a job that submits a job. Your entry job uses Tool A,
finishes, and submits a job that uses Tool B, and so on. DAGs make this
even easier.

Stuart, in your case a DAG would work very well: the first node of the
DAG is the file-transfer-intensive portion of the job, and it requests
the limited resource; the second node, which follows it, is the
number-crunching portion, and it doesn't need any limited resources.
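
As a rough sketch of that two-node layout: the limit name FILER, the
node names, and every file name below are hypothetical, and it assumes
the pool administrator has defined a matching FILER_LIMIT in the
negotiator configuration. It also assumes the data staged by the first
node ends up somewhere the second node can read it without going back
to the filer.

```
# filer.dag (hypothetical): stage the data first, then crunch it
JOB stage  stage.sub
JOB crunch crunch.sub
PARENT stage CHILD crunch

# stage.sub (hypothetical): I/O-heavy node, holds one FILER token
universe           = vanilla
executable         = stage_data.sh
concurrency_limits = FILER
queue

# crunch.sub (hypothetical): CPU-heavy node, requests no limit
universe   = vanilla
executable = crunch.sh
queue
```

Submitting with "condor_submit_dag filer.dag" should then tie up a FILER
token only while the stage node is running; the crunch node does not
count against the limit.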

- Ian
