
Re: [Condor-users] Vacating job and attaching meta data for the next host to take over the vacated job



On Tue, 22 Feb 2005 11:13:04 -0500, Dave Lajoie <davelaj@xxxxxxxxxxxx> wrote:
> Hello Everyone!
> 
> I am using Condor to drive a hybrid Windows/Linux 3D render farm.
> I have developed a perl wrapper to trap the signals sent whenever a job is
> vacated, so that proper cleanup can be done.
> 
> Q1) Is it possible for that wrapper to "attach" some custom data back to the
> vacated job, such that the next machine taking over the vacated job would
> be able to re-use that meta-data/custom data to speed up processing? (The
> data is not large, but takes a long time to compute.)

The Condor vacate mechanism for vanilla universe jobs is:

1) An OS/job-dependent signal is sent.

2) The Condor system starts evaluating KILL every so often.

Then either:

3a) KILL evaluates to true before the running job exits, or
3b) the running job exits.

(3a) is taken to mean that the checkpoint request was ignored or failed to
complete in time. The job will be restarted with its original submission
state / the last good checkpoint (see 3b).

(3b) is taken to mean that the checkpoint succeeded (a bit nasty if your
job actually just exited because it had finished). The job's run directory
is transported back to a location in the spool directory on the submission
machine; on the next restart the job is transferred with that setup (it is
then the job's responsibility to spot that it has been started this way).

This is somewhat hacky for several reasons, and it means that for vanilla,
non-checkpointable jobs you should really either make KILL evaluate to true
every time or totally disable preemption (which in turn means the machine-based
RANK must always be the same)...
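
For example, a minimal sketch of those two options using the standard config
macro names (the values are purely illustrative):

  # Option 1: don't wait for a graceful vacate - hard-kill as soon as the
  # machine wants the job gone (sensible when the job can't checkpoint).
  KILL = True

  # Option 2: turn preemption off entirely.  With preemption disabled the
  # machine RANK must not prefer one job over another, and the negotiator
  # should not preempt for user priority either.
  PREEMPT = False
  RANK = 0
  PREEMPTION_REQUIREMENTS = False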

So, in answer to your question, the job itself should respond to the
preemption notification (note it isn't a request!) in one of two ways.
One: write any saved state it requires, plus some means of flagging its
success such as a flag file, to the working directory, and then exit.
Two: do nothing until you get killed, so the kill itself indicates you
didn't finish because of the preempt rather than because you actually
finished - just make sure that KILL will at some point evaluate to true :) !
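
A minimal sketch of option one, in the spirit of your existing perl wrapper
(the file names, the "render" command and its --resume switch are all made
up - substitute whatever your wrapper really runs; on Windows the soft-kill
mechanism differs, so treat the SIGTERM handling below as the Unix case only):

  #!/usr/bin/perl
  use strict;
  use warnings;
  use POSIX ":sys_wait_h";   # for WNOHANG

  my $state_file = "render_state.dat";   # hypothetical saved-state file
  my $flag_file  = "checkpoint_ok";      # hypothetical "we were preempted" marker

  # Per the mechanism above, the old working directory comes back with the
  # job on restart, so the flag file tells us we are resuming, not starting.
  my $resuming = -e $flag_file;
  unlink $flag_file if $resuming;

  my $preempted = 0;
  $SIG{TERM} = sub { $preempted = 1 };   # the vacate notification (Unix)

  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  if ($pid == 0) {
      # placeholder renderer command and resume switch
      exec("render", $resuming ? ("--resume", $state_file) : ())
          or die "exec failed: $!";
  }

  while (waitpid($pid, WNOHANG) == 0) {
      if ($preempted) {
          kill 'TERM', $pid;             # ask the renderer to stop cleanly
          waitpid($pid, 0);
          # ... write partial results / meta data to $state_file here ...
          open my $fh, '>', $flag_file or die $!;
          close $fh;
          exit 0;                        # exit before KILL goes true
      }
      sleep 1;
  }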

> Also, when rendering 4K resolution 3D plates, it can take a lot of time to
> render them, as you might know. In order to avoid losing precious rendering
> time, it would be silly to vacate a job when it is 80% to 90%
> completed. My wrapper can determine the render progression, which now
> leads to these extra questions:
> 
> Q2) Is there a way to make that render progression ( percentage ) available
> such that condor_status / condor_q can display it for all the hosts
> rendering a job?

Tricky - your job is running on a machine which can, technically, update
the state of the job ClassAd on the submitting machine, but the Condor
binaries needed to do this may not be accessible to the running job. You
could hack around this and use condor_qedit to set a user-defined attribute
on the job's ClassAd, but be careful.
However, see below on its utility...
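
Something along these lines (RenderProgress is a made-up attribute name, and
123.0 a made-up job id; run from the execute node you would probably also
need -n/-pool arguments, and the right permissions, to reach the submit
machine's schedd):

  # set/update a user-defined attribute on job 123.0
  condor_qedit 123.0 RenderProgress 85

  # show it next to the queue
  condor_q -format "%d." ClusterId -format "%d  " ProcId \
           -format "%d%%\n" RenderProgress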

> Q3) Can this "render progression" information be used by the job
> scheduler to determine if vacating is applicable? Basically I am looking for
> ways to "vacate at next frame", for example setting priority to +20 whenever
> render progression has reached 80% or more, so it can't be vacated; once the
> frame is done, the priority returns to the original.

Condor can use this to determine which job to preempt first if you update
PREEMPTION_RANK (the negotiator's preemption rank expression) to take it
into account.

You could also set PREEMPT to take it into account, but this may lead
to runaway jobs never getting kicked (perhaps a good thing, since then
the user has to kill them properly).
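
For instance, something like the following startd expressions - a sketch
only, assuming the running job's ad (TARGET below) actually carries the
made-up RenderProgress attribute and that the startd sees a reasonably
fresh copy of it:

  # Stand-in for whatever your real preemption condition is (owner activity,
  # rank preemption, ...) - hypothetical macro, replace with your own policy.
  WANT_TO_PREEMPT = False

  # Hold off while a frame is 80% or more done; the =?= guard covers jobs
  # that never advertise RenderProgress at all.
  PREEMPT = ($(WANT_TO_PREEMPT)) && \
            ((TARGET.RenderProgress =?= UNDEFINED) || (TARGET.RenderProgress < 80))

  # Make sure the job still gets killed eventually even if it sits at "80%+"
  # forever, so a runaway can't hold the claim indefinitely (30 min is arbitrary).
  KILL = (CurrentTime - EnteredCurrentActivity) > (30 * 60)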

The reconfigurability would be somewhat risky though... I wouldn't
recommend it, and would just suggest you stop preemption if the
throughput losses become too big (your usage may well trend to a steady
state where this is not required much, so don't totally discard the
idea of just seeing what happens).

Note that the default preemption rank prefers to preempt the longest
running jobs - fine as a default for the standard universe, but sorely
lacking for a vanilla-only farm - so I recommend inverting this logic to
start with. (Though if you have a great deal of user activity you may
well get thrashing, in which case you could tweak the expression to rank
jobs in their first few minutes higher than those which have run for half
an hour or so, but not higher than ones which have run for several hours.)
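
As a starting point, something like this (TotalJobRunTime is the machine
ClassAd attribute giving how long the currently running job has been there;
the inversion is the only point of the example):

  # Prefer to preempt the job that has accumulated the least run time,
  # i.e. throw away as little finished work as possible.
  PREEMPTION_RANK = 0 - TotalJobRunTime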

you get the idea

Matt