Re: [Condor-users] preempt and then hold?



On Tue, 2 Aug 2005 13:52:32 -0500  Scott Koranda wrote:

> So am I correct that for Standard universe jobs using way too
> much memory there is no way to PREEMPT or KILL them if the
> pool is not doing periodic checkpointing?

no, that's not correct, either. ;)

what my message tried to say is that for *all* universes of jobs, the
startd is computing its own version of the job's imagesize based on
magic from /proc (or equivalent, depending on the platform).  that's
the version of imagesize that's always used for evaluating PREEMPT or
KILL in the startd, and it's the one that's recomputed every
POLLING_INTERVAL.  periodic checkpointing does not in any way affect
this number (though it should, but that's another story).
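to make the /proc part a bit more concrete, here's a rough python sketch of the idea (linux only).  this is *not* condor's code.  the real startd tracks the job's whole process family and its exact formula may differ.  the names parse_vmsize_kb, image_size_kb and should_preempt are made up for illustration, and the "preempt if the image is bigger than the slot's memory" rule is only an assumed example policy:

```python
import os
from typing import Optional


def parse_vmsize_kb(status_text: str) -> Optional[int]:
    """Return the VmSize value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSize:"):
            # a line like "VmSize:     123456 kB" splits into ["VmSize:", "123456", "kB"]
            return int(line.split()[1])
    return None  # kernel threads, for example, have no VmSize line


def image_size_kb(pid: int) -> Optional[int]:
    """Read a process's current virtual image size from /proc (linux only)."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_vmsize_kb(f.read())
    except OSError:
        return None  # no such process, or no /proc on this platform


def should_preempt(image_kb: int, memory_mb: int) -> bool:
    """Hypothetical policy: preempt when the image exceeds the slot's memory."""
    return image_kb / 1024 > memory_mb


if __name__ == "__main__":
    # a startd-like loop would re-run this check every polling interval
    size = image_size_kb(os.getpid())
    print("current image size (kB):", size)
    if size is not None:
        print("preempt under a 1024 MB slot?", should_preempt(size, 1024))
```

the point is just that the number is freshly sampled from the running processes each time, so nothing written back to the job queue (and nothing about checkpointing) feeds into it.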

however, depending on the universe, some other value of imagesize is
sometimes stored back at the submit machine in the job queue, which is
then used for condor_q, future matchmaking, writing userlog events,
etc, etc.  that's (part of) why it's confusing...

> That is what we are really after...a way to get jobs off of a
> machine when they try to use too much memory and take the machine
> into the weeds.

what you originally wrote for this should be fine.


sorry for the confusion,
-derek