Re: [Condor-users] preempt and then hold?

> On Tue, 2 Aug 2005 13:52:32 -0500  Scott Koranda wrote:
> > So am I correct that for Standard universe jobs using way too
> > much memory there is no way to PREEMPT or KILL them if the
> > pool is not doing periodic checkpointing?
> no, that's not correct, either. ;)
> what my message tried to say is that for *all* universes of jobs, the
> startd is computing its own version of the job's imagesize based on
> magic from /proc (or equivalent, depending on the platform).  that's
> the version of imagesize that's always used for evaluating PREEMPT or
> KILL in the startd, and it's the one that's recomputed every
> POLLING_INTERVAL.  periodic checkpointing does not in any way affect
> this number (though it should, but that's another story).
> however, depending on the universe, some other value of imagesize is
> sometimes stored back at the submit machine in the job queue, which is
> then used for condor_q, future matchmaking, writing userlog events,
> etc, etc.  that's (part of) why it's confusing...
> > That is what we are really after...a way to get jobs off of a
> > machine when they try to use too much memory and take the machine
> > into the weeds.
> what you originally wrote for this should be fine.
> sorry for the confusion,

Not at all...my confusion.
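
For anyone reading this in the archive: the kind of startd policy discussed
above might look roughly like the sketch below. This is an illustration,
not the expression from my original message. MEMORY_HOG is a made-up macro
name, and the sketch assumes ImageSize is reported in KB and Memory in MB,
and that PREEMPT is already defined earlier in the config.

    # Hypothetical macro: true when the startd's view of the job's
    # image size (assumed KB) exceeds the machine's memory (assumed MB).
    MEMORY_HOG = (ImageSize > (Memory * 1024))

    # Keep whatever PREEMPT policy is already in place and add the
    # memory check; the startd re-evaluates this as it recomputes
    # the image size every POLLING_INTERVAL.
    PREEMPT = ($(PREEMPT)) || ($(MEMORY_HOG))

Because the startd uses its own freshly computed image size here, this
check should not depend on whether the pool does periodic checkpointing.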