
Re: [Condor-users] My jobs won't stay in the queue



On 7/18/05, Sean Looper <slooper@xxxxxxxxxxxxxxxxxxxx> wrote:
> That was it!  I forgot about that option.  :)  The reason why I have
> this is so we can track status data on the jobs including Completion
> Date and memory usage.  Eventually, this will evolve into parsing the
> job_queue and history files directly.

I would suggest that this is a bad idea: the format of those files is
not guaranteed to stay constant (nor, frankly, is it easy to parse).

The user log (set at submission time), however, is designed for this
purpose and records all the data you need. Parsing logic for it is
already supplied.
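
To give a rough idea of what reading the user log involves, here is a
minimal Python sketch. It is not the supplied parsing logic, just a
hand-rolled illustration, and it assumes the classic plain-text layout:
a header line of the form "NNN (cluster.proc.subproc) date time
message", optional detail lines, and a line containing only "..." to
end each event. The function name and the sample data are made up for
the example.

```python
import re

# Assumed header shape: "005 (123.000.000) 07/18 10:20:45 Job terminated."
HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<cluster>\d+)\.(?P<proc>\d+)\.\d+\) "
    r"(?P<date>\S+) (?P<time>\S+) (?P<text>.*)$"
)

def parse_user_log(lines):
    """Yield one dict per event; events are assumed to end with a '...' line."""
    event = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "...":
            # End-of-event marker: emit whatever we have collected.
            if event is not None:
                yield event
            event = None
            continue
        m = HEADER.match(line)
        if m and event is None:
            event = {
                "code": m.group("code"),
                "job": (int(m.group("cluster")), int(m.group("proc"))),
                "when": m.group("date") + " " + m.group("time"),
                "text": m.group("text"),
                "details": [],
            }
        elif event is not None:
            # Any other line inside an event is kept as free-form detail.
            event["details"].append(line.strip())

# Made-up sample in the assumed layout.
sample = (
    "000 (123.000.000) 07/18 10:15:02 Job submitted from host: <10.0.0.1:1234>\n"
    "...\n"
    "005 (123.000.000) 07/18 10:20:45 Job terminated.\n"
    "\t(1) Normal termination (return value 0)\n"
    "...\n"
)
events = list(parse_user_log(sample.splitlines()))
```

Each event then carries the job id and timestamp, which is enough to
record a completion date per job; anything beyond that should really
go through the supplied reader rather than a regex like this.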

I know it is somewhat more complex to arrange for parsing these
(especially since there is nothing forcing users to use them, or to
keep them around or in a consistent place). However, some form of
submit gateway/tool on each machine, which passes the logs or their
data to whatever you use to record this information, may make things
much easier.
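
As a sketch of the gateway idea, a wrapper could make sure every
submit description sets a user log in a known place before handing it
on to condor_submit. The Python below only does the text rewriting;
the function name, the directory and the job name are hypothetical,
and it assumes a simple submit file with "key = value" lines and a
"queue" statement.

```python
def ensure_user_log(submit_text, log_dir, name):
    """Return submit_text with a 'log = ...' line, adding one if none is set."""
    lines = submit_text.splitlines()
    for line in lines:
        key = line.split("=", 1)[0].strip().lower()
        if "=" in line and key == "log":
            return submit_text  # the user already chose a log file
    log_line = "log = " + log_dir.rstrip("/") + "/" + name + ".log"
    out = []
    inserted = False
    for line in lines:
        # Put the log line just before the first queue statement.
        if not inserted and line.strip().lower().split()[:1] == ["queue"]:
            out.append(log_line)
            inserted = True
        out.append(line)
    if not inserted:
        out.append(log_line)
    return "\n".join(out) + "\n"

# Hypothetical submit description and shared log directory.
submit = "executable = /bin/sleep\narguments = 60\nqueue\n"
result = ensure_user_log(submit, "/shared/condor_logs", "sleep_job")
```

This version leaves an existing log setting alone. A stricter gateway
might instead override it so that all logs end up in one place, which
is a policy decision for the pool admin.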

One of the more annoying aspects, from a pool admin's point of view,
is that Condor makes accurate job-level (rather than machine-level)
monitoring very hard. I know there are performance considerations to
deal with, but it would be very nice if the schedds/shadows could
report back some basic, user-definable state about the key events on
the jobs they manage. This would make it so much easier to track down
how efficient the pool is with regard to useful/wasteful evictions.

Matt