
Re: [Condor-users] how to kill job when output dir removed ?



On 1/17/07, Ian Chesal <ICHESAL@xxxxxxxxxx> wrote:
> Now you can construct an automated policy about what should happen to
> the held jobs by configuring SYSTEM_PERIODIC_RELEASE or
> SYSTEM_PERIODIC_REMOVE (or the user can configure a policy with the
> respective job policy expressions).
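For reference, a minimal sketch of what such a policy might look like in condor_config. The specific expressions and timeouts here are illustrative assumptions, not a recommendation; check the attribute names and semantics against the manual for your Condor version:

```
# Remove jobs that have sat in the Held state (JobStatus == 5)
# for more than 24 hours (interval is an assumed example value)
SYSTEM_PERIODIC_REMOVE = (JobStatus == 5) && \
    ((CurrentTime - EnteredCurrentStatus) > 24 * 60 * 60)

# Alternatively, auto-release system-held jobs a few times
# before giving up on them
SYSTEM_PERIODIC_RELEASE = (NumSystemHolds < 3) && \
    ((CurrentTime - EnteredCurrentStatus) > 600)
```

These are evaluated periodically by the schedd against every job in the queue, so they give a site-wide default; the per-job periodic_remove/periodic_release expressions mentioned above work the same way but are set by the user in the submit file.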

 Personally I find stupid users to be rather stable in their
continued existence. :)

Sad but true. It is amazing what throughput gains can be achieved by
taking a clever user and educating them about how Condor works :)

> I should also note that there was one case of file transfer errors not
> handled by 6.8's put-on-hold policy.  It is failure while writing the
> output to the submit machine (e.g. because the disk is full).  This
> has been fixed in Condor 6.9.1, so jobs will go on hold in this case too.
>
> It was difficult to judge whether this was a bug fix or a change in
> behavior (i.e. suitable for 6.8 vs. 6.9).  In the end, I decided to
> put it into 6.9.

If you're accepting lobbying for pushing this into 6.8, consider this my
request. I'd definitely like to see this as a put-on-hold situation.
Right now we're actually monitoring users' disk usage on our NAS box and
holding their jobs when they get within 1% of their hard quota.
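The monitoring approach described above could be sketched roughly as follows. The threshold, the per-user quota lookup, and the exact condor_hold invocation are all assumptions for illustration, not the actual script in use:

```python
# Sketch: hold a user's jobs once their NAS usage comes within 1% of
# their hard quota, as described above. Quota numbers are in bytes.
import subprocess

HOLD_THRESHOLD = 0.99  # hold once usage reaches 99% of the hard quota


def should_hold(used_bytes: int, quota_bytes: int) -> bool:
    """Return True when a user is within 1% of their hard quota."""
    return used_bytes >= HOLD_THRESHOLD * quota_bytes


def hold_user_jobs(user: str) -> None:
    # condor_hold accepts a username to hold all of that user's jobs;
    # the -reason string shows up in the job's HoldReason attribute.
    subprocess.run(
        ["condor_hold", user, "-reason", "NAS hard quota nearly exhausted"],
        check=True,
    )


if __name__ == "__main__":
    # Example with made-up numbers: 995 GB used of a 1000 GB quota
    # crosses the 99% threshold, so this user's jobs would be held.
    if should_hold(995 * 10**9, 1000 * 10**9):
        hold_user_jobs("someuser")
```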

My vote is for the 6.8 series too. After the credd failing to store
credentials following a password change, this is actually the most
common cause of repeatedly re-run jobs on my farm.
Making it a configurable option (defaulting to the old behaviour)
would perhaps be an acceptable addition to the 6.8 series?

- Ian

P.S. Did something change on the mailing list? I'm not seeing my own
emails to the list anymore.

I see all of mine in Gmail.

Matt