
[Condor-users] how to kill job when output dir removed ?



Hi,

I have a recurring problem here where our users submit
files through a web interface but then inadvertently
remove the directory the Condor input/output files
are sitting in without killing the job first. I've
tried all sorts of safeguards to prevent this but they
still seem to find a way of doing it (that's users for ya!).

Condor's "try, try and try again" strategy means that
it keeps attempting to write the output files in the hope
that the directory might reappear, deluging my inbox
with error messages in the process.

I can understand that this may have been put in to deal with flaky
NFS filesystems (although I see that Condor tries to avoid
these like the plague now), but is there any way of getting
Condor to just give up if it can't write the output files?
If not, can it be set up not to bombard me with e-mail warnings?
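
For illustration only, here is a rough sketch of the kind of external
workaround this might come down to if Condor itself cannot be told to
give up: a script run periodically (e.g. from cron) that removes any
queued job whose submit directory has vanished. It assumes the job's
Iwd attribute points at the directory holding the input/output files,
and that condor_q and condor_rm are on the PATH. The function names
are made up for this example and it has not been tried on a real pool.

```python
import os
import subprocess


def parse_queue(text):
    """Parse 'cluster.proc iwd' lines into (job_id, iwd) pairs."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so directories containing spaces survive.
        job_id, _, iwd = line.partition(" ")
        if iwd:
            jobs.append((job_id, iwd.strip()))
    return jobs


def orphaned_jobs(jobs, dir_exists=os.path.isdir):
    """Return the ids of jobs whose working directory no longer exists."""
    return [job_id for job_id, iwd in jobs if not dir_exists(iwd)]


def main():
    # Ask condor_q for one "cluster.proc iwd" line per queued job.
    listing = subprocess.run(
        ["condor_q",
         "-format", "%d.", "ClusterId",
         "-format", "%d ", "ProcId",
         "-format", "%s\n", "Iwd"],
        capture_output=True, text=True, check=True,
    ).stdout
    for job_id in orphaned_jobs(parse_queue(listing)):
        # Assumption: removing the job outright is acceptable once its directory is gone.
        subprocess.run(["condor_rm", job_id], check=True)


if __name__ == "__main__":
    main()
```

This only treats the symptom from outside; a setting inside Condor
that makes it give up would obviously be preferable.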

On a related point - if I specify that a particular output file
is to be transferred back from the execute host using

transfer_output_files =

and the file isn't there (usually because the executable
has bombed), it just seems to keep on trying in vain.
Is there any way to prevent this either?
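
Again purely as a sketch, one possible workaround would be to submit a
small wrapper in place of the real executable. The wrapper runs the
real program and then creates an empty placeholder for any expected
output file that is missing, so the transfer back always has something
to pick up. The wrapper's command-line convention (a comma-separated
list of output names followed by the real command) and the function
name ensure_outputs are invented for this example.

```python
import subprocess
import sys
from pathlib import Path


def ensure_outputs(names):
    """Create an empty file for every expected output that is missing."""
    created = []
    for name in names:
        path = Path(name)
        if not path.exists():
            path.touch()
            created.append(name)
    return created


def main(argv):
    # Hypothetical usage: wrapper.py out1,out2 real_executable [args...]
    expected = argv[1].split(",")
    result = subprocess.run(argv[2:])
    ensure_outputs(expected)
    # Pass the real program's exit status through so failures stay visible.
    return result.returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

An empty placeholder hides the fact that the file was never written,
so the exit status would still need checking after the job completes.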

regards,

-ian.

--------------------
Dr Ian C. Smith,
The University of Liverpool,
Computer Services Department