Re: [HTCondor-users] repost: kill_sig = SIGINT not working

On Apr 30, 2015, at 9:35 AM, Krieger, Donald N. <kriegerd@xxxxxxxx> wrote:

Dear List:
Each of my jobs (Vanilla universe) creates a sequence of output files.
If a job is killed, either by a condor_rm that I issue or by the timer that I set with "periodic_hold = ..." in the submitted config file,
I would like the output files that have been written to be returned.
To accomplish this I have added the following lines to my config file:
  kill_sig = 2
  job_max_vacate_time = 120
The file also includes the line:
  when_to_transfer_output = ON_EXIT_OR_EVICT
Within the shell script which I use as the wrapper for each job, I catch signal 2 (SIGINT).
When that happens, the script kills the executable which is doing the work and then exits gracefully.
But the output files are not returned.
(1) What am I doing wrong?
(2) Can I use the linux name for the signal, i.e. SIGINT, instead of the number?

When a job is evicted from an execute machine before it completes, the partial output files are usually transferred to a job-specific directory under the SPOOL directory on the submit machine. If you want the partial output files to be transferred to the job's IWD (the same location they are placed when the job completes), you can add this line to your submit file:

+SpoolOnEvict = false
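
For context, here is a rough sketch of how that line might sit alongside the settings quoted above. The executable name wrapper.sh is a hypothetical placeholder, should_transfer_files = YES is assumed rather than taken from the original message, and the periodic_hold expression is left out because it was not given:

```
# Minimal sketch of a submit file, not a tested configuration.
universe                = vanilla
# Hypothetical name for the shell wrapper that catches signal 2.
executable              = wrapper.sh
# Send SIGINT (2) on eviction and allow up to 120 seconds to exit.
kill_sig                = 2
job_max_vacate_time     = 120
# Assumed: file transfer is enabled for the job.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
# Return partial output to the job's IWD instead of SPOOL.
+SpoolOnEvict           = false
queue
```

Comments are kept on their own lines here, since a trailing comment on a submit command line may be read as part of the value.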

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project