
Re: [Condor-users] Running a HOOK_JOB_EXIT

Hi Matthew, New day, fresh start: I tried +HookKeyword="MYHOOK" in the job file, but there is still no message from the exit hook, nor any indication of the hook running in either the StarterLog or the StartLog.

The documentation seems to assume a particular environment without ever stating what it is. So let me be explicit about mine; perhaps it will trigger other ideas.

1. I have 3 nodes involved: a submit host which "spools" the job to the condor server (running master, collector, negotiator, procd, schedd, and n shadows), and an execution node (running master, startd, and starter).

2. /etc/condor/condor_config on the execution node contains:
  # Job hooks
  #MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit
  MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/job_exit.sh

3. /usr/local/bin on the execution node contains:
  -rwxr-xr-x 1 root root  47 Jun 16 16:33 /usr/local/bin/job_exit.sh
  -rwxr-xr-x 1 root root 102 Jun 16 13:18 /usr/local/bin/munin-node-condor-job-exit

4. The job file contains:

I read a RH bug report that had bash scripts for the hooks, all with ".sh" suffixes, instead of Perl scripts, so I tried that to make sure the script language was not the problem. It makes no difference, so I'm still probing and hoping for other ideas. Colin.

Matthew Farrellee wrote:
Closer reading this morning...

You need MYHOOK_HOOK_JOB_EXIT in your config and to add +HookKeyword="MYHOOK" in your submit file.



On 06/16/2011 06:57 PM, Colin Leavett-Brown wrote:
Hi Matthew, I set ALL_DEBUG = FULL_DEBUG and transferred back from the
execution node both StartLog and StarterLog; neither one has any
indication that MYHOOK_HOOK_JOB_EXIT ran. Colin.

Matthew Farrellee wrote:
On 06/16/2011 04:42 PM, Colin Leavett-Brown wrote:
Running Condor 7.6.1 under Scientific Linux 5.5

I am trying to run HOOK_JOB_EXIT at the conclusion of my job, but it
appears that the hook is never run. I have created a simple job whose
output file (x.out) accurately details my problem:

[crlb@elephant condor]$ cat x.out
1. The job script:
echo 1. The job script:
cat condor_exec.exe

echo 2. Hook config:
grep -i HOOK /etc/condor/condor_config

echo 3. Permissions on the hook:
ls -l /usr/local/bin/munin-node-condor-job-exit

echo 4. The hook:
cat /usr/local/bin/munin-node-condor-job-exit

echo 5. Test run of the hook:

echo 6. Job exit should produce a second line of output from the hook:

2. Hook config:
# Job hooks
HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit

3. Permissions on the hook:
-rwxr-xr-x 1 root root 102 Jun 16 13:18

4. The hook:
open(DD, ">>x.out");
print DD "Executing munin-node-condor-job-exit\n";

5. Test run of the hook:
Executing munin-node-condor-job-exit

6. Job exit should produce a second line of output from the hook:
[crlb@elephant condor]$

But it doesn't! Any suggestions greatly appreciated.
You should look in the StartLog/StarterLog to see if there is any
indication that your hook was run. I would expect it is, unless
there's a permissions issue (maybe a world writable dir in the
script's path).
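
One way to check the world-writable-directory guess is to walk from the hook up to / and list any directory that has the "other write" bit set. This is only a sketch: the function name is made up for illustration, the default path is the job_exit.sh hook mentioned in this thread, and whether Condor actually refuses hooks under such directories is only the guess above, not something confirmed here.

```shell
#!/bin/sh
# Hypothetical helper: print every world-writable directory between the
# given file and /. Assumes a find(1) that supports -maxdepth (GNU, BSD).
find_world_writable_parents() {
    dir=$(dirname "$1")
    while :; do
        # -perm -0002 matches when the "other write" permission bit is set
        if [ -n "$(find "$dir" -maxdepth 0 -type d -perm -0002 2>/dev/null)" ]; then
            printf '%s\n' "$dir"
        fi
        # Stop at the root (or at "." when a relative path was given)
        case "$dir" in
            /|.) break ;;
        esac
        dir=$(dirname "$dir")
    done
}

# Default to the hook path used in this thread; pass another path as $1.
find_world_writable_parents "${1:-/usr/local/bin/job_exit.sh}"
```

No output means none of the parent directories is world-writable.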

The second invocation of the hook may not be writing to the file
you're expecting.
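
To rule that out, the hook can append to an absolute path and record its own working directory and user. A minimal sketch follows, assuming the starter passes the exit reason as the first argument; the log location and the function name are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical log location. An absolute path means the result does not
# depend on whatever working directory the hook is started in.
HOOK_LOG="${HOOK_LOG:-/tmp/job_exit_hook.log}"

log_hook_run() {
    # $1 is assumed to be the exit reason; "unknown" is logged if it is absent.
    printf '%s job_exit hook ran: reason=%s cwd=%s user=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "${1:-unknown}" "$(pwd)" "$(id -un)" \
        >> "$HOOK_LOG"
}

log_hook_run "$@"
```

If the log file gains a line after the job exits, the hook is running and the earlier relative ">>x.out" was simply landing in a different directory.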



Colin Leavett-Brown
Department of Physics & Astronomy
University of Victoria