
Re: [Condor-users] Running a HOOK_JOB_EXIT



From this morning...

--
$ condor_version
$CondorVersion: 7.6.1 May 31 2011 BuildID: RH-7.6.1-0.9.el6 $
$CondorPlatform: X86_64-Fedora_13 $

$ condor_config_val -dump | grep HOOK
MUNIN_HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit

$ printf 'cmd = munin.sh\ntransfer_executable=true\noutput=x.out\nshould_transfer_files=ALWAYS\nwhen_to_transfer_output=ON_EXIT\n+HookKeyword="MUNIN"\nqueue\n' | condor_submit

$ cat x.out
1. The job script:
#!/bin/bash

#echo
#echo 0. pwd
#pwd
#ls -al

echo 1. The job script:
cat condor_exec.exe

echo
echo 2. Hook config:
grep -i HOOK /etc/condor/condor_config

echo
echo 3. Permissions on the hook:
ls -l /usr/local/bin/munin-node-condor-job-exit

echo
echo 4. The hook:
cat /usr/local/bin/munin-node-condor-job-exit

echo
echo 5. Test run of the hook:
/usr/local/bin/munin-node-condor-job-exit

echo
echo 6. Job exit should produce a second line of output from the hook:

2. Hook config:

3. Permissions on the hook:
-rwxr-xr-x. 1 root root 98 Jun 17 11:04 /usr/local/bin/munin-node-condor-job-exit

4. The hook:
#!/usr/bin/perl
open(DD, ">>x.out");
print DD "Executing munin-node-condor-job-exit\n";
close(DD);
5. Test run of the hook:
Executing munin-node-condor-job-exit

6. Job exit should produce a second line of output from the hook:
Executing munin-node-condor-job-exit
--

Submit again without the HookKeyword and you won't see the message in 6.
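
A minimal sketch of that comparison, assuming bash and the same submit description as in the test above. The function name make_submit is hypothetical; it only prints the description, so the keyword line can be switched on or off between runs.

```shell
#!/bin/bash
# Print the submit description used in the test above.
# $1 (optional): hook keyword. When it is empty, no +HookKeyword line is
# emitted, so the job carries no keyword of its own.
make_submit() {
    local keyword="${1:-}"
    printf '%s\n' \
        'cmd = munin.sh' \
        'transfer_executable = true' \
        'output = x.out' \
        'should_transfer_files = ALWAYS' \
        'when_to_transfer_output = ON_EXIT'
    if [ -n "$keyword" ]; then
        printf '+HookKeyword = "%s"\n' "$keyword"
    fi
    printf 'queue\n'
}

# Usage (needs a working pool and munin.sh in the current directory):
#   make_submit MUNIN | condor_submit   # message 6 expected in x.out
#   make_submit       | condor_submit   # no message 6 expected
```

Comparing x.out from the two runs shows whether the keyword alone decides if the hook fires on your setup.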

Best,


matt

On 06/17/2011 01:56 PM, Colin Leavett-Brown wrote:
Hi Matthew, New day, fresh start, tried '+HookKeyword="MYHOOK" ' in the
job file; still no message from the exit or any indication of the hook
running in either the StarterLog or StartLog.

The documentation seems to assume a particular environment without ever
stating what it is. So let me be explicit about mine; perhaps it will
trigger other ideas.

1. I have 3 nodes involved: a submit host which "spools" the job to the
condor server (running master, collector, negotiator, procd, schedd, and
n shadows), and an execution node (running master, startd, and starter).

2. /etc/condor/condor_config on the execution node contains:
# Job hooks
STARTER_JOB_HOOK_KEYWORD = MYHOOK
#MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit
MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/job_exit.sh

3. /usr/local/bin on the execution node contains:
-rwxr-xr-x 1 root root 47 Jun 16 16:33 /usr/local/bin/job_exit.sh
-rwxr-xr-x 1 root root 102 Jun 16 13:18 /usr/local/bin/munin-node-condor-job-exit

4. The job file contains:
+HookKeyword="MYHOOK"

I read a RH bug report that had bash scripts for the hooks, all with
".sh" suffixes, instead of Perl scripts, so I thought I would try that
to rule it out as the problem. It doesn't make any difference, so I'm
still probing and hoping for other ideas. Colin.


Matthew Farrellee wrote:
Closer reading this morning...

You need MYHOOK_HOOK_JOB_EXIT in your config and to add
+HookKeyword="MYHOOK" in your submit file.
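
The naming rule can be sketched in bash as follows. The knob name is the keyword, then _HOOK_, then the hook type, which is why the bare HOOK_JOB_EXIT from the first message is never matched. Both function names are hypothetical, and the check only greps a single file; condor_config_val on the execute node remains the real test, since configuration can come from several files.

```shell
#!/bin/bash
# Build the config knob name for a keyword and hook type,
# e.g. MYHOOK + JOB_EXIT -> MYHOOK_HOOK_JOB_EXIT.
hook_knob() {
    printf '%s_HOOK_%s\n' "$1" "$2"
}

# Report whether a config file defines the JOB_EXIT knob for a keyword.
# $1: path to a config file, $2: hook keyword from the submit file.
# Commented-out lines and the bare HOOK_JOB_EXIT do not count.
check_hook_config() {
    local config_file="$1" keyword="$2" knob
    knob="$(hook_knob "$keyword" JOB_EXIT)"
    if grep -Eq "^[[:space:]]*${knob}[[:space:]]*=" "$config_file"; then
        echo "$knob is set"
    else
        echo "$knob is NOT set"
        return 1
    fi
}

# Usage on the execute node:
#   check_hook_config /etc/condor/condor_config MYHOOK
#   condor_config_val "$(hook_knob MYHOOK JOB_EXIT)"
```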

Best,


matt

On 06/16/2011 06:57 PM, Colin Leavett-Brown wrote:
Hi Matthew, I set ALL_DEBUG = FULL_DEBUG and transferred back from the
execution node both StartLog and StarterLog; neither one has any
indication that MYHOOK_HOOK_JOB_EXIT ran. Colin.
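
A small bash sketch of that log search, run on the execution node. The function name find_hook_mentions is hypothetical. What the starter writes when it invokes a hook is not shown in this thread, so the search just looks for the word "hook" in any case.

```shell
#!/bin/bash
# Print every line mentioning "hook" (case-insensitive) in the startd and
# starter logs under a log directory, prefixed with file name and line number.
# $1: log directory, e.g. the value of condor_config_val LOG.
find_hook_mentions() {
    local log_dir="$1"
    grep -i -n 'hook' "$log_dir"/StartLog* "$log_dir"/StarterLog* 2>/dev/null
}

# Usage on the execution node:
#   find_hook_mentions "$(condor_config_val LOG)"
```

No output means neither log mentions a hook, which matches the result reported above.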

Matthew Farrellee wrote:
On 06/16/2011 04:42 PM, Colin Leavett-Brown wrote:
Running Condor 7.6.1 under Scientific Linux 5.5

I am trying to run HOOK_JOB_EXIT at the conclusion of my job, but it
appears that the hook is never run. I have created a simple job whose
output file (x.out) accurately details my problem:

[crlb@elephant condor]$ cat x.out
1. The job script:
#!/bin/bash
echo 1. The job script:
cat condor_exec.exe

echo
echo 2. Hook config:
grep -i HOOK /etc/condor/condor_config

echo
echo 3. Permissions on the hook:
ls -l /usr/local/bin/munin-node-condor-job-exit

echo
echo 4. The hook:
cat /usr/local/bin/munin-node-condor-job-exit

echo
echo 5. Test run of the hook:
/usr/local/bin/munin-node-condor-job-exit

echo
echo 6. Job exit should produce a second line of output from the hook:

2. Hook config:
# Job hooks
HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit

3. Permissions on the hook:
-rwxr-xr-x 1 root root 102 Jun 16 13:18 /usr/local/bin/munin-node-condor-job-exit

4. The hook:
#!/usr/bin/perl
open(DD, ">>x.out");
print DD "Executing munin-node-condor-job-exit\n";
close(DD);

5. Test run of the hook:
Executing munin-node-condor-job-exit

6. Job exit should produce a second line of output from the hook:
[crlb@elephant condor]$

But it doesn't! Any suggestions greatly appreciated.

You should look in the StartLog/StarterLog to see if there is any
indication that your hook was run. I would expect it was, unless
there's a permissions issue (maybe a world-writable dir in the
script's path).

The second invocation of the hook may not be writing to the file
you're expecting.
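
One way to test this is a diagnostic hook that appends to an absolute path and records its own working directory, rather than opening a relative x.out. This is a sketch in bash, not the hook from the thread. The function name and the log path in the usage comment are hypothetical. It also assumes the hook's first argument describes how the job exited, as the manual describes for the exit hook; if no argument is passed it logs "none".

```shell
#!/bin/bash
# Append one diagnostic line to a log file given by absolute path, so the
# result does not depend on the directory the hook happens to be started in.
# $1: absolute path of the log file (must be writable by the hook's user)
# $2: first argument the hook received, if any
log_hook_run() {
    local log_file="$1" reason="${2:-none}"
    printf '%s cwd=%s user=%s arg=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$PWD" "$(id -un)" "$reason" >> "$log_file"
}

# As the body of a hook script, the call would look like this
# (the /tmp path is an assumption; pick any absolute, writable location):
#   log_hook_run /tmp/job_exit_hook.log "$1"
```

If the log gains a line when a job exits, the hook is running and the cwd field shows where a relative x.out would have ended up.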

Best,


matt