Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [HTCondor-users] Exit_hook receiving empty job classaAd

Date: Mon, 15 Jul 2013 10:40:28 +0200
From: "Joan J. Piles" <jpiles@xxxxxxxxx>
Subject: Re: [HTCondor-users] Exit_hook receiving empty job classaAd

Hi all,

I have been debugging the issue, and I have noticed three things:

1) When the hook gets called, the execution environment has already been deleted, but the hook does not know about it. I checked by running pwd and then both ls and ls .. within the hook: the directory that pwd reports (the one under EXECUTE) is no longer there.
2) The hook now (as of HTCondor 8.0) gets killed after 1 or 2 seconds, even if HOOKNAME_HOOK_JOB_EXIT_TIMEOUT is set to 300 (where HOOKNAME matches the hook name).
3) The output directory is deleted while the script is executing. I tried a loop with sleep 1 and an ls each second: in the first second the files are there, in the next they aren't.
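
For anyone who wants to reproduce the checks above, here is a minimal sketch of the kind of once-per-second probe described in points 1 and 3. The function names snapshot_dir and watch_dir, and the log path in the usage note, are hypothetical and not part of HTCondor; the log is assumed to live outside the EXECUTE directory so that it survives the cleanup.

```shell
#!/bin/bash
# Diagnostic sketch for a job exit hook: record, once per second, whether
# a directory still exists and how many entries it holds.
# snapshot_dir and watch_dir are hypothetical helper names.

# Print one timestamped line describing the state of directory $1.
snapshot_dir() {
    local dir="$1" count
    if [ -d "$dir" ]; then
        # Count entries, including dotfiles.
        count=$(ls -A "$dir" | wc -l)
        echo "$(date +%T) $dir exists, $((count)) entries"
    else
        echo "$(date +%T) $dir is gone"
    fi
}

# Take $2 snapshots of directory $1, one per second, appending to log $3.
watch_dir() {
    local dir="$1" samples="$2" log="$3" i
    for ((i = 0; i < samples; i++)); do
        snapshot_dir "$dir" >> "$log"
        sleep 1
    done
}
```

Calling something like watch_dir "$PWD" 10 /tmp/exit_hook_watch.log at the top of the hook should show both when the directory disappears and, from the timestamp of the last line, roughly when the hook itself was killed.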

In short, it seems as if the cleanup process ignores the hook and goes on deleting everything. The process ended naturally, so I don't think settings such as KILLING_TIMEOUT should even apply. Has this code path been changed recently? Where could I look for this in the source code? Any pointer would be most welcome.

Thanks,

Joan

On 01/07/13 12:28, Joan J. Piles wrote:

Hi all,

We have been having trouble with our JOB_EXIT_HOOKS, both in HTCondor 7.8 and in HTCondor 8.0. Some of them (and the number is strangely increasing with time) don't get any job ClassAd at all. At first we thought it could be a timeout issue (we had our share of these as well), but that doesn't seem to be the case, as the hook script continues its execution. Just in case, we have set both KILLING_TIMEOUT and xxxxx_HOOK_JOB_EXIT_TIMEOUT to 300 seconds, which should be more than enough.

The first thing our hook script tries to do is dump the whole ClassAd to a file (for debugging purposes), and it is creating empty files:

#!/bin/bash

# Save the job ClassAd that arrives on stdin to a temporary file.
TMPFILE=$(mktemp /tmp/condorlog.XXXXXX)
cat > "$TMPFILE"

The script keeps going from there (reading the stored ClassAd and processing it). We can see that the script tries to do its job, but it complains about not having any data to work on. That's why we have ruled out a timeout.
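
One way to make the empty-input case explicit, rather than discovering it later in the processing, is to test the dump right after writing it. This is only a sketch: save_classad is a hypothetical helper name, and the error log path in the usage comment is an assumption.

```shell
#!/bin/bash
# save_classad (hypothetical helper): copy stdin to a fresh temporary file,
# print the file's path, and return non-zero if nothing was received.
save_classad() {
    local tmpfile
    tmpfile=$(mktemp /tmp/condorlog.XXXXXX) || return 2
    cat > "$tmpfile"
    echo "$tmpfile"
    # -s is true only when the file exists and is non-empty.
    [ -s "$tmpfile" ]
}

# Possible use at the top of the hook (log path is an assumption):
#   if ! adfile=$(save_classad); then
#       echo "$(date) empty ClassAd on stdin, pwd=$PWD" >> /tmp/exit_hook_errors.log
#   fi
```

Logging the time and working directory at that point would make it easier to correlate empty ClassAds with the starter's own log entries.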

I found a similar report on the list from four years ago [1], but it doesn't seem to have been resolved. Is there anything I could do to further debug this issue?

Thanks,

Joan

[1]: https://lists.cs.wisc.edu/archive/htcondor-users/2009-July/msg00165.shtml
-- 
--------------------------------------------------------------------------
Joan Josep Piles Contreras -  Analista de sistemas
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 876 55 51 47 (ext. 845147)
http://i3a.unizar.es -- jpiles@xxxxxxxxx
--------------------------------------------------------------------------