
[condor-users] condor doesn't perceive that job is done



Hi,

The program I submit to condor starts fine and terminates successfully,
but condor_q still shows the job as running, even though none of its
processes are left on the executing machine. As far as I could figure out,
the program is a binary that starts a wrapper script, which in turn execs
a second binary. The PID of the first binary is logged in the StarterLog
file on the executing machine, and the first binary runs for as long as the
second one does. Both condor_starter and condor_shadow keep running after my
job terminates, until the condor_starter daemon eventually dies and a shadow
exception is reported.

Does anyone know how condor determines that a job is done? What does condor
look at, or what is it waiting for? It seems to me that the condor daemon is
waiting for a signal (or something similar) that my program either doesn't
send or that never reaches the daemon.

The binary does return an exit code; I checked this with a shell script. I
also set this shell script as 'executable' in the submit file, started the
binary from the script, and had the script explicitly exit with that code
once the binaries are done. This didn't change condor's behaviour at all.
However, when I run 'uname -a' instead of the binary, everything works
correctly (I'm working on Solaris).
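
For illustration, the wrapper approach looks roughly like the sketch below. It is not the actual script: the function name run_and_report and the binary path in the usage note are placeholders, and it assumes a Bourne-compatible shell.

```sh
#!/bin/sh
# Hypothetical helper (illustrative name, not part of condor):
# run the given command, record its exit status, and hand that
# status back to the caller unchanged.
run_and_report() {
    "$@"                                # run the command passed as arguments
    status=$?                           # capture the exit status right away
    echo "exit status: $status" >&2     # log it to stderr for inspection
    return "$status"                    # propagate the same status
}
```

In a wrapper script this would be used as `run_and_report ./my_binary; exit $?` (with ./my_binary standing in for the real program), so that the script's own exit code is the one the binary returned.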

Any ideas why condor doesn't perceive that the job is done?

Thanks.
Anika
