
Re: [Condor-users] condor daemons exit



8/5 22:42:37 Our parent process (pid 16496) went away; shutting down
8/5 22:42:37 Got SIGTERM. Performing graceful shutdown.
I don't know what this "parent process" is: there were only three daemons: collector (16497), negotiator (16498) and schedd (16499).

The parent process is the condor_master. From the log file you sent, it looks like it exited silently. Did you possibly kill that process with "kill -9"? Did anything happen to your computer at 8/5 22:42 that could have killed the master?

Yes, the parent process is the condor_master, and it exited silently. I didn't kill it, and nothing happened to the computer that could have killed it. It happened again today. I was watching the condor_master process with a script:

condor 680 53.7 21.2 1839364 222108 ?? Rs Tue10AM 96:17.05 /Users/condor/Programmes/condor-6.6.5/sbin/condor_master
condor 1619 0.0 0.0 18644 100 std R+ 12:53AM 0:00.00 grep condor_master

condor 1627 0.0 0.0 18644 100 std R+ 12:53AM 0:00.00 grep condor_master
condor 680 0.0 0.0 0 0 ?? Es Tue10AM 0:00.00 (condor_master)

condor 1638 0.0 0.0 18644 100 std R+ 12:53AM 0:00.00 grep condor_master
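
For anyone who wants to reproduce this kind of watch, here is a minimal sketch of a parser for the `ps aux` lines shown above. It assumes the BSD-style column layout seen in this output (USER, PID, %CPU, %MEM, VSZ, RSS, TT, STAT, STARTED, TIME, COMMAND) and a STARTED field without spaces. The function name `find_master` is hypothetical and not part of Condor.

```python
def find_master(ps_output, name="condor_master"):
    """Return (pid, stat) for the first process whose command mentions
    `name`, or None if no such process is listed.

    Assumes `ps aux` style lines with 10 fields before the command.
    """
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep the command as one field
        if len(fields) < 11:
            continue  # blank line or header fragment
        command = fields[10]
        # Skip the grep process that is itself searching for the name.
        if name in command and not command.startswith("grep"):
            return int(fields[1]), fields[7]
    return None
```

Applied to the three snapshots above, this would report state `Rs` for pid 680, then `Es` (on BSD-derived systems, `E` marks a process that is trying to exit), then nothing at all.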


It exited at 12:53 AM last night. There was absolutely nothing about it in the MasterLog; the last message was from early yesterday:

8/11 11:40:08 Preen pid is 21399
8/11 11:40:09 Child 21399 died, but not a daemon -- Ignored


And always the same message in the NegotiatorLog:

8/12 00:54:13 Our parent process (pid 680) went away; shutting down
8/12 00:54:13 Got SIGTERM. Performing graceful shutdown.
8/12 00:54:13 **** condor_negotiator (condor_NEGOTIATOR) EXITING WITH STATUS 0


In the CollectorLog:

8/12 00:54:13 Our parent process (pid 680) went away; shutting down
8/12 00:54:13 Got SIGTERM. Performing graceful shutdown.
8/12 00:54:13 **** condor_collector (condor_COLLECTOR) EXITING WITH STATUS 0


In the SchedLog:

8/12 00:49:54 Sent ad to central manager for damien@xxxxxxxxxxxx
8/12 00:54:13 Activity on stashed negotiator socket
8/12 00:54:13 Socket activated, but could not read command
8/12 00:54:13 (Negotiator probably invalidated cached socket)
8/12 00:54:14 Our parent process (pid 680) went away; shutting down
8/12 00:54:14 Got SIGTERM. Performing graceful shutdown.
8/12 00:54:14 Cleaning job queue...
8/12 00:54:14 All shadows are gone, exiting.
8/12 00:54:14 **** condor_schedd (condor_SCHEDD) EXITING WITH STATUS 0
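
To line up the moment each daemon noticed the missing master, the "parent process went away" lines can be pulled out of the three logs mechanically. The following is only a sketch: it assumes the timestamp and message format shown in the excerpts above, and `parent_loss_events` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   8/12 00:54:13 Our parent process (pid 680) went away; shutting down
PARENT_GONE = re.compile(
    r"^(\d+/\d+ \d+:\d+:\d+) Our parent process \(pid (\d+)\) went away"
)


def parent_loss_events(log_text):
    """Return a list of (timestamp, parent_pid) tuples found in a daemon log."""
    events = []
    for line in log_text.splitlines():
        match = PARENT_GONE.match(line.strip())
        if match:
            events.append((match.group(1), int(match.group(2))))
    return events
```

Running it over the NegotiatorLog, CollectorLog and SchedLog text gives one entry per daemon, all naming the same parent pid, which makes it easy to compare against the last MasterLog entry.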



Does anyone have an explanation? Is there an option to set so that the daemons never exit?
If some external process or person didn't kill the condor_master, then it shouldn't exit.
Is there a core file for the condor_master? If so, that would indicate that it crashed. You can usually find the core files for the daemons in the same directory as the log files.

No, there were no other files in the log directory.
If you have other ideas...

I'm going to run the condor_master command periodically from cron. Perhaps that will work around its silent exit.
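
A rough sketch of such a cron job follows. It restarts the master when it is missing but does not explain why it exits. The binary path is the one from the `ps` output earlier in this message; the function names are hypothetical, and the sketch assumes that condor_master puts itself in the background when started without arguments.

```python
import os
import subprocess

# Path taken from the ps output in this thread; adjust for other installs.
MASTER_PATH = "/Users/condor/Programmes/condor-6.6.5/sbin/condor_master"


def master_alive(ps_listing, name="condor_master"):
    """Return True if a 'pid command' listing contains a live `name` process.

    An exiting process shown as '(condor_master)' is treated as not alive.
    """
    for line in ps_listing.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and os.path.basename(parts[1].strip()) == name:
            return True
    return False


def restart_if_needed():
    """Start condor_master if it is not listed; return True if started."""
    listing = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    if master_alive(listing):
        return False
    # Assumption: the master detaches on its own, so we do not wait for it.
    subprocess.Popen([MASTER_PATH])
    return True
```

Saved as a script that ends with a call to `restart_if_needed()`, it could be scheduled with a crontab line such as `*/5 * * * * /usr/bin/python3 /Users/condor/watch_master.py`, where the script path is only an example.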

Thanks,
Jérôme