
RE: [Condor-users] condor daemons exit



Hi Bertrand 

> yes, I looked in the master log file : there is nothing about 
> an exit of the master : 
> 
> 8/4 11:02:20 Preen pid is 17160
> 8/4 11:02:20 Child 17160 died, but not a daemon -- Ignored
> 8/5 11:02:21 Preen pid is 1055
> 8/5 11:02:23 Child 1055 died, but not a daemon -- Ignored
> 8/6 10:40:23 ******************************************************
> 8/6 10:40:23 ** condor_master (CONDOR_MASTER) STARTING UP
> 8/6 10:40:23 ** $CondorVersion: 6.6.5 May 3 2004 $
> 8/6 10:40:23 ** $CondorPlatform: PPC-DARWIN-6_8 $
> 8/6 10:40:23 ** PID = 15863
> 8/6 10:40:23 ******************************************************
> 8/6 10:40:23 Using config file: /Users/condor/Programmes/condor-6.6.5/etc/condor_config

This looks worrying, I would say. After a successful start of condor_master
there should be more lines following that one, as in my log:

8/13 10:54:09 ******************************************************
8/13 10:54:09 ** Condor (CONDOR_MASTER) STARTING UP
8/13 10:54:09 ** c:\progra~1\Condor\bin\condor_master.exe
8/13 10:54:09 ** $CondorVersion: 6.6.6 Jul 26 2004 $
8/13 10:54:09 ** $CondorPlatform: INTEL-WINNT40 $
8/13 10:54:09 ** PID = 4028
8/13 10:54:09 ******************************************************
8/13 10:54:09 Using config file: c:\progra~1\Condor\condor_config
8/13 10:54:09 Using local config files: c:\progra~1\Condor/condor_config.local
8/13 10:54:09 DaemonCore: Command Socket at <134.151.53.53:9613>
8/13 10:54:09 Started DaemonCore process "c:\progra~1\Condor/bin/condor_schedd.exe", pid and pgroup = 4012
8/13 10:54:09 Started DaemonCore process "c:\progra~1\Condor/bin/condor_startd.exe", pid and pgroup = 2600

It looks like you have a problem in condor_config somewhere.
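
The difference between the two logs above can also be checked mechanically.
Here is a minimal Python sketch (not part of Condor; the function name is made
up for illustration) that looks at everything after the last STARTING UP banner
in a MasterLog and lists the daemons the master reports having started. It
assumes the "Started DaemonCore process" line format shown in my log; other
versions may word this differently.

```python
import re

# Matches the master's "started a daemon" entry, whether or not the mail
# client (or an editor) wrapped the line before the quoted path.
STARTED = re.compile(
    r'Started DaemonCore process\s+"([^"]+)",\s+pid and pgroup = (\d+)'
)


def daemons_started_after_last_banner(log_text):
    """Return (path, pid) pairs logged after the most recent startup banner.

    Returns None if the log contains no "STARTING UP" banner at all, and an
    empty list if the banner is there but no daemon start was logged after it.
    """
    banner = log_text.rfind("STARTING UP")
    if banner == -1:
        return None
    return [
        (m.group(1), int(m.group(2)))
        for m in STARTED.finditer(log_text, banner)
    ]


# Small sample in the format of the healthy log (wrapped line included).
sample = r'''8/13 10:54:09 ** Condor (CONDOR_MASTER) STARTING UP
8/13 10:54:09 Started DaemonCore process
"c:\progra~1\Condor/bin/condor_schedd.exe", pid and pgroup = 4012
'''
print(daemons_started_after_last_banner(sample))
```

An empty list for the last banner, as in the Darwin log quoted above, would
suggest the master got no further than reading its config file.
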
In condor_config, find the line
ALL_DEBUG =

and change it to 

ALL_DEBUG = D_FULLDEBUG

or even

ALL_DEBUG = D_ALL

if the previous setting does not clarify the situation. (Do not forget to
restart condor_master after changing the config file.)
And make sure to turn this flood of debug output off again once the problem is
resolved. In any case, it probably makes sense to upgrade to version 6.6.6 to
start with.
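
If the same change has to be made on several machines, the ALL_DEBUG edit can
be scripted. A minimal Python sketch, assuming the config file uses plain
"NAME = value" lines as above; set_all_debug is a made-up helper name, not a
Condor tool, and condor_master still needs restarting afterwards:

```python
import re

# An uncommented "ALL_DEBUG = ..." line, with or without a value.
ALL_DEBUG_LINE = re.compile(r"^[ \t]*ALL_DEBUG[ \t]*=.*$", re.MULTILINE)


def set_all_debug(config_text, level):
    """Return config_text with ALL_DEBUG set to `level`.

    Existing ALL_DEBUG lines are rewritten in place; if there is none,
    a new line is appended at the end. Commented-out lines are left alone.
    """
    new_line = ("ALL_DEBUG = " + level).rstrip()
    if ALL_DEBUG_LINE.search(config_text):
        # A function replacement avoids backslash handling in `level`.
        return ALL_DEBUG_LINE.sub(lambda m: new_line, config_text)
    sep = "" if (not config_text or config_text.endswith("\n")) else "\n"
    return config_text + sep + new_line + "\n"
```

Passing an empty string as the level puts the line back to "ALL_DEBUG =" once
the problem is resolved.
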

Andrey
 
> I manually relaunched it at 10:40:23 today, but the other
> daemons exited yesterday at 22:42. I don't understand what is happening.
> 
> 
> On 6 Aug 04, at 11:44, Andrey Kaliazin wrote:
> 
> 
> 
> 	The "parent process" for all daemons will be the condor_master
> 	process, for example:
> 	$ ps -ef |grep condor
> 	condor 15346 1 0 May26 ? 00:48:31 condor_master -f
> 	condor 13250 15346 0 Jul04 ? 00:33:12 condor_collector -f
> 	condor 13251 15346 0 Jul04 ? 00:05:38 condor_negotiator -f
> 	condor 13252 15346 0 Jul04 ? 00:00:27 condor_schedd -f
> 	condor 13253 15346 0 Jul04 ? 00:16:51 condor_startd -f
> 	condor 13254 15346 0 Jul04 ? 00:00:10 condor_ckpt_server
> 	
> 	so you should look in the ~condor/log/MasterLog file to find out
> 	what is happening with it.
> 	
> 	cheers,
> 	
> 	Andrey
> 	
> 	
> 
> 		-----Original Message-----
> 		From: condor-users-bounces@xxxxxxxxxxx
> 		[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Jerome Jaglale
> 		Sent: Friday, August 06, 2004 10:01 AM
> 		To: Condor-Users Mail List
> 		Subject: [Condor-users] condor daemons exit
> 		
> 		Hello Condor users,
> 		
> 		We use Condor to manage our cluster of Mac G5s, and we're
> 		happy with it. We have just one problem: after a period of
> 		inactivity, the Condor daemons on a computer disappear, even
> 		on the central manager. So when users want to submit or
> 		control their jobs, it doesn't work.
> 		
> 		In the daemons' log files:
> 		8/5 22:42:37 Our parent process (pid 16496) went away; shutting down
> 		8/5 22:42:37 Got SIGTERM. Performing graceful shutdown.
> 		I don't know what this "parent process" is: there were only
> 		three daemons: collector (16497), negotiator (16498) and
> 		schedd (16499).
> 		
> 		Does anyone have an explanation? Is there an option to set
> 		so that the daemons never exit?
> 		
> 		Thanks for your help,
> 		Jérôme
> 		
> 		
> 
> 
> 	_______________________________________________
> 	Condor-users mailing list
> 	Condor-users@xxxxxxxxxxx
> 	http://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 	
> 	
> 
>