Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Why does the Starter deamon keeps dying with SEGV?

Date: Fri, 4 Aug 2006 16:43:00 -0500
From: Jaime Frey <jfrey@xxxxxxxxxxx>
Subject: Re: [Condor-users] Why does the Starter deamon keeps dying with SEGV?

On Jul 27, 2006, at 3:53 AM, Mark Calleja wrote:

Jaime Frey wrote:
Mark Calleja wrote:
We're running a Linux pool with Condor 6.6.11 and we persistently see a
number of vanilla jobs whose Starter keeps dying with (from the
StartLog):

7/17 08:07:03 Starter pid 16900 died on signal 11 (signal 11)
7/17 08:07:03 vm1: State change: starter exited

The StarterLog shows nothing, even with full debug turned on. The jobs then
keep resubmitting themselves to die a similar death. As far as I can tell,
this is the daemon itself dying, not the application that it's running
(which runs fine from the console). We're using the dynamically linked
binaries under Debian "etch". Can anyone shed any light on why this
should be happening, and more importantly how we can fix it?

Ta,
Mark
What does the starter log say around the time of the segfault?
Are there any core files in the condor log directory?
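
Checking for core files by hand gets tedious across many execute nodes, so a small script can list candidate dumps in a directory. This is a minimal Python sketch, assuming core files are named `core` or `core.<suffix>` (the real name depends on the kernel's `core_pattern` setting) and using `/home/condor/log` only as a hypothetical default; pass the pool's actual `LOG` directory instead. An empty result does not prove the daemon never crashed: a core-size limit of zero (`ulimit -c 0`) suppresses the dump entirely.

```python
import sys
from pathlib import Path


def find_core_files(log_dir):
    """Return regular files in log_dir whose names look like core dumps.

    Assumption: dumps are named 'core' or 'core.<suffix>' (e.g. core.<pid>).
    Returns an empty list if log_dir does not exist or is not a directory.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and (p.name == "core" or p.name.startswith("core."))
    )


if __name__ == "__main__":
    # Hypothetical default path; supply the real Condor LOG directory as argv[1].
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/home/condor/log"
    cores = find_core_files(log_dir)
    if not cores:
        print(f"No core files found in {log_dir}")
    for core in cores:
        print(core, core.stat().st_size, "bytes")
```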


Hi Jaime,

I'm afraid there's nothing in the StarterLog. Here's the relevant
snippet for the job mentioned above:

7/17 08:04:25 VM1_USER set, so running job as condor_user
7/17 08:04:30 File transfer completed successfully.
7/17 08:04:31 Starting a VANILLA universe job with ID: 251.23
7/17 08:04:31 IWD: /home/condor/execute/dir_16900
7/17 08:04:31 Output file: /home/condor/execute/dir_16900/out.23
7/17 08:04:31 Error file: /home/condor/execute/dir_16900/err.23
7/17 08:04:31 Renice expr "19" evaluated to 19
7/17 08:04:31 About to exec /home/condor/execute/dir_16900/condor_exec.exe noDark 100000 -23 8 1
7/17 08:04:31 Create_Process succeeded, pid=16902
7/17 08:07:05 ******************************************************
7/17 08:07:05 ** condor_starter (CONDOR_STARTER) STARTING UP
7/17 08:07:05 ** /usr/Condor/RH9/condor-6.6.11-dynamic/sbin/condor_starter
7/17 08:07:05 ** $CondorVersion: 6.6.11 Mar 23 2006 $
7/17 08:07:05 ** $CondorPlatform: I386-LINUX_RH9 $
7/17 08:07:05 ** PID = 22324
7/17 08:07:05 ******************************************************
7/17 08:07:05 Using config file: /home/condor/condor_config
7/17 08:07:05 Using local config files: /home/condor/condor_config.local

Note how there's nothing about the job's death, and a new one just
starts immediately afterwards. Also, there's no core file being left
behind anywhere. Sorry!
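
Since the StarterLog records nothing at the moment of the crash, one way to tie each segfault back to a job is to join the two logs on the starter's PID: the StartLog reports `Starter pid N died on signal S`, and each StarterLog session begins with a banner line `** PID = N` followed later by `Starting a VANILLA universe job with ID: ...`. The Python sketch below performs that join. It assumes only one starter writes to the StarterLog at a time, so each session's lines follow its own banner without interleaving; the function names are hypothetical. In the excerpt above, the execute directory `dir_16900` also carries the crashed starter's PID, which offers an independent cross-check.

```python
import re

# "7/17 08:07:03 Starter pid 16900 died on signal 11 (signal 11)"
CRASH_RE = re.compile(r"Starter pid (\d+) died on signal (\d+)")
# "7/17 08:07:05 ** PID = 22324" (starter startup banner)
BANNER_PID_RE = re.compile(r"\*\* PID = (\d+)")
# "7/17 08:04:31 Starting a VANILLA universe job with ID: 251.23"
JOB_ID_RE = re.compile(r"universe job with ID: (\S+)")


def crashed_starters(startlog_lines):
    """Map starter PID -> signal number for every death reported in the StartLog."""
    crashes = {}
    for line in startlog_lines:
        m = CRASH_RE.search(line)
        if m:
            crashes[int(m.group(1))] = int(m.group(2))
    return crashes


def jobs_by_starter_pid(starterlog_lines):
    """Map starter PID -> job ID, attributing job lines to the most recent banner.

    Assumption: one starter writes to this log at a time (no interleaving).
    """
    jobs = {}
    current_pid = None
    for line in starterlog_lines:
        m = BANNER_PID_RE.search(line)
        if m:
            current_pid = int(m.group(1))
            continue
        m = JOB_ID_RE.search(line)
        if m and current_pid is not None:
            jobs[current_pid] = m.group(1)
    return jobs


def crashed_jobs(startlog_lines, starterlog_lines):
    """Return (pid, signal, job_id) per crashed starter; job_id is None if unknown."""
    jobs = jobs_by_starter_pid(starterlog_lines)
    return [(pid, sig, jobs.get(pid))
            for pid, sig in sorted(crashed_starters(startlog_lines).items())]
```

Both arguments can be open file objects, for example `crashed_jobs(open("StartLog"), open("StarterLog"))`, where the file paths are placeholders for the node's actual log files.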


Odd. Is this happening regularly or just the one time?

+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+
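
Jaime's question of whether the crash recurs can be answered by tallying the StartLog's `died on signal` lines per day. A minimal Python sketch, assuming the `M/D HH:MM:SS` timestamp prefix seen in the log excerpts in this thread; the function name `crashes_per_day` is hypothetical.

```python
import re
from collections import Counter

# Matches StartLog lines such as:
# "7/17 08:07:03 Starter pid 16900 died on signal 11 (signal 11)"
CRASH_RE = re.compile(r"^(\d+/\d+) \S+ Starter pid \d+ died on signal (\d+)")


def crashes_per_day(lines, signal=11):
    """Count starter deaths on the given signal (default SIGSEGV), keyed by M/D date."""
    counts = Counter()
    for line in lines:
        m = CRASH_RE.match(line)
        if m and int(m.group(2)) == signal:
            counts[m.group(1)] += 1
    return counts
```

Passing an open StartLog, e.g. `crashes_per_day(open("StartLog"))` with a placeholder path, yields a `Counter` keyed by dates such as `7/17`. The log's date prefix carries no year, so entries from different years would be merged.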
