Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Jobs die with signal 11

Date: Tue, 28 Sep 2004 10:25:24 GB
From: mcal00@xxxxxxxxxxxxx
Subject: [Condor-users] Jobs die with signal 11

Hi, we've got an old Linux cluster (i286 processors running RH7.2) that
we've converted into a Condor pool, and we constantly see jobs dying with
shadow exceptions. The only clue in the StarterLog files is of the form
below (Condor v. 6.6.6 all round):

9/27 14:12:33 vm2: Got activate_claim request from shadow (<172.24.116.193:42835>)
9/27 14:12:33 vm2: Remote job ID is 352.0
9/27 14:12:33 vm2: Got universe "VANILLA" (5) from request classad
9/27 14:12:33 vm2: State change: claim-activation protocol successful
9/27 14:12:33 vm2: Changing activity: Idle -> Busy
9/27 14:21:22 Starter pid 21927 died on signal 11 (signal 11)
9/27 14:21:22 vm2: State change: starter exited
9/27 14:21:22 vm2: Changing activity: Busy -> Idle

What does that signal 11 mean? I notice that someone spotted something
similar under Solaris last year (message 476), and Erik Paulson suggested
that it may have been a bug. Was it ever resolved?
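
As a rough aid for reading logs like the excerpt above, here is a minimal
Python sketch that picks out "died on signal" lines and looks up the
symbolic name of the signal number. The regular expression is based only on
the single log line quoted in this message, and the names DIED_RE,
signal_name and find_starter_deaths are hypothetical helpers, not part of
Condor. On Linux, signal 11 is SIGSEGV (a segmentation fault); the lookup
uses whatever numbering the local platform defines.

```python
import re
import signal

# Matches lines such as:
#   9/27 14:21:22 Starter pid 21927 died on signal 11 (signal 11)
# Assumption: the pattern is inferred from the one line quoted in the post.
DIED_RE = re.compile(r"Starter pid (\d+) died on signal (\d+)")


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return "unknown"


def find_starter_deaths(lines):
    """Yield (pid, signal number, signal name) for each starter death line."""
    for line in lines:
        match = DIED_RE.search(line)
        if match:
            pid = int(match.group(1))
            signo = int(match.group(2))
            yield pid, signo, signal_name(signo)


# Sample lines copied from the log excerpt in this message.
sample = [
    "9/27 14:12:33 vm2: Changing activity: Idle -> Busy",
    "9/27 14:21:22 Starter pid 21927 died on signal 11 (signal 11)",
    "9/27 14:21:22 vm2: State change: starter exited",
]

deaths = list(find_starter_deaths(sample))
for pid, signo, name in deaths:
    print(f"starter {pid} died on signal {signo} ({name})")
```

To scan a real log, the same function could be fed an open file object
instead of the sample list, since it only iterates over lines.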

These jobs are coming in from flocked pools across the campus, so the
network they have to traverse is slightly unfriendlier than your average
LAN. Could such a signal be due to a network glitch?

Cheers,
Mark


---------------------------------------------
Department of Earth Sciences
University of Cambridge
Downing Street
Cambridge CB2 3EQ
Phone: ( +44 ) 1223 333400
Fax: ( +44 ) 1223 333450

Follow-Ups:
- Re: [Condor-users] Jobs die with signal 11
  - From: Chen
- Re: [Condor-users] Jobs die with signal 11
  - From: Nick LeRoy
