
Re: [Condor-users] Jobs die with signal 11



Signal 11 is SIGSEGV (a segmentation fault). It is usually caused by a
problem with RAM.
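
In case it helps when going through the StarterLog files, here is a rough
Python sketch that pulls the pid and signal number out of a "died on signal"
line like the one quoted below and looks up the signal name. The helper name
describe_death and the regular expression are my own illustration, not part
of Condor, and the number-to-name mapping assumes a Linux host (signal
numbering differs between platforms).

```python
import re
import signal

# Hypothetical pattern for the "died on signal" line in a StarterLog;
# based only on the single log line quoted in this thread.
DIED_RE = re.compile(r"pid (\d+) died on signal (\d+)")

def describe_death(line):
    """Return (pid, signal number, signal name), or None if no match."""
    m = DIED_RE.search(line)
    if m is None:
        return None
    pid, signum = int(m.group(1)), int(m.group(2))
    try:
        # signal.Signals maps a number to its name on the current platform.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return pid, signum, name

line = "9/27 14:21:22 Starter pid 21927 died on signal 11 (signal 11)"
print(describe_death(line))
```

On Linux the name reported for signal 11 is SIGSEGV, the signal a process
gets on an invalid memory access.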

On Tue, 28 Sep 2004 mcal00@xxxxxxxxxxxxx wrote:

> Hi, we've got an old linux cluster (i286 processors running RH7.2) that
> we've converted into a Condor pool and we constantly see jobs dying with
> Shadow exceptions, with the only clue in the StarterLog files being of the
> form (Condor v. 6.6.6 all round):
>
> 9/27 14:12:33 vm2: Got activate_claim request from shadow (<172.24.116.193:42835>)
> 9/27 14:12:33 vm2: Remote job ID is 352.0
> 9/27 14:12:33 vm2: Got universe "VANILLA" (5) from request classad
> 9/27 14:12:33 vm2: State change: claim-activation protocol successful
> 9/27 14:12:33 vm2: Changing activity: Idle -> Busy
> 9/27 14:21:22 Starter pid 21927 died on signal 11 (signal 11)
> 9/27 14:21:22 vm2: State change: starter exited
> 9/27 14:21:22 vm2: Changing activity: Busy -> Idle
>
> What's that signal 11 mean? I notice that someone spotted something similar
> under solaris last year (message 476), and Erik Paulson suggested that it
> may have been a bug. Was it ever resolved?
>
> These jobs are coming in from flocked pools across the campus, so the
> network they have to traverse is slightly unfriendlier than your average
> LAN. Could such a signal be due to a network glitch?
>
> Cheers,
> Mark
>
>
> ---------------------------------------------
> Department of Earth Sciences
> University of Cambridge
> Downing Street
> Cambridge CB2 3EQ
> Phone: ( +44 ) 1223 333400
> Fax: ( +44 ) 1223 333450
>