
[Condor-users] FW: [Condor] Problem condor_startd died (11)



We have been getting a ton of these messages from our pool since upgrading
from 7.4 to 7.8.4.
Does this mean we are obliged to run the new daemon that clears out
partitionable slots?
Or does it point to a bug? That seems likely, since the startd should never
segfault.
-Ian
--
Ian Cottam
IT Services - supporting research
Faculty of EPS
The University of Manchester





On 06/10/2012 04:18, "Owner of Condor Daemons"
<condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

>This is an automated email from the Condor system
>on machine "xxx".  Do not reply.
>
>"/usr/sbin/condor_startd" on "e-c07atg105057.it.manchester.ac.uk" died
>due to signal 11 (Segmentation fault).
>Condor will automatically restart this process in 10 seconds.
>
>*** Last 20 line(s) of file /var/log/condor/StartLog:
>10/05/12 20:51:13 slot1_3: State change: claim-activation protocol successful
>10/05/12 20:51:13 slot1_3: Changing activity: Idle -> Busy
>10/05/12 20:51:13 slot1_1: match_info called
>10/05/12 20:51:13 slot1_4: Got activate_claim request from shadow (130.88.203.22)
>10/05/12 20:51:13 slot1_4: Remote job ID is 329729.2744
>10/05/12 20:51:13 slot1_4: Got universe "VANILLA" (5) from request classad
>10/05/12 20:51:13 slot1_4: State change: claim-activation protocol successful
>10/05/12 20:51:13 slot1_4: Changing activity: Idle -> Busy
>10/06/12 04:18:21 slot1_1: Called deactivate_claim_forcibly()
>10/06/12 04:18:21 slot1_1: Changing state and activity: Claimed/Busy -> Preempting/Vacating
>10/06/12 04:18:21 Starter pid 2555 exited with status 0
>10/06/12 04:18:21 slot1_1: State change: starter exited
>10/06/12 04:18:21 slot1_1: State change: No preempting claim, returning to owner
>10/06/12 04:18:21 slot1_1: Changing state and activity: Preempting/Vacating -> Owner/Idle
>10/06/12 04:18:21 slot1_1: State change: IS_OWNER is false
>10/06/12 04:18:21 slot1_1: Changing state: Owner -> Unclaimed
>10/06/12 04:18:21 slot1_1: Changing state: Unclaimed -> Delete
>10/06/12 04:18:21 slot1_1: Resource no longer needed, deleting
>10/06/12 04:18:27 Job no longer matches partitionable slot after MODIFY_REQUEST_EXPR_ edits, retrying w/o edits
>10/06/12 04:18:27 slot1: Partitionable slot can't be split to allocate a dynamic slot large enough for the claim
>*** End of file StartLog
>
>
>
>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>Questions about this message or Condor in general?
>Email address of the local Condor administrator:
>ian.cottam@xxxxxxxxxxxxxxxx
>The Official Condor Homepage is http://www.cs.wisc.edu/condor