
Re: [Condor-users] Can't find transition out of state "START" for event "CHILD_EXIT"



On 01/11/2011 03:15 AM, Carsten Aulbert wrote:
Hi all,

anyone with an idea what this message wants to tell me?

10.12.0.20 is the local IP of the execute node where this was found in
StarterLog.slot3

01/11 08:33:47 ********** STARTER starting up ***********
01/11 08:33:47 ** $CondorVersion: 7.4.4 Oct 13 2010 BuildID: 279383 $
01/11 08:33:47 ** $CondorPlatform: X86_64-LINUX_DEBIAN50 $
01/11 08:33:47 ******************************************
01/11 08:33:47 Submitting machine is "atlas2.atlas.local"
01/11 08:33:47 EventHandler {
01/11 08:33:47  func = 0x4fb7ca
01/11 08:33:47  mask = SIGALRM SIGHUP SIGINT SIGUSR1 SIGUSR2 SIGCHLD SIGTSTP
01/11 08:33:47 }
01/11 08:33:47 condor_write(): Socket closed when trying to write 25 bytes to
<10.12.0.20:0>, fd is 17
01/11 08:33:47 Buf::write(): condor_write() failed
Stack dump for process 4844 at timestamp 1294731227 (13 frames)
condor_starter(dprintf_dump_stack+0xb7)[0x4f6eb5]
condor_starter[0x4f7122]
/lib/libpthread.so.0[0x7f860d925a80]
/lib/libc.so.6(gsignal+0x35)[0x7f860ce4fed5]
/lib/libc.so.6(abort+0x183)[0x7f860ce513f3]
/lib/libc.so.6(__assert_fail+0xe9)[0x7f860ce48dc9]
condor_starter(REMOTE_CONDOR_register_fs_domain+0x195)[0x4d76d4]
condor_starter(_Z21init_environment_infov+0x26)[0x4c0ee7]
condor_starter(_Z4initv+0xe)[0x4c2176]
condor_starter(_ZN12StateMachine7executeEv+0x20e)[0x4fbad6]
condor_starter(main+0x11d)[0x4c231d]
/lib/libc.so.6(__libc_start_main+0xe6)[0x7f860ce3c1a6]
condor_starter(_ZNSt8ios_base4InitD1Ev+0x59)[0x4c0c09]
01/11 08:33:47 ERROR "Can't find transition out of state "START" for event
"CHILD_EXIT"" at line 326 in file state_machine_driver.cpp

Cheers

Carsten

I can tell you that it is unique to the standard universe starter. The starter.std uses a general state machine structure to define its operation. That error is raised when an event (CHILD_EXIT) arrives that is not defined as a possible transition out of the current state (START).

I am guessing here, but I'd say the message itself doesn't matter much, because the starter probably lost its connection to the shadow during startup (the condor_write() "Socket closed" lines in your log point that way). The job wasn't going to make much progress anyway. You should check the submit side to see if I'm right.

It's probably also a bug: a transition for a failed shadow should be added to the state machine, but there isn't much the starter could do to compensate except maybe exit quietly.
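
To make the state machine point concrete, here is a minimal sketch of a table-driven state machine in Python. It is not the Condor code: the class, the INIT_DONE event, the RUNNING and END states, and the table contents are all made up for illustration. Only START and CHILD_EXIT come from the log. It shows how an event with no entry for the current state produces this kind of error, and how adding one transition would let the machine finish quietly instead.

```python
# Hypothetical table-driven state machine; this is NOT Condor's implementation.
class NoTransitionError(Exception):
    """Raised when the current state has no transition for an event."""


class StateMachine:
    def __init__(self, start, transitions):
        self.state = start
        # Maps (state, event) -> next state.
        self.transitions = dict(transitions)

    def handle(self, event):
        key = (self.state, event)
        if key not in self.transitions:
            raise NoTransitionError(
                f'Can\'t find transition out of state "{self.state}" '
                f'for event "{event}"'
            )
        self.state = self.transitions[key]
        return self.state


# Assumed table: nothing is defined for a child exiting while still in START.
incomplete = {
    ("START", "INIT_DONE"): "RUNNING",   # INIT_DONE is an invented event name
    ("RUNNING", "CHILD_EXIT"): "END",
}

# Same table plus the missing transition, so an early exit just ends the run.
patched = dict(incomplete)
patched[("START", "CHILD_EXIT")] = "END"


def try_event(table, event):
    """Feed one event to a fresh machine and report the outcome as a string."""
    machine = StateMachine("START", table)
    try:
        return machine.handle(event)
    except NoTransitionError as err:
        return f"ERROR {err}"


print(try_event(incomplete, "CHILD_EXIT"))
print(try_event(patched, "CHILD_EXIT"))
```

With the incomplete table the lookup fails and the error text mirrors the one in the log; with the patched table the same event simply moves the machine to its end state.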

Best,


matt