
[Condor-users] Schedd crash due to SOAP transaction time-out



Hi,
In my Condor pool the schedd crashed twice in two days while servicing SOAP requests. The crash seems to be triggered by a transaction time-out.

I am using a transaction duration of 60 seconds and submitting two jobs (say 160.0 and 160.1) within that transaction.

by
Johnson


The logs are below.

2/5 17:09:50 SOAP entered newCluster(), transaction: 1812170248
2/5 17:09:50 SOAP leaving newCluster() result=0
2/5 17:09:50 SOAP entered newJob(), transaction: 1812170248
2/5 17:09:50 mkdir(/workingcopy/spool/cluster160.proc0.subproc0) succeeded.
2/5 17:09:50 SOAP leaving newJob() result=0
2/5 17:10:27 SOAP entered newJob(), transaction: 1812179721
2/5 17:10:27 mkdir(/workingcopy/spool/cluster161.proc0.subproc0) succeeded.
2/5 17:10:27 SOAP leaving newJob() result=0
2/5 17:10:27 SOAP entered newJob(), transaction: 1812179721
2/5 17:10:27 mkdir(/workingcopy/spool/cluster161.proc1.subproc0) succeeded.
2/5 17:10:27 SOAP leaving newJob() result=0
2/5 17:10:27 SOAP entered createJobTemplate(), transaction: 0
2/5 17:10:27 SOAP leaving createJobTemplate() result=0
2/5 17:10:27 SOAP entered submit(), transaction: 1812179721
2/5 17:10:27 SOAP entered submit(), transaction: 1812179721
2/5 17:10:27 SOAP leaving submit() result=0
2/5 17:10:27 SOAP entered commitTransaction(), transaction: 1812179721
2/5 17:10:27 SOAP in release_data()
2/5 17:10:27 Timer 7162 not found
2/5 17:10:27 SOAP leaving commitTransaction() result=0

2/5 17:10:50 SOAP in transtimeout()
2/5 17:10:50 SOAP entered abortTransaction(), transaction: 1812170248
2/5 17:10:50 SOAP in release_data()
2/5 17:10:50 Timer 7154 not found
2/5 17:10:50 SOAP leaving abortTransaction() result=0
Stack dump for process 32028 at timestamp 1265370053 (18 frames)
condor_schedd(dprintf_dump_stack+0xd0)[0x81ce6b9]
condor_schedd(_Z18linux_sig_coredumpi+0x22)[0x81c28f6]
[0x9c0420]
/lib/libc.so.6[0x779d6d]
/lib/libc.so.6(__libc_malloc+0x7e)[0x77b39e]
/usr/lib/libstdc++.so.6(_Znwj+0x27)[0xbf8c67]
/usr/lib/libstdc++.so.6(_Znaj+0x1d)[0xbf8d9d]
condor_schedd(_ZN3BufC1Ei+0x12)[0x82964b0]
condor_schedd(_ZN8ReliSock6SndMsgC1Ev+0x2d)[0x82919a9]
condor_schedd(_ZN8ReliSockC1Ev+0x45)[0x8293607]
condor_schedd(_ZN8ReliSock6acceptEv+0x25)[0x82938b9]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x1d3)[0x81b715b]
condor_schedd(_ZN10DaemonCore9HandleReqEi+0x2e)[0x81bae84]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c9)[0x81bb47b]
condor_schedd(_ZN10DaemonCore6DriverEv+0x1834)[0x81bcdd8]
condor_schedd(main+0x17e4)[0x81c4a92]
/lib/libc.so.6(__libc_start_main+0xdc)[0x726dec]
condor_schedd(realloc+0x81)[0x8122121]

---------------------------------------------------------------------------------------------------------------------------------------

2/4 14:17:13 SOAP entered newCluster(), transaction: 1787400194
2/4 14:17:13 SOAP leaving newCluster() result=0
2/4 14:17:13 SOAP entered newJob(), transaction: 1787400194
2/4 14:17:13 mkdir(/workingcopy/spool/cluster150.proc0.subproc0) succeeded.
2/4 14:17:13 SOAP leaving newJob() result=0

2/4 14:18:25 SOAP in transtimeout()
2/4 14:18:25 SOAP entered abortTransaction(), transaction: 1787400194
2/4 14:18:25 SOAP in release_data()
2/4 14:18:25 Timer 539 not found
2/4 14:18:25 SOAP leaving abortTransaction() result=0
Stack dump for process 31332 at timestamp 1265273306 (18 frames)
condor_schedd(dprintf_dump_stack+0xd0)[0x81ce6b9]
condor_schedd(_Z18linux_sig_coredumpi+0x22)[0x81c28f6]
[0x8c9420]
/lib/libc.so.6[0x779d6d]
/lib/libc.so.6(__libc_malloc+0x7e)[0x77b39e]
/usr/lib/libstdc++.so.6(_Znwj+0x27)[0xbf8c67]
/usr/lib/libstdc++.so.6(_Znaj+0x1d)[0xbf8d9d]
condor_schedd(_ZN3BufC1Ei+0x12)[0x82964b0]
condor_schedd(_ZN8ReliSock6SndMsgC1Ev+0x2d)[0x82919a9]
condor_schedd(_ZN8ReliSockC1Ev+0x45)[0x8293607]
condor_schedd(_ZN8ReliSock6acceptEv+0x25)[0x82938b9]
condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x1d3)[0x81b715b]
condor_schedd(_ZN10DaemonCore9HandleReqEi+0x2e)[0x81bae84]
condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c9)[0x81bb47b]
condor_schedd(_ZN10DaemonCore6DriverEv+0x1834)[0x81bcdd8]
condor_schedd(main+0x17e4)[0x81c4a92]
/lib/libc.so.6(__libc_start_main+0xdc)[0x726dec]
condor_schedd(realloc+0x81)[0x8122121]

------------------------------------------------------------------------------------------------------------------------------------------------
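
To make the timing easier to see, here is a small Python helper. It is hypothetical, written for this report and not part of Condor. It groups the "SOAP entered ...(), transaction: N" lines by transaction id and reports when each transaction was first seen, how it ended, and the elapsed time in seconds. It assumes the log-line format shown above and that no transaction crosses midnight; transaction 0 (logged by createJobTemplate()) is skipped.

```python
import re

# Matches SOAP entry lines from the schedd log, e.g.
#   2/5 17:10:27 SOAP entered commitTransaction(), transaction: 1812179721
ENTRY_RE = re.compile(
    r"^(\d+/\d+) (\d{2}):(\d{2}):(\d{2}) SOAP entered (\w+)\(\), transaction: (\d+)"
)


def summarize_transactions(log_text):
    """Group SOAP calls in a schedd log excerpt by transaction id.

    Returns {tid: {"date", "opened", "closed", "ended_by", "elapsed"}}, where
    "opened"/"closed" are seconds since midnight. Assumption: a transaction
    never spans midnight (the log lines carry no year and no rollover info).
    """
    txns = {}
    for line in log_text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # not a "SOAP entered" line (timers, mkdir, stack dump, ...)
        date, hh, mm, ss, op, tid = m.groups()
        if tid == "0":
            continue  # calls made outside any transaction, e.g. createJobTemplate()
        secs = int(hh) * 3600 + int(mm) * 60 + int(ss)
        info = txns.setdefault(
            tid, {"date": date, "opened": secs, "closed": None, "ended_by": None}
        )
        if op in ("commitTransaction", "abortTransaction"):
            info["closed"] = secs
            info["ended_by"] = op
    for info in txns.values():
        # Elapsed time from the first call seen to commit/abort, if any.
        info["elapsed"] = (
            None if info["closed"] is None else info["closed"] - info["opened"]
        )
    return txns
```

Run on the 2/5 excerpt, it reports transaction 1812170248 (opened at 17:09:50 and never committed) ending in abortTransaction() exactly 60 seconds later, matching the configured duration, while 1812179721 is committed at 17:10:27. On the 2/4 excerpt, 1787400194 is aborted 72 seconds after it was opened.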
