
Re: [Condor-users] Schedd crash due to SOAP transaction time-out



Are you still seeing this regularly?

Best,


matt

On 02/05/2010 11:05 AM, Johnson koil Raj wrote:
> Hi,
> In my Condor pool the schedd crashed twice in two days while
> servicing SOAP requests. The crashes seem to be related to a
> transaction time-out.
> 
> I am using a transaction duration of 60 sec and submitting 2 jobs, say
> 160.0 & 160.1, within that transaction.
> 
> by
> Johnson
> 
> 
> The logs are below.
> 
> 2/5 17:09:50 SOAP entered newCluster(), transaction: 1812170248
> 2/5 17:09:50 SOAP leaving newCluster() result=0
> 2/5 17:09:50 SOAP entered newJob(), transaction: 1812170248
> 2/5 17:09:50 mkdir(/workingcopy/spool/cluster160.proc0.subproc0) succeeded.
> 2/5 17:09:50 SOAP leaving newJob() result=0
> 2/5 17:10:27 SOAP entered newJob(), transaction: 1812179721
> 2/5 17:10:27 mkdir(/workingcopy/spool/cluster161.proc0.subproc0) succeeded.
> 2/5 17:10:27 SOAP leaving newJob() result=0
> 2/5 17:10:27 SOAP entered newJob(), transaction: 1812179721
> 2/5 17:10:27 mkdir(/workingcopy/spool/cluster161.proc1.subproc0) succeeded.
> 2/5 17:10:27 SOAP leaving newJob() result=0
> 2/5 17:10:27 SOAP entered createJobTemplate(), transaction: 0
> 2/5 17:10:27 SOAP leaving createJobTemplate() result=0
> 2/5 17:10:27 SOAP entered submit(), transaction: 1812179721
> 2/5 17:10:27 SOAP entered submit(), transaction: 1812179721
> 2/5 17:10:27 SOAP leaving submit() result=0
> 2/5 17:10:27 SOAP entered commitTransaction(), transaction: 1812179721
> 2/5 17:10:27 SOAP in release_data()
> 2/5 17:10:27 Timer 7162 not found
> 2/5 17:10:27 SOAP leaving commitTransaction() result=0
> 
> 2/5 17:10:50 SOAP in transtimeout()
> 2/5 17:10:50 SOAP entered abortTransaction(), transaction: 1812170248
> 2/5 17:10:50 SOAP in release_data()
> 2/5 17:10:50 Timer 7154 not found
> 2/5 17:10:50 SOAP leaving abortTransaction() result=0
> Stack dump for process 32028 at timestamp 1265370053 (18 frames)
> condor_schedd(dprintf_dump_stack+0xd0)[0x81ce6b9]
> condor_schedd(_Z18linux_sig_coredumpi+0x22)[0x81c28f6]
> [0x9c0420]
> /lib/libc.so.6[0x779d6d]
> /lib/libc.so.6(__libc_malloc+0x7e)[0x77b39e]
> /usr/lib/libstdc++.so.6(_Znwj+0x27)[0xbf8c67]
> /usr/lib/libstdc++.so.6(_Znaj+0x1d)[0xbf8d9d]
> condor_schedd(_ZN3BufC1Ei+0x12)[0x82964b0]
> condor_schedd(_ZN8ReliSock6SndMsgC1Ev+0x2d)[0x82919a9]
> condor_schedd(_ZN8ReliSockC1Ev+0x45)[0x8293607]
> condor_schedd(_ZN8ReliSock6acceptEv+0x25)[0x82938b9]
> condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x1d3)[0x81b715b]
> condor_schedd(_ZN10DaemonCore9HandleReqEi+0x2e)[0x81bae84]
> condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c9)[0x81bb47b]
> condor_schedd(_ZN10DaemonCore6DriverEv+0x1834)[0x81bcdd8]
> condor_schedd(main+0x17e4)[0x81c4a92]
> /lib/libc.so.6(__libc_start_main+0xdc)[0x726dec]
> condor_schedd(realloc+0x81)[0x8122121]
> 
> ---------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 2/4 14:17:13 SOAP entered newCluster(), transaction: 1787400194
> 2/4 14:17:13 SOAP leaving newCluster() result=0
> 2/4 14:17:13 SOAP entered newJob(), transaction: 1787400194
> 2/4 14:17:13 mkdir(/workingcopy/spool/cluster150.proc0.subproc0) succeeded.
> 2/4 14:17:13 SOAP leaving newJob() result=0
> 
> 2/4 14:18:25 SOAP in transtimeout()
> 2/4 14:18:25 SOAP entered abortTransaction(), transaction: 1787400194
> 2/4 14:18:25 SOAP in release_data()
> 2/4 14:18:25 Timer 539 not found
> 2/4 14:18:25 SOAP leaving abortTransaction() result=0
> Stack dump for process 31332 at timestamp 1265273306 (18 frames)
> condor_schedd(dprintf_dump_stack+0xd0)[0x81ce6b9]
> condor_schedd(_Z18linux_sig_coredumpi+0x22)[0x81c28f6]
> [0x8c9420]
> /lib/libc.so.6[0x779d6d]
> /lib/libc.so.6(__libc_malloc+0x7e)[0x77b39e]
> /usr/lib/libstdc++.so.6(_Znwj+0x27)[0xbf8c67]
> /usr/lib/libstdc++.so.6(_Znaj+0x1d)[0xbf8d9d]
> condor_schedd(_ZN3BufC1Ei+0x12)[0x82964b0]
> condor_schedd(_ZN8ReliSock6SndMsgC1Ev+0x2d)[0x82919a9]
> condor_schedd(_ZN8ReliSockC1Ev+0x45)[0x8293607]
> condor_schedd(_ZN8ReliSock6acceptEv+0x25)[0x82938b9]
> condor_schedd(_ZN10DaemonCore9HandleReqEP6Stream+0x1d3)[0x81b715b]
> condor_schedd(_ZN10DaemonCore9HandleReqEi+0x2e)[0x81bae84]
> condor_schedd(_ZN10DaemonCore17CallSocketHandlerERib+0x2c9)[0x81bb47b]
> condor_schedd(_ZN10DaemonCore6DriverEv+0x1834)[0x81bcdd8]
> condor_schedd(main+0x17e4)[0x81c4a92]
> /lib/libc.so.6(__libc_start_main+0xdc)[0x726dec]
> condor_schedd(realloc+0x81)[0x8122121]
> 
> ------------------------------------------------------------------------------------------------------------------------------------------------
> 
> 
> 
> 
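For anyone trying to line up the time-outs with the crashes, here is a rough Python sketch that pairs each SOAP transaction id in the schedd log with the call that ended it and the seconds elapsed since that id first appeared. The sample lines are copied from the logs quoted above; the function name, the regular expression, and the year default are my own assumptions and not part of Condor. Elapsed time is counted from the first logged call for an id, since the excerpts do not show where each transaction was actually started.

```python
import re
from datetime import datetime

# Sample lines copied from the quoted schedd logs (SOAP "entered" lines only).
LOG = """\
2/4 14:17:13 SOAP entered newCluster(), transaction: 1787400194
2/4 14:17:13 SOAP entered newJob(), transaction: 1787400194
2/4 14:18:25 SOAP entered abortTransaction(), transaction: 1787400194
2/5 17:09:50 SOAP entered newCluster(), transaction: 1812170248
2/5 17:09:50 SOAP entered newJob(), transaction: 1812170248
2/5 17:10:27 SOAP entered newJob(), transaction: 1812179721
2/5 17:10:27 SOAP entered createJobTemplate(), transaction: 0
2/5 17:10:27 SOAP entered commitTransaction(), transaction: 1812179721
2/5 17:10:50 SOAP entered abortTransaction(), transaction: 1812170248
"""

# Optional leading "> " lets the same pattern read lines pasted from a quoted email.
LINE = re.compile(
    r"^(?:>\s*)?(\d+/\d+ \d+:\d+:\d+) SOAP entered (\w+)\(\), transaction: (\d+)"
)


def transaction_summary(text, year=2010):
    """Map transaction id -> (ending call, seconds since the id was first logged).

    The log timestamps carry no year, so one is supplied (assumed 2010,
    matching the date of the original post).
    """
    first_seen = {}
    ended = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        stamp, call, txn = match.groups()
        if txn == "0":
            # createJobTemplate() is logged with transaction 0; skip it.
            continue
        when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
        first_seen.setdefault(txn, when)
        if call in ("commitTransaction", "abortTransaction"):
            ended[txn] = (call, (when - first_seen[txn]).total_seconds())
    return ended


summary = transaction_summary(LOG)
for txn, (call, seconds) in summary.items():
    print(f"transaction {txn}: {call} after {seconds:.0f}s")
```

On the sample lines this reports an abort 60 seconds after the first call for transaction 1812170248 and 72 seconds after the first call for 1787400194, while 1812179721 commits in the same second it first appears. In both quoted excerpts the stack dump follows an abortTransaction() that was triggered from transtimeout().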