Re: [Condor-users] Condor SOAP bug: stopping server when there are pending transactions hangs daemons

On May 1, 2006, at 5:42 PM, David E. Konerding wrote:


I am noticing a very inconvenient bug with Condor SOAP:

If a transaction is begun, and has not yet expired, stopping the condor
master causes all the daemons to go to a zombie
state and hang around.

This is probably the same problem as the condor_q issue below. All Condor daemons are single-threaded, so while a SOAP transaction is active no one else can talk to the Schedd. I'm guessing that the Master just gives up trying to tell its children to shut down at some point and exits. If the children are shut down serially, then a "hanging" Schedd at the beginning of the child list would account for this.
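
To make that guess concrete, here is a toy model of a serial shutdown with a give-up point. It is only a sketch of the reasoning above, not Condor code: the function name, the time budget, and the per-child response times are all made up for illustration.

```python
# Toy model only. serial_shutdown, the budget, and the response times are
# hypothetical; none of this is taken from the Condor sources.

def serial_shutdown(children, budget):
    """Ask each child to stop in list order; give up once the budget is spent.

    children: list of (name, seconds_needed_to_respond) pairs.
    budget:   total seconds the parent is willing to wait.
    Returns (stopped, left_running) as two lists of names.
    """
    stopped, left_running = [], []
    remaining = budget  # becomes None once the parent has given up
    for name, needed in children:
        if remaining is not None and needed <= remaining:
            remaining -= needed
            stopped.append(name)
        else:
            # The parent waited on this child until it gave up, so this
            # child and every child after it never gets shut down.
            remaining = None
            left_running.append(name)
    return stopped, left_running


# A Schedd blocked by an open transaction sits at the head of the list.
hung = [("schedd", 600), ("startd", 1), ("collector", 1)]
print(serial_shutdown(hung, budget=60))

# The same list when the Schedd answers promptly.
healthy = [("schedd", 1), ("startd", 1), ("collector", 1)]
print(serial_shutdown(healthy, budget=60))
```

In this model a single unresponsive child at the front leaves every daemon running after the parent exits, which matches the symptom reported. If the real Master signals its children in parallel instead, this explanation would not hold.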

Also (this is said to be fixed, but doesn't seem to be): if there are
transactions begun, which have not yet expired, running
condor_q will hang (until the transaction expires, I believe)

I hope we've never said that is fixed, because the daemons are still single-threaded. Todd's fix for transactions in 6.7.19 might fix this as well, though. FYI, this is also an artifact of using the "command port" for both SOAP and the Condor protocols (e.g. what condor_q talks...).