On Fri, 7 Feb 2014, Josh Stone wrote:
On 02/07/2014 04:04 PM, Josh Stone wrote:
In commit 185f2def1921, RPCs first desynced UserRPCState to running, but
then merge commit 6edcf768c20d restricted it to ephemeral threads. I
believe that was a mistake, and really all threads running iRPC need
this state to make sure they don't stall in the middle.
I feel I should elaborate a little on this commit message. This stall
is exactly the "interesting problem" mentioned in a comment of
PCProcess::postIRPC_internal. Callbacks during an iRPC might set the
normal user state to stopped, but it still needs to complete the iRPC.
I've had trouble with fork-exec mutatees, where everything would hang
when I tried to loadLibrary from the exec callback. It turned out to be
exactly this "interesting problem", except the fallback loop there
wasn't getting a chance to help. However, if I did *anything* to nudge
the parent process of the fork, even silly like kill -SIGWINCH, then the
runIRPCSync would finally return its error. Then that fallback loop
would finally call continueStoppedIRPC and get things moving.
With this patch, the UserRPCState=running takes precedence while the
iRPC is running, and the normal user state can't ever stall it. This
solves my fork-exec-loadLibrary issue, and I believe it also solves the
whole "interesting problem". It probably doesn't hurt to leave that
workaround in place for a while though, just to be sure.
I'm still investigating other exec issues; for instance, I think it's a
problem that initial library callbacks are delivered before exec
callbacks, because it's important to know this is a new process. Still,
solving this hang was the most significant part to me.
Thanks for the summary, I was a little confused from reading the proposed
patch.
There was a fair amount of debate about ProcControlAPI's iRPC stopping
semantics a while back. Our solution was to allow a thread to be stopped
by the user during an iRPC. That's important for allowing the handling of
any asynchronous events (like signals) that could interrupt an iRPC, or
handling events triggered by the iRPC (like a library load). We expect
that if the user stops a thread during an iRPC, they will eventually issue
a matching continue and then we'll finish the iRPC.
It looks like this patch forces a thread with a iRPC to always be running,
even if a user asks for it to be stopped. That's going to cause other
problems. I think the proper fix is to figure out why the DyninstAPI code
that sits over ProcControlAPI isn't issuing a continue after it stops the
iRPC.
Do you have a small reproducer for the original problem? If not, could
you share some DYNINST_DEBUG_PROCCONTROL output showing the issue?
-Matt
|