Re: [DynInst_API:] [PATCH RFC] Issue a diagnostic error if x86_64 lacks LAHF/SAHF


Date: Fri, 19 Sep 2014 15:59:43 -0500
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] [PATCH RFC] Issue a diagnostic error if x86_64 lacks LAHF/SAHF

Questions:

1) Is there a specific motivating system for this, or just general code
cleanup? My understanding of the evolution of the instruction set says
that LAHF should be missing from a vanishingly rare set of processors.

We have a large variety of machines in our test labs, including a few
Xeons that are old enough steppings to lack LAHF/SAHF.  Our QA noticed
dyninst crashes if a test run happens to be scheduled on one of these.
The only sign of failure is death by SIGILL, so I thought a diagnostic
message would be more useful.

Concur as far as that goes, though that's *really* old IIRC (7+ years?).

2) Despite what I thought I remembered, pushf/popf are still a 64-bit
thing, they're just pretty pessimal instructions to use. Is there a
reason not to fall back on generating them if we're already doing a
CPUID check?

Yes, pushf/popf are always available.  Since this is generated in
EmitterAMD64::emitStackAlign, you'd have to be careful about setting up
that adjusted popf, but I think it's possible.

We should, I think, move towards using stack analysis in place of the existing rounding sequences; they're both more code-efficient and more space-efficient. That in turn moves flag saves inside of alignment in (almost) all cases, and we've still got lurking functions from our old pushfq/popfq days.

Plus there's the heterogeneous point you make next...

3) As with most CPUID based stuff, we do want to be somewhat careful
about the heterogeneous systems case with binary rewriting. Any checks
that we can delay until load time are going to give more accurate/useful
results than checks at instrumentation time. This suggests to me that we
should be looking at moving CPUID-based functionality into the RTlib;
thoughts?

It's a valid point, but I don't know a solution.  In this particular
case, the LAHF/SAHF is put at the start of __libc_start_main, and I
can't tell from the core file if the RTlib has even been loaded yet.

Given that these checks alter dyninst's codegen, how would you delay
that until runtime?  I think you'd either have to branch on some
RTlib-provided flag, or push that functionality into an RTlib function,
which could be ifunc on Linux.  (doing stack adjustment or
fxsave/restore from a function sounds painful...)

I was thinking along the lines of using an RTlib ctor or an a.out-level ctor to cpuid and hotpatch, but that may be cuter than we really want/need. Also there's the downside that we're assuming that the trampguards are a sufficiently early place to shut off instrumentation from executing before the RTlib is loaded; that's not actually true when the part of instrumentation we care about is in the basetramp outside of the trampguarded portion (which this is).

But both checks, xmm and lahf, are rarely going to be missing in
practice anymore, so I'm not sure it's worth any performance penalty on
instrumentation.

Tend to concur with that. Diagnostic message at instrumentation time and matching one in the RTlib ctors?

Also, I think an assembly cpuid check is probably the most toolchain-agnostic way to do this but it is a pain to maintain and extend...

--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx
[← Prev in Thread] Current Thread [Next in Thread→]