Re: [DynInst_API:] [PATCH RFC] Issue a diagnostic error if x86_64 lacks LAHF/SAHF


Date: Fri, 19 Sep 2014 17:10:53 -0700
From: Josh Stone <jistone@xxxxxxxxxx>
Subject: Re: [DynInst_API:] [PATCH RFC] Issue a diagnostic error if x86_64 lacks LAHF/SAHF
On 09/19/2014 01:59 PM, Bill Williams wrote:
> 
>>> Questions:
>>>
>>> 1) Is there a specific motivating system for this, or just general code
>>> cleanup? My understanding of the evolution of the instruction set says
>>> that LAHF should be missing from a vanishingly rare set of processors.
>>
>> We have a large variety of machines in our test labs, including a few
>> Xeons that are old enough steppings to lack LAHF/SAHF.  Our QA noticed
>> dyninst crashes if a test run happens to be scheduled on one of these.
>> The only sign of failure is death by SIGILL, so I thought a diagnostic
>> message would be more useful.
>>
> Concur as far as that goes, though that's *really* old IIRC (7+ years?).

Wikipedia says they were added to AMD revision D in March 2005, Pentium
4 G1 stepping in December 2005.  So yeah, prior CPUs are almost relics.

But then, the conditional FXSAVE would have to go back to Pentium II,
partying like its 1999!  (2001 for AMD)

>>> 2) Despite what I thought I remembered, pushf/popf are still a 64-bit
>>> thing, they're just pretty pessimal instructions to use. Is there a
>>> reason not to fall back on generating them if we're already doing a
>>> CPUID check?
>>
>> Yes, pushf/popf are always available.  Since this is generated in
>> EmitterAMD64::emitStackAlign, you'd have to be careful about setting up
>> that adjusted popf, but I think it's possible.
>>
> We should, I think, move towards using stack analysis in place of the 
> existing rounding sequences; they're both more code-efficient and more 
> space-efficient. That in turn moves flag saves inside of alignment in 
> (almost) all cases, and we've still got lurking functions from our old 
> pushfq/popfq days.

Sounds like a good idea regardless of old hardware.

>> Plus there's the heterogeneous point you make next...
>>
>>> 3) As with most CPUID based stuff, we do want to be somewhat careful
>>> about the heterogeneous systems case with binary rewriting. Any checks
>>> that we can delay until load time are going to give more accurate/useful
>>> results than checks at instrumentation time. This suggests to me that we
>>> should be looking at moving CPUID-based functionality into the RTlib;
>>> thoughts?
>>
>> It's a valid point, but I don't know a solution.  In this particular
>> case, the LAHF/SAHF is put at the start of __libc_start_main, and I
>> can't tell from the core file if the RTlib has even been loaded yet.
>>
>> Given that these checks alter dyninst's codegen, how would you delay
>> that until runtime?  I think you'd either have to branch on some
>> RTlib-provided flag, or push that functionality into an RTlib function,
>> which could be ifunc on Linux.  (doing stack adjustment or
>> fxsave/restore from a function sounds painful...)
>>
> I was thinking along the lines of using an RTlib ctor or an a.out-level 
> ctor to cpuid and hotpatch, but that may be cuter than we really 
> want/need. Also there's the downside that we're assuming that the 
> trampguards are a sufficiently early place to shut off instrumentation 
> from executing before the RTlib is loaded; that's not actually true when 
> the part of instrumentation we care about is in the basetramp outside of 
> the trampguarded portion (which this is).

Another thing about hotpatching is that I'd like to eventually get RT
away from having any memory that's writable *and* executable.  I've
wanted this for a while, just lacked enough round tuits so far.  There
are ways you can still allow it, like having separate write/exec
mappings for the same memory, but it'd be nice to avoid even that.

>> But both checks, xmm and lahf, are rarely going to be missing in
>> practice anymore, so I'm not sure it's worth any performance penalty on
>> instrumentation.
>>
> Tend to concur with that. Diagnostic message at instrumentation time and 
> matching one in the RTlib ctors?

I'm fine with that, but are you sure it will run before *any* LAHF might
be reached?  At least in the dynamic case, I don't even reach any
DYNINSTinit message from DYNINST_DEBUG_RTLIB=1.  Maybe static is written
differently, but your trampguard note doesn't encourage me.

> Also, I think an assembly cpuid check is probably the most 
> toolchain-agnostic way to do this but it is a pain to maintain and extend...

MSDN documents the __cpuid() intrinsic at least back to VS2005.

For others - can you check for cpuid.h on a few that you care about?  I
see it with both GCC and Clang on Fedora, and I'd rather avoid NIH.  If
we must, inline asm would be a little easier to make it generic.

[← Prev in Thread] Current Thread [Next in Thread→]