Re: [DynInst_API:] ParseAPI and PE files


Date: Thu, 17 Apr 2014 11:58:16 -0500
From: Bill Williams <bill@xxxxxxxxxxx>
Subject: Re: [DynInst_API:] ParseAPI and PE files
On 04/17/2014 11:08 AM, E.Robbins wrote:
On 17/04/2014 16:40 PM Bill Williams wrote:
Oh. One other thing--if you're trying to analyze PE files on Linux,
that's not presently going to work. It might be possible, if you have a
Linux system with the necessary Windows headers present and you know of
a replacement for the debug SDK, to coerce a Linux build of Symtab to
speak PE.

Thanks. We are indeed trying to analyse PE files in Linux. I didn't realise that this wasn't supported. When you say the debug SDK, do you mean some kind of MS VS debugger?

No, there's a Debug Information Access (DIA) SDK that's available and does a fair bit of symbol parsing for PE files that we then bake into Symtab form. Its accessibility and redistributability have been somewhat variable IIRC but if you have a non-Express version of Visual Studio, you have full access to MS SDKs last I checked.

You could probably pull the text section out via objdump or
similar and stuff it into a fake ELF file.

We'll have to think about that, but it's certainly an option in the short term I guess. We are mostly looking at malware so symbols are mostly useless, but we probably will need to know about linkage, the entry point etc etc.

Yeah, if you're concerned with malware, you'll probably want a custom CodeSource whether you're working on Linux or Windows.

I think I also have a
memory-backed CodeSource implementation floating around somewhere that
you could use as a starting point--as long as you can find the text
section and either don't care about symbols or can find them without
Windows headers, mocking up a CodeSource that speaks PE on Linux is a
simple matter of engineering.

What do you mean by a memory-backed CodeSource? We would be interested in anything that can help, though obviously we may decide it's too big a task.

The two CodeSource implementations we distribute are Symtab-based and SymLite-based; both of these work with files on disk, as one does. For internal testing purposes, it's convenient to be able to splat a blob of code into memory and parse it, and the CodeSource implementation that does that is what I mean by "memory-backed".
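
As a rough illustration of the "memory-backed" idea, here is a minimal sketch of a region that serves bytes from a buffer instead of a file. It is not the implementation mentioned above and does not derive from any Dyninst class; MemoryRegion is a hypothetical name, and the method names only echo the kind of queries a parser would make (is this address mapped, give me the bytes at this address).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memory-backed code region. This is an
// illustration only; a real ParseAPI CodeSource would need to implement
// the library's own interfaces.
class MemoryRegion {
public:
    MemoryRegion(std::uint64_t base, std::vector<std::uint8_t> bytes)
        : base_(base), bytes_(std::move(bytes)) {}

    // Address at which the blob is assumed to be loaded.
    std::uint64_t offset() const { return base_; }

    // Number of bytes in the blob.
    std::size_t length() const { return bytes_.size(); }

    // True when addr falls inside [base, base + length).
    bool isValidAddress(std::uint64_t addr) const {
        return addr >= base_ && addr - base_ < bytes_.size();
    }

    // Pointer to the byte mapped at addr, or nullptr if addr is unmapped.
    const std::uint8_t* getPtrToInstruction(std::uint64_t addr) const {
        return isValidAddress(addr) ? bytes_.data() + (addr - base_) : nullptr;
    }

private:
    std::uint64_t base_;
    std::vector<std::uint8_t> bytes_;
};
```

The base address is supplied by the caller, so the same blob can be parsed as if loaded at whatever address the analysis expects.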

It occurs to me that what you actually may want is a Windows implementation of SymLite--something that knows how to mmap in sections, read section headers and the entry point, and optionally can give you a lightweight representation of each symbol. That would then plug into the existing SymLiteCodeSource and should work seamlessly.
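
To give a sense of the scale of "read section headers and the entry point", here is a minimal sketch that pulls both out of a PE image held in memory, using only the standard library so it builds on Linux. It is not SymLite code: PeInfo, PeSection, readLe and parsePe are hypothetical names, and it ignores symbols, imports, and the data directories entirely. Field offsets follow the published PE/COFF layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct PeSection {
    std::string name;
    std::uint32_t virtualAddress;  // RVA of the section when loaded
    std::uint32_t virtualSize;
    std::uint32_t rawOffset;       // file offset of the section's bytes
    std::uint32_t rawSize;
    bool executable;               // IMAGE_SCN_MEM_EXECUTE is set
};

struct PeInfo {
    std::uint32_t entryPointRva = 0;
    std::vector<PeSection> sections;
};

// Bounds-checked little-endian read of `width` (<= 4) bytes at `off`.
static bool readLe(const std::vector<std::uint8_t>& buf, std::uint64_t off,
                   std::size_t width, std::uint32_t& out) {
    if (off > buf.size() || buf.size() - off < width) return false;
    out = 0;
    for (std::size_t i = 0; i < width; ++i)
        out |= static_cast<std::uint32_t>(buf[off + i]) << (8 * i);
    return true;
}

// Returns false if the buffer does not look like a well-formed PE image.
bool parsePe(const std::vector<std::uint8_t>& img, PeInfo& info) {
    std::uint32_t magic, peOff, sig, nSections, optSize;
    if (!readLe(img, 0, 2, magic) || magic != 0x5A4D) return false;  // "MZ"
    if (!readLe(img, 0x3C, 4, peOff)) return false;                  // e_lfanew
    if (!readLe(img, peOff, 4, sig) || sig != 0x00004550) return false;  // "PE\0\0"

    const std::uint64_t coff = static_cast<std::uint64_t>(peOff) + 4;
    if (!readLe(img, coff + 2, 2, nSections)) return false;   // NumberOfSections
    if (!readLe(img, coff + 16, 2, optSize)) return false;    // SizeOfOptionalHeader

    const std::uint64_t opt = coff + 20;  // optional header follows COFF header
    if (optSize < 20) return false;
    if (!readLe(img, opt + 16, 4, info.entryPointRva)) return false;

    const std::uint64_t table = opt + optSize;  // section table
    info.sections.clear();
    for (std::uint32_t i = 0; i < nSections; ++i) {
        const std::uint64_t h = table + 40ull * i;  // each header is 40 bytes
        std::uint32_t vsize, va, rsize, roff, flags;
        if (!readLe(img, h + 36, 4, flags)) return false;  // also bounds-checks header
        readLe(img, h + 8, 4, vsize);
        readLe(img, h + 12, 4, va);
        readLe(img, h + 16, 4, rsize);
        readLe(img, h + 20, 4, roff);
        std::string name(reinterpret_cast<const char*>(img.data() + h), 8);
        name = name.substr(0, name.find('\0'));  // names are NUL-padded to 8 bytes
        info.sections.push_back(
            {name, va, vsize, roff, rsize, (flags & 0x20000000u) != 0});
    }
    return true;
}
```

For malware this is only a starting point: packed or deliberately malformed headers will need more defensive handling than the bounds checks shown here.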

It's engineering we haven't done because
parsing PE on Linux is not of much use to Dyninst without a *very*
full-featured cross-format Symtab backing it, such that we could rewrite
PE files on Linux.

Fair enough... we are somewhat at odds with the goals of Dyninst because we are doing static analysis and mostly use it for its control flow recovery which is very good, and to some extent for reading symbols too.

The obvious answer then is to use Windows. Can the Windows version of Dyninst work over ELF binaries?

The Windows version doesn't speak ELF, though I think adding that is more practical than getting Linux Dyninst to speak PE fully. I haven't checked recently whether libelf/libdwarf build cleanly on Windows, but if they do, then that's a pretty straightforward project: we'd build with libelf, libdwarf, and the dynElf/dynDwarf wrapper libraries, add the various ELF-reading source files to the build, and add a mechanism to Symtab::openFile that checks the file type and opens things on the right path. (Note that straightforward does not mean low-effort; we'd almost certainly have to redesign some of the class structure so that the file type could be a runtime decision rather than a build-time one. But it's doable in principle.)
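
The runtime file-type check described above could be as simple as sniffing the first few bytes of the file. The following is a hypothetical sketch, not code from Symtab; BinaryFormat and detectFormat are made-up names, and how the result would be wired into Symtab::openFile is left open.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class BinaryFormat { Elf, Pe, Unknown };

// Classify a file from its leading bytes. Note that "MZ" only identifies a
// DOS-style executable header; confirming PE proper also requires following
// e_lfanew (offset 0x3C) to the "PE\0\0" signature.
BinaryFormat detectFormat(const std::vector<std::uint8_t>& head) {
    if (head.size() >= 4 && head[0] == 0x7F && head[1] == 'E' &&
        head[2] == 'L' && head[3] == 'F') {
        return BinaryFormat::Elf;
    }
    if (head.size() >= 2 && head[0] == 'M' && head[1] == 'Z') {
        return BinaryFormat::Pe;
    }
    return BinaryFormat::Unknown;
}
```

A dispatcher would then pick the ELF or PE reader based on the returned value, which is the step that needs the file type to be a runtime decision.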

I think your best bet would be a cross-platform PE implementation of SymLite, though.

Thanks a lot,
Ed



--
--bw

Bill Williams
Paradyn Project
bill@xxxxxxxxxxx