pyelftools – Python library for parsing ELF and DWARF

January 6th, 2012 at 8:59 am

I’m happy and proud to announce the release of a new open-source Python package to the world. pyelftools is a pure-Python library for parsing and analyzing ELF files and DWARF debugging information. It provides both low-level and high-level APIs for querying ELF and DWARF, and is mostly feature-complete. As a proof of capability, pyelftools ships with a fairly powerful clone of readelf.

Some basic information:

  • Website – managed as an open-source project on GitHub. It’s the place for documentation, opening issues and closely following the development in general.
  • Downloading – from PyPI, or the GitHub site.
  • Documentation – there’s a detailed user guide, and the source distribution contains several examples.
  • License – public domain.
  • Pre-requisites – Python version 2.6 or 2.7; 3.x support is in the works.

The goal of this project was two-fold. First, to better understand the ELF and DWARF formats. Second, to have a feature-complete pure-Python parser for these formats with a sufficiently high-level API to be generally useful.

Although the initial release of pyelftools (version 0.10) is formally "beta", it’s quite well validated with a comprehensive test suite. It should also be simple to learn and tweak, thanks to the detailed user’s and hacker’s guides on the GitHub site wiki, along with several working examples that ship with the library.

There are some existing tools with overlapping functionality:

  • pydevtools – an ambitious project by Emilio Monti. I initially planned to build my project on top of it (I wanted a much higher-level API than pydevtools aims to provide), but was deterred by its lack of support for the 64-bit ELF format. Adding 64-bit support looked like a lot of work, so I preferred to go my own way. pyelftools is designed from scratch to support both 32-bit and 64-bit formats (as well as both endiannesses).
  • libelf and libdwarf – very authoritative and complete implementations, but they are essentially C libraries with a C API, which leaves much to be desired in terms of usability and convenience.
  • The LLVM project has an ELF reader (part of its libObject library) and recently started adding partial DWARF parsing capabilities. These parts of the project are still rather experimental and evolving, and I’m following its progress with interest.
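To make the 32-bit/64-bit and endianness point concrete, here is a minimal sketch that uses only the Python 3 standard library, not pyelftools. It reads the identification bytes at the start of an ELF file, which tell a parser the word size and byte order to use for everything that follows. The function name parse_elf_ident is made up for this illustration; the field offsets follow the ELF specification.

```python
import struct

ELF_MAGIC = b"\x7fELF"
CLASS_NAMES = {1: "ELF32", 2: "ELF64"}   # e_ident[EI_CLASS], byte 4
DATA_NAMES = {1: "little", 2: "big"}     # e_ident[EI_DATA], byte 5


def parse_elf_ident(data):
    """Return (class_name, byte_order, e_machine) from the start of an ELF file."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    elf_class = CLASS_NAMES.get(data[4])
    byte_order = DATA_NAMES.get(data[5])
    if elf_class is None or byte_order is None:
        raise ValueError("unknown ELF class or data encoding")
    # e_type and e_machine are 16-bit fields at offsets 16 and 18 in both
    # classes, but their byte order depends on EI_DATA.
    prefix = "<" if byte_order == "little" else ">"
    _e_type, e_machine = struct.unpack_from(prefix + "HH", data, 16)
    return elf_class, byte_order, e_machine
```

Calling it on the first 20 bytes of a file, for example parse_elf_ident(open(path, 'rb').read(20)), is enough to decide which struct layouts the rest of the header needs.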

15 Responses to “pyelftools – Python library for parsing ELF and DWARF”

  1. Jeff Epler Says:

    This is really exciting, though it’ll take some time to wrap my head around.

    I have a performance analysis tool written in python that has to resolve addresses to symbols in its reports; right now I’m shelling out to readelf and addr2line to get the relevant information. Doing it without a long-running external process for every shared library feels like it could be a real win. (less fragile, at any rate)

    So far I’ve been able to implement code that finds a function name and offset from an address, though I’m concerned that the code’s not particularly efficient (linear search of all SymbolTableSections for each address). I guess I’ll have to dive into dwarf stuff before I can do line number information. It is immensely gratifying that, like everything that’s built on Python and a well-designed package, the size of the code I had to write to get this was less than a (tall) screenful.
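One common way to avoid a linear scan per address is to build a sorted index once and then binary-search it. The sketch below is only an illustration of that idea, not code from pyelftools or from the comment above; SymbolIndex is a hypothetical helper that takes plain (address, size, name) tuples.

```python
import bisect


class SymbolIndex:
    """Sorted index of (start_address, size, name) for fast address lookup."""

    def __init__(self, symbols):
        # symbols: iterable of (address, size, name) tuples
        self._symbols = sorted(symbols)
        self._starts = [addr for addr, _size, _name in self._symbols]

    def lookup(self, address):
        """Return (name, offset) for the symbol containing address, or None."""
        # Index of the last symbol whose start is <= address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, size, name = self._symbols[i]
        # Zero-sized symbols match only their exact start address.
        if address - start < max(size, 1):
            return name, address - start
        return None
```

With pyelftools, the tuples could be collected once per file from each symbol table section's iter_symbols(), taking sym['st_value'], sym['st_size'] and sym.name. Each lookup is then O(log n) instead of a walk over every symbol.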

  2. eliben Says:

    Jeff,

    Thanks for the feedback. If you’re concerned about the performance of some part of the code, please open an Issue on the Bitbucket page of pyelftools. Some things can be solved and made faster, if they are a real problem.

    addr2line would be a nice application for pyelftools. Looking at its code (in binutils 2.22), it’s quite compact, with only 400 lines of C, most of which deal with command-line arguments. I bet its core functionality can be done in a few dozen Python lines using pyelftools.

  3. Jeff Epler Says:

    When I say that I’m concerned “the code isn’t particularly efficient”, I mean my own code, not pyelftools itself.

  4. Craig McQueen Says:

    This is something I wanted and started to do, but stalled on it due to several reasons. I’m interested to try your work in earnest, now that I’m on an embedded project that is using ELF/DWARF files.

    The main thing I’m interested in is to look up the information of global variables and static variables (file level and function statics). Do you have an example of how to:

    1) Given a variable name…
    2) Resolve it to a particular file or function (if it’s static–since each file/function is a separate name space).
    3) Look up its address and type information.

  5. eliben Says:

    Craig,

    I don’t have such an example yet, but perhaps this could be helpful – http://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information/ – even if it doesn’t exactly describe your use case, I think it’s a start. It contains code written with libdwarf to extract the info – it can be easily converted to using pyelftools.
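As a rough starting point for the three steps asked about above, the following sketch walks the DIE tree and reports every variable with a given name, together with its compile unit and enclosing function. It is an illustration under assumptions, not an official pyelftools example: find_variables and static_address are hypothetical helper names, and the address decoding only handles a location that is a single DW_OP_addr operation, which is typical for globals and statics but not guaranteed. The objects are expected to look like pyelftools ones (iter_CUs, get_top_DIE, iter_children, tag, attributes).

```python
DW_OP_ADDR = 0x03  # DWARF opcode: the operand is a fixed target address


def _name(die):
    """Return a DIE's DW_AT_name as text, or None if it has no name."""
    attr = die.attributes.get('DW_AT_name')
    if attr is None:
        return None
    value = attr.value
    return value.decode('utf-8', 'replace') if isinstance(value, bytes) else value


def _walk(die, scope):
    """Yield (die, enclosing_function_die) for a DIE and all its descendants."""
    yield die, scope
    if die.tag == 'DW_TAG_subprogram':
        scope = die
    for child in die.iter_children():
        yield from _walk(child, scope)


def static_address(die, little_endian=True):
    """Return the address from a plain DW_OP_addr location, else None."""
    attr = die.attributes.get('DW_AT_location')
    if attr is None or not isinstance(attr.value, (list, tuple)):
        return None  # no location, or a location-list offset
    expr = list(attr.value)
    # Opcode byte followed by a 4- or 8-byte address; anything else is skipped.
    if len(expr) not in (5, 9) or expr[0] != DW_OP_ADDR:
        return None
    return int.from_bytes(bytes(expr[1:]), 'little' if little_endian else 'big')


def find_variables(dwarfinfo, name):
    """Yield (cu_name, function_name_or_None, variable_die) for each match."""
    for cu in dwarfinfo.iter_CUs():
        top = cu.get_top_DIE()
        for die, scope in _walk(top, None):
            if die.tag == 'DW_TAG_variable' and _name(die) == name:
                func = _name(scope) if scope is not None else None
                yield _name(top), func, die
```

A file-level variable comes back with None as the function name, while a function static carries the name of its subprogram, so the two name spaces stay distinguishable. Type information would come from following the DW_AT_type attribute of the returned DIE.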

  6. yaozong.zhu Says:

    Eli,
    When working with large projects such as the Linux kernel, it is sometimes useful to retrieve the list of source files from the ELF binary. The original project source may contain many files that the current configuration does not need, and a trimmed file list is handy for code browsing tools like cscope. The following patch does that:

    diff -r 706dbb8620bd scripts/readelf.py
    --- a/scripts/readelf.py Mon Jan 30 19:20:41 2012 +0200
    +++ b/scripts/readelf.py Wed Feb 22 20:23:58 2012 +0800
    @@ -10,6 +10,7 @@
     import os, sys
     from optparse import OptionParser
     import string
    +from os.path import normpath
     
     # If elftools is not installed, maybe we're running from the root or scripts
     # dir of the source distribution
    @@ -436,6 +437,8 @@
                 self._dump_debug_info()
             elif dump_what == 'decodedline':
                 self._dump_debug_line_programs()
    +        elif dump_what == 'sourcefiles':
    +            self._dump_debug_sourcefiles()
             elif dump_what == 'frames':
                 self._dump_debug_frames()
             elif dump_what == 'frames-interp':
    @@ -610,6 +613,20 @@
                 # Another readelf oddity...
                 self._emitline()
     
    +    def _dump_debug_sourcefiles(self):
    +        """ Dump the (decoded) file list from .debug_line
    +        """
    +        fileset = set()
    +        for cu in self._dwarfinfo.iter_CUs():
    +            lineprogram = self._dwarfinfo.line_program_for_CU(cu)
    +            if len(lineprogram['include_directory']) > 0:
    +                for fentry in lineprogram['file_entry']:
    +                    fileset.add(normpath('%s/%s' % (
    +                        bytes2str(lineprogram['include_directory'][fentry.dir_index - 1]),
    +                        bytes2str(fentry.name))))
    +        for path in fileset:
    +            self._emitline(path)
    +
         def _dump_debug_frames(self):
             """ Dump the raw frame information from .debug_frame
             """
    @@ -758,7 +775,7 @@
                 action='store', dest='debug_dump_what', metavar='<what>',
                 help=(
                     'Display the contents of DWARF debug sections. <what> can ' +
    -                'one of {info,decodedline,frames,frames-interp}'))
    +                'one of {info,decodedline,frames,frames-interp,sourcefiles}'))
     
         options, args = optparser.parse_args()
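The same idea as the patch above can be sketched as a standalone function that does not depend on readelf.py. This is a hedged variant rather than the patch author's code: source_files is a hypothetical name, and the handling of dir_index assumes DWARF versions 2 to 4, where index 0 means the compilation directory and indices 1..N refer to the include_directory list. Unlike the patch, it keeps files whose dir_index is 0 and skips compile units that have no line program.

```python
import posixpath


def _text(value):
    """Decode bytes as UTF-8; pass str through unchanged."""
    return value.decode('utf-8', 'replace') if isinstance(value, bytes) else value


def source_files(dwarfinfo):
    """Return the sorted list of source paths named in the line programs."""
    paths = set()
    for cu in dwarfinfo.iter_CUs():
        lineprogram = dwarfinfo.line_program_for_CU(cu)
        if lineprogram is None:  # CU without line information
            continue
        dirs = lineprogram['include_directory']
        for fentry in lineprogram['file_entry']:
            name = _text(fentry.name)
            # Assumption (DWARF 2-4): dir_index 0 is the compilation
            # directory; 1..N select entries of include_directory.
            if 0 < fentry.dir_index <= len(dirs):
                name = posixpath.join(_text(dirs[fentry.dir_index - 1]), name)
            paths.add(posixpath.normpath(name))
    return sorted(paths)
```

The dwarfinfo argument is expected to be what pyelftools returns from ELFFile(stream).get_dwarf_info(). Paths with dir_index 0 are left relative, since joining them with the compilation directory would need the DW_AT_comp_dir attribute of the compile unit.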

  7. eliben Says:

    yaozong.zhu,

    Thanks. Could you please put it as a new issue on the pyelftools website? This will help me track it and not forget to look when I get the time.

  8. Emmanuel Blot Says:

    Hi Eli,

    Great module, I’m using it to parse ELF files for ARM-LE binaries. I’ve added some ARM-specific decoding features.

    Is it possible to use ELFFile to update the content of a section and write the change back to the underlying .elf file? Is there an example of this kind of task?

    Thanks,
    Manu

  9. eliben Says:

    Emmanuel,

    Generally speaking, the library was designed for reading ELF data. However, its low level APIs are readily adaptable for writing it back. It would require some tinkering with the code, but should save a lot of work vs. just hand-coding it.

  10. Michael Says:

    Eli,

    I’m attempting to use your library to create a list of global variables used by an application. The list of global variables will identify which source file and at what line number they are declared.

    But I don’t understand how to use the DW_AT_decl_file attribute in the DW_TAG_variable DIE to identify the specific source file it’s related to. This is probably more of a DWARF question than a pyelftools question. But if pyelftools has an easy method to provide the file name directly, that’d be great.

    A bonus answer would be a method to easily “walk” the DW_AT_type attribute of the DW_TAG_variable die to identify the variable’s type.

    Cheers,
    Michael

  11. eliben Says:

    Michael,

    You’re right, it is a DWARF question. pyelftools intentionally provides a rather low-level interface to DWARF, because a higher-level one would be very difficult to implement. I try to provide higher-level functionality in the examples, though. Another place you could look at is the source code of GDB – it’s reasonable once you get used to it. I looked at it many times while working on pyelftools.
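For the two DWARF questions raised in this exchange, here is a small sketch of how the pieces fit together. It is an assumption-laden illustration, not an official example: decl_file and type_chain are hypothetical helper names. DW_AT_decl_file holds an index into the file table of the compile unit's line program, and in DWARF versions 2 to 4 that index starts at 1. The type walk relies on DIE.get_DIE_from_attribute, which exists in recent pyelftools releases but, to my knowledge, not in the early versions discussed in this post.

```python
def decl_file(die, lineprogram):
    """Return the file name that a DIE's DW_AT_decl_file index refers to."""
    attr = die.attributes.get('DW_AT_decl_file')
    if attr is None:
        return None
    files = lineprogram['file_entry']
    # Assumption (DWARF 2-4): file numbers start at 1; 0 means "no file".
    if not 0 < attr.value <= len(files):
        return None
    name = files[attr.value - 1].name
    return name.decode('utf-8', 'replace') if isinstance(name, bytes) else name


def type_chain(die):
    """Return the tags of the DIEs reached by repeatedly following DW_AT_type."""
    tags = []
    while 'DW_AT_type' in die.attributes:
        die = die.get_DIE_from_attribute('DW_AT_type')
        tags.append(die.tag)
    return tags
```

The lineprogram argument would come from dwarfinfo.line_program_for_CU(cu) for the compile unit that owns the DIE. For a variable declared as a pointer to const int, type_chain would be expected to pass through a pointer type, a const type and finally a base type; the line number itself is the value of DW_AT_decl_line.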

  12. Michael Says:

    Eli,

    Looking at GDB sounds promising. Thanks for the pointer.

    Cheers,
    Michael

  13. andrew cooke Says:

    did you/anyone take any further the idea of modifying the elf file?

    i have a 3rd party .so that for some frustrating reason is exposing the API for OpenSSL and so confusing my build. http://stackoverflow.com/questions/15683130/control-order-of-libraries-with-libtool i am looking around to see if there’s some tool that will easily let me corrupt (in a good way) its symbol table, so that it no longer clashes with the valid libcrypto. would this be a good place to start? know of anything better? thanks!

  14. eliben Says:

    @andrew,

    I don’t have any news on this front. pyelftools is still being used for reading data, not writing it back. It includes infrastructure that would allow you to write back, but no high-level APIs for that.

  15. andrew cooke Says:

    thanks. in case it helps anyone else – turns out that binutils has enough to do what i needed (objcopy --strip-symbols and --rename-symbols).
