pyelftools – Python library for parsing ELF and DWARF
January 6th, 2012 at 8:59 am
I’m happy and proud to announce the release of a new open-source Python package to the world. pyelftools is a pure-Python library for parsing and analyzing ELF files and DWARF debugging information. It provides both low-level and high-level APIs for querying ELF and DWARF, and is mostly feature-complete. As a proof of capability, pyelftools ships with a fairly powerful clone of readelf.
Some basic information:
- Website – managed as an open-source project on Bitbucket. It’s the place for documentation, opening issues and closely following the development in general.
- Downloading – from PyPI, or the Bitbucket site.
- Documentation – there’s a detailed user guide, and the source distribution contains several examples.
- License – public domain.
- Pre-requisites – Python version 2.6 or 2.7; 3.x support is in the works.
The goal of this project was two-fold. First, to better understand the ELF and DWARF formats. Second, to have a feature-complete pure-Python parser for these formats with a sufficiently high-level API to be generally useful.
Although the initial release of pyelftools (version 0.10) is formally "beta", it’s quite well validated with a comprehensive test-suite. It should also be simple to learn and tweak, due to the detailed user’s and hacker’s guides on the Bitbucket site Wiki, along with several functioning examples that ship with the library.
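As a rough illustration of the high-level API, here is a minimal sketch that summarizes an ELF file. It assumes a pyelftools release in which ELFFile exposes elfclass, little_endian, header and iter_sections(); the helper names summarize_elf and summarize_path are made up for this example and are not part of the library.

```python
def summarize_elf(elffile):
    """Return basic facts about an already-opened ELFFile-like object."""
    return {
        'class': elffile.elfclass,                # 32 or 64
        'little_endian': elffile.little_endian,   # byte order of the file
        'machine': elffile.header['e_machine'],   # e.g. 'EM_X86_64'
        'sections': [(s.name, s['sh_type'], s['sh_size'])
                     for s in elffile.iter_sections()],
    }


def summarize_path(path):
    """Open the file at `path` and summarize it (hypothetical helper)."""
    # Imported lazily so summarize_elf() stays usable without pyelftools.
    from elftools.elf.elffile import ELFFile
    with open(path, 'rb') as f:
        return summarize_elf(ELFFile(f))
```

Calling summarize_path on any ELF binary should return a dictionary whose 'sections' entry lists each section's name, type and size.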
There are some existing tools with overlapping functionality:
- pydevtools – an ambitious project by Emilio Monti. I initially planned to build my project on top of it (I wanted a much higher-level API than pydevtools aims to provide), but was deterred by its lack of support for the 64-bit ELF format. Adding 64-bit support looked like a lot of work, so I preferred to go my own way. pyelftools is designed from scratch to support both 32-bit and 64-bit formats (as well as both endiannesses).
- libelf and libdwarf – very authoritative and complete implementations, but they are essentially C libraries with a C API, which leaves much to be desired in terms of usability and convenience.
- The LLVM project has an ELF reader (part of its libObject library) and recently started adding partial DWARF parsing capabilities. These parts of the project are still rather experimental and evolving, and I’m following their progress with interest.
January 7th, 2012 at 00:34
This is really exciting, though it’ll take some time to wrap my head around.
I have a performance analysis tool written in python that has to resolve addresses to symbols in its reports; right now I’m shelling out to readelf and addr2line to get the relevant information. Doing it without a long-running external process for every shared library feels like it could be a real win. (less fragile, at any rate)
So far I’ve been able to implement code that finds a function name and offset from an address, though I’m concerned that the code’s not particularly efficient (linear search of all SymbolTableSections for each address). I guess I’ll have to dive into dwarf stuff before I can do line number information. It is immensely gratifying that, like everything that’s built on Python and a well-designed package, the size of the code I had to write to get this was less than a (tall) screenful.
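One way to avoid a linear scan per address is to collect the function symbols once, sort them, and then binary-search. The following is only a sketch under a few assumptions: the names build_symbol_index, lookup and function_symbols are hypothetical, overlapping or nested symbols are not handled, and a symbol with size 0 is treated as covering a single address.

```python
import bisect


def build_symbol_index(symbols):
    """Sort (start, size, name) triples by start address for binary search."""
    entries = sorted(symbols)
    starts = [start for start, _size, _name in entries]
    return starts, entries


def lookup(index, addr):
    """Return (name, offset) for the symbol containing addr, or None."""
    starts, entries = index
    # Rightmost symbol whose start address is <= addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    # A zero size means "unknown extent", so only the start address matches.
    if addr < start + max(size, 1):
        return name, addr - start
    return None


def function_symbols(elffile):
    """Yield (start, size, name) for every defined STT_FUNC symbol."""
    # Imported lazily so the index code above works without pyelftools.
    from elftools.elf.sections import SymbolTableSection
    for section in elffile.iter_sections():
        if not isinstance(section, SymbolTableSection):
            continue
        for sym in section.iter_symbols():
            if sym['st_info']['type'] == 'STT_FUNC' and sym['st_value']:
                yield sym['st_value'], sym['st_size'], sym.name
```

With an open ELFFile called elffile, the index would be built once with build_symbol_index(function_symbols(elffile)); each lookup then costs a binary search instead of a pass over every symbol table.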
January 7th, 2012 at 05:05
Jeff,
Thanks for the feedback. If you’re concerned about the performance of some part of the code, please open an Issue on the Bitbucket page of pyelftools. Some things can be solved and made faster, if they are a real problem.
addr2line would be a nice application for pyelftools. Looking at its code (in binutils 2.22), it’s quite compact, with only 400 lines of C, most of which deal with command-line arguments. I bet its core functionality can be done in a few dozen Python lines using pyelftools.
January 7th, 2012 at 05:46
When I say that I’m concerned “the code isn’t particularly efficient”, I mean my own code, not pyelftools itself.
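For the line-number half of addr2line, a minimal sketch could walk each compilation unit's line program and look for two consecutive rows that bracket the address. This assumes a pyelftools release where line_program_for_CU(), get_entries() and the 'version' and 'file_entry' header fields are available; addr_to_line is a hypothetical name, and the function returns only the bare file name without its directory.

```python
def addr_to_line(dwarfinfo, address):
    """Return (file_name, line) for address, or None if no row covers it."""
    for cu in dwarfinfo.iter_CUs():
        lineprog = dwarfinfo.line_program_for_CU(cu)
        if lineprog is None:          # CU without line information
            continue
        # File indices are 1-based before DWARF 5 and 0-based from DWARF 5 on.
        delta = 1 if lineprog['version'] < 5 else 0
        prev = None
        for entry in lineprog.get_entries():
            state = entry.state
            if state is None:         # command that did not emit a row
                continue
            # The previous row covers [prev.address, state.address).
            if prev is not None and prev.address <= address < state.address:
                name = lineprog['file_entry'][prev.file - delta].name
                return name, prev.line
            # end_sequence closes a run of addresses; do not bridge across it.
            prev = None if state.end_sequence else state
    return None
```

The dwarfinfo argument would come from elffile.get_dwarf_info(). This scans every line program on each call, so a tool resolving many addresses would probably want to cache the decoded rows.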
January 11th, 2012 at 02:10
This is something I wanted and started to do, but stalled on it due to several reasons. I’m interested to try your work in earnest, now that I’m on an embedded project that is using ELF/DWARF files.
The main thing I’m interested in is to look up the information of global variables and static variables (file level and function statics). Do you have an example of how to:
1) Given a variable name…
2) Resolve it to a particular file or function (if it’s static–since each file/function is a separate name space).
3) Look up its address and type information.
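The three steps above could be sketched roughly as follows. This is not an official pyelftools example: find_variables, enclosing_scope and static_address are hypothetical names, only locations of the simple form "DW_OP_addr <address>" are decoded (location lists and richer expressions yield None), and the type is reported as the raw DW_AT_type reference value rather than a resolved type.

```python
DW_OP_ADDR = 0x03  # DWARF opcode: push a constant address


def _text(value):
    """Attribute strings may be bytes; decode them for comparison."""
    return value.decode('utf-8', 'replace') if isinstance(value, bytes) else value


def _attr(die, name):
    attr = die.attributes.get(name)
    return None if attr is None else attr.value


def enclosing_scope(die):
    """Name of the function containing die, or None for file-level scope."""
    parent = die.get_parent()
    while parent is not None:
        if parent.tag == 'DW_TAG_subprogram':
            return _text(_attr(parent, 'DW_AT_name'))
        parent = parent.get_parent()
    return None


def static_address(die, address_size, little_endian):
    """Decode a DW_AT_location of the form 'DW_OP_addr <address>'."""
    loc = _attr(die, 'DW_AT_location')
    if not isinstance(loc, (list, tuple)) or len(loc) != 1 + address_size:
        return None
    if loc[0] != DW_OP_ADDR:
        return None
    return int.from_bytes(bytes(loc[1:]), 'little' if little_endian else 'big')


def find_variables(dwarfinfo, wanted, little_endian=True):
    """Yield (cu_file, function_or_None, address, type_ref) per match."""
    for cu in dwarfinfo.iter_CUs():
        cu_file = _text(_attr(cu.get_top_DIE(), 'DW_AT_name'))
        for die in cu.iter_DIEs():
            if die.tag != 'DW_TAG_variable':
                continue
            if _text(_attr(die, 'DW_AT_name')) != wanted:
                continue
            yield (cu_file,
                   enclosing_scope(die),
                   static_address(die, cu['address_size'], little_endian),
                   _attr(die, 'DW_AT_type'))
```

A name that is static in several files or functions produces one result per definition, so the caller can pick the one whose file or function matches. The little_endian argument would normally be taken from elffile.little_endian.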
January 11th, 2012 at 08:24
Craig,
I don’t have such an example yet, but perhaps this could be helpful – http://eli.thegreenplace.net/2011/02/07/how-debuggers-work-part-3-debugging-information/ – even if it doesn’t exactly describe your use case, I think it’s a start. It contains code written with libdwarf to extract the info – it can easily be converted to use pyelftools.
February 22nd, 2012 at 15:00
Eli,
When working with large projects such as the Linux kernel, it is sometimes useful to retrieve the list of source files from the ELF binary. The original source tree may contain many files not required by the current configuration, and a clean file list is handy for code-browsing tools like cscope. The following patch does this:
diff -r 706dbb8620bd scripts/readelf.py
--- a/scripts/readelf.py Mon Jan 30 19:20:41 2012 +0200
+++ b/scripts/readelf.py Wed Feb 22 20:23:58 2012 +0800
@@ -10,6 +10,7 @@
 import os, sys
 from optparse import OptionParser
 import string
+from os.path import normpath
 
 # If elftools is not installed, maybe we're running from the root or scripts
 # dir of the source distribution
@@ -436,6 +437,8 @@
             self._dump_debug_info()
         elif dump_what == 'decodedline':
             self._dump_debug_line_programs()
+        elif dump_what == 'sourcefiles':
+            self._dump_debug_sourcefiles()
         elif dump_what == 'frames':
             self._dump_debug_frames()
         elif dump_what == 'frames-interp':
@@ -610,6 +613,20 @@
                     # Another readelf oddity...
                     self._emitline()
 
+    def _dump_debug_sourcefiles(self):
+        """ Dump the (decoded) file list from .debug_line
+        """
+        fileset = set()
+        for cu in self._dwarfinfo.iter_CUs():
+            lineprogram = self._dwarfinfo.line_program_for_CU(cu)
+            if len(lineprogram['include_directory']) > 0:
+                for fentry in lineprogram['file_entry']:
+                    fileset.add(normpath('%s/%s' % (
+                        bytes2str(lineprogram['include_directory'][fentry.dir_index - 1]),
+                        bytes2str(fentry.name))))
+        for path in fileset:
+            self._emitline(path)
+
     def _dump_debug_frames(self):
         """ Dump the raw frame information from .debug_frame
         """
@@ -758,7 +775,7 @@
             action='store', dest='debug_dump_what', metavar='',
             help=(
                 'Display the contents of DWARF debug sections. can ' +
-                'one of {info,decodedline,frames,frames-interp}'))
+                'one of {info,decodedline,frames,frames-interp,sourcefiles}'))
 
     options, args = optparser.parse_args()
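The same idea can also be sketched as a standalone function outside readelf.py. Two things differ from the patch and are assumptions of this sketch: a dir_index of 0 is treated as "the compilation directory" and left unprefixed (indexing with dir_index - 1 would otherwise pick the last include directory), and CUs without any include directories are still listed. The sketch assumes DWARF versions before 5, where directory indices are 1-based; source_files is a hypothetical name.

```python
import posixpath


def _s(value):
    """Header strings may be bytes; decode them for path handling."""
    return value.decode('utf-8', 'replace') if isinstance(value, bytes) else value


def source_files(dwarfinfo):
    """Return the sorted, de-duplicated source paths from .debug_line."""
    paths = set()
    for cu in dwarfinfo.iter_CUs():
        lineprog = dwarfinfo.line_program_for_CU(cu)
        if lineprog is None:          # CU without line information
            continue
        dirs = lineprog['include_directory']
        for fentry in lineprog['file_entry']:
            if fentry.dir_index > 0:
                directory = _s(dirs[fentry.dir_index - 1])
            else:
                # Index 0: relative to the compilation directory, which is
                # not stored in include_directory before DWARF 5.
                directory = ''
            paths.add(posixpath.normpath(
                posixpath.join(directory, _s(fentry.name))))
    return sorted(paths)
```

Given an open ELFFile, source_files(elffile.get_dwarf_info()) would return the list; feeding it to cscope is then a matter of writing one path per line.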
February 23rd, 2012 at 13:55
yaozong.zhu,
Thanks. Could you please put it as a new issue on the pyelftools website? This will help me track it and not forget to look when I get the time.
April 11th, 2012 at 12:43
Hi Eli,
Great module, I’m using it to parse ELF files for ARM-LE binaries. I’ve added some ARM-specific decoding features.
Is it possible to use ELFFile to update the content of a section and update the underlying .elf file? Is there an example to perform this kind of task?
Thanks,
Manu
April 11th, 2012 at 20:34
Emmanuel,
Generally speaking, the library was designed for reading ELF data. However, its low level APIs are readily adaptable for writing it back. It would require some tinkering with the code, but should save a lot of work vs. just hand-coding it.
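For the narrow case where the new content has exactly the same size as the old, one possible approach is to use pyelftools only to locate the section and then overwrite the bytes in a copy of the file. This is a sketch, not a feature of the library: patch_section and overwrite_at are hypothetical helpers, and anything that changes a section's size would also require rewriting headers and offsets, which this does not attempt.

```python
import shutil


def overwrite_at(path, offset, data):
    """Overwrite len(data) bytes of the file at `path`, starting at offset."""
    with open(path, 'r+b') as f:
        f.seek(offset)
        f.write(data)


def patch_section(src_path, dst_path, section_name, new_data):
    """Copy src_path to dst_path and replace one section's bytes in place."""
    # Imported lazily so overwrite_at() can be used without pyelftools.
    from elftools.elf.elffile import ELFFile
    with open(src_path, 'rb') as f:
        section = ELFFile(f).get_section_by_name(section_name)
        if section is None:
            raise KeyError(section_name)
        if section['sh_type'] == 'SHT_NOBITS':
            raise ValueError('%s occupies no bytes in the file' % section_name)
        offset, size = section['sh_offset'], section['sh_size']
    if len(new_data) != size:
        raise ValueError('replacement must be exactly %d bytes' % size)
    shutil.copyfile(src_path, dst_path)
    overwrite_at(dst_path, offset, new_data)
```

Writing to a copy keeps the original file intact if the patched result turns out to be unusable.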
June 18th, 2012 at 20:42
Eli,
I’m attempting to use your library to create a list of global variables used by an application. The list of global variables will identify which source file and at what line number they are declared.
But I don’t understand how to use the DW_AT_decl_file attribute in the DW_TAG_variable DIE to identify the specific source file it’s related to. This is probably more of a DWARF question than a pyelftools question. But if pyelftools has an easy method to provide the file name directly, that’d be great.
A bonus answer would be a method to easily “walk” the DW_AT_type attribute of the DW_TAG_variable die to identify the variable’s type.
Cheers,
Michael
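Both parts of this question can be sketched on top of the low-level interface. In DWARF versions before 5, DW_AT_decl_file is a 1-based index into the file table of the CU's line program, with 0 meaning "no file"; the sketch below assumes that numbering. It also assumes a pyelftools release that provides DIE.get_DIE_from_attribute(); decl_location and describe_type are hypothetical names, and the type description is deliberately simplistic.

```python
# DIE tags that wrap another type rather than naming one themselves.
_WRAPPERS = {
    'DW_TAG_pointer_type': 'pointer to',
    'DW_TAG_const_type': 'const',
    'DW_TAG_volatile_type': 'volatile',
    'DW_TAG_array_type': 'array of',
}


def _s(value):
    return value.decode('utf-8', 'replace') if isinstance(value, bytes) else value


def decl_location(die, lineprog):
    """Return (file_name, line) from DW_AT_decl_file / DW_AT_decl_line."""
    attrs = die.attributes
    if 'DW_AT_decl_file' not in attrs:
        return None
    index = attrs['DW_AT_decl_file'].value
    if index == 0:                    # 0 means "no source file" before DWARF 5
        return None
    line = attrs['DW_AT_decl_line'].value if 'DW_AT_decl_line' in attrs else None
    return _s(lineprog['file_entry'][index - 1].name), line


def describe_type(die, max_depth=32):
    """Follow DW_AT_type links from die and return a readable description."""
    parts = []
    for _ in range(max_depth):        # guard against malformed, cyclic data
        if 'DW_AT_type' not in die.attributes:
            parts.append('void')      # e.g. the target of a "void *"
            break
        die = die.get_DIE_from_attribute('DW_AT_type')
        if die.tag in _WRAPPERS:
            parts.append(_WRAPPERS[die.tag])
            continue
        name = die.attributes.get('DW_AT_name')
        parts.append(_s(name.value) if name is not None
                     else '<anonymous %s>' % die.tag)
        break
    return ' '.join(parts)
```

The lineprog argument would be obtained with dwarfinfo.line_program_for_CU(die.cu). Typedefs are reported by their own name rather than expanded, which is usually what a report of global variables wants.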
June 19th, 2012 at 06:09
Michael,
You’re right, it is a DWARF question. pyelftools intentionally provides a rather low-level interface to DWARF, because a higher-level one would be very difficult to implement. I try to provide more high-level functionality in the examples, though. Another place you could look at is the source code of GDB – it’s reasonable once you get used to it. I looked at it many times while working on pyelftools.
June 19th, 2012 at 20:20
Eli,
Looking at GDB sounds promising. Thanks for the pointer.
Cheers,
Michael
March 28th, 2013 at 13:58
did you/anyone take any further the idea of modifying the elf file?
i have a 3rd party .so that for some frustrating reason is exposing the API for OpenSSL and so confusing my build. http://stackoverflow.com/questions/15683130/control-order-of-libraries-with-libtool i am looking around to see if there’s some tool that will easily let me corrupt (in a good way) its symbol table, so that it no longer clashes with the valid libcrypto. would this be a good place to start? know of anything better? thanks!
March 29th, 2013 at 05:15
@andrew,
I don’t have any news on this front. pyelftools is still being used for reading data, not writing it back. It includes infrastructure that would allow you to write back, but no high-level APIs for that.
March 30th, 2013 at 04:52
thanks. in case it helps anyone else – turns out that binutils has enough to do what i needed (objcopy --strip-symbols and --redefine-syms).