<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Eli Bendersky's website &#187; Programming</title>
	<atom:link href="http://eli.thegreenplace.net/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://eli.thegreenplace.net</link>
	<description>Eli Bendersky's personal website</description>
	<lastBuildDate>Fri, 30 Jul 2010 12:30:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Contributing to Python</title>
		<link>http://eli.thegreenplace.net/2010/07/23/contributing-to-python/</link>
		<comments>http://eli.thegreenplace.net/2010/07/23/contributing-to-python/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 15:07:43 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2253</guid>
		<description><![CDATA[I&#8217;ve been involved in open-source projects almost since the first days of my &#34;serious&#34; programming (back in 1998), but these were always projects I started myself. I&#8217;ve long been thinking about joining one of the big and established open-source projects, both to make a contribution and to improve my own skills by working with some [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/03/13/python-documentation-annoyance/' rel='bookmark' title='Permanent Link: Python documentation annoyance'>Python documentation annoyance</a> <small>Edit: I&#8217;ve actually started working on fixing this annoyance in...</small></li><li><a href='http://eli.thegreenplace.net/2010/05/22/migrating-my-personal-projects-to-mercurial/' rel='bookmark' title='Permanent Link: Migrating my personal projects to Mercurial'>Migrating my personal projects to Mercurial</a> <small> Introduction My first acquaintance with version control was soon...</small></li><li><a href='http://eli.thegreenplace.net/2008/06/27/creating-python-extension-modules-in-c/' rel='bookmark' title='Permanent Link: Creating Python extension modules in C'>Creating Python extension modules in C</a> <small>I&#8217;ve successfully created a C extension for Python, basically following...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been involved in open-source projects almost since the first days of my &quot;serious&quot; programming (back in 1998), but these were always projects I started myself. I&#8217;ve long been thinking about joining one of the big and established open-source projects, both to make a contribution and to improve my own skills by working with some great people on interesting things.</p>
<p>Once I started tinkering with Python <a class="reference external" href="http://eli.thegreenplace.net/2008/05/14/python/">around two years ago</a>, it became the major candidate for my contribution &#8211; both because working on Python can really make a difference for a huge number of users, and because Python&#8217;s inner development circles include some of the brightest programmers I&#8217;ve ever run into. Joining this clique, even as a humble minor contributor, is very appealing.</p>
<p>So, a few weeks ago, inspired by a <a class="reference external" href="http://jessenoller.com/2010/04/22/why-arent-you-contributing-to-python">couple</a> of <a class="reference external" href="http://tech.blog.aknin.name/2010/04/08/contributing-to-python/">articles</a>, I finally took the plunge.</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/07/smilingpython.gif" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/07/smilingpython.gif" /></div>
<p>For now, my contributions are very minor: I&#8217;ve been involved in a few <a class="reference external" href="http://bugs.python.org/">issues</a>, and made several patches. A few were even committed into Python &#8211; one <a class="reference external" href="http://bugs.python.org/issue9132">documentation patch</a> and <a class="reference external" href="http://bugs.python.org/issue9282">two</a> <a class="reference external" href="http://bugs.python.org/issue9323">patches</a> fixing bugs in the <tt class="docutils literal"><span class="pre">trace.py</span></tt> module in Python 3.x.</p>
<p>I&#8217;m also &quot;in progress&quot; on several other issues, dealing with the <tt class="docutils literal"><span class="pre">trace.py</span></tt> module (improving its documentation, adding unit tests and debugging some issues with 3.x), documentation fixes for some standard library modules and a bug fix for <tt class="docutils literal"><span class="pre">difflib</span></tt>. Once you make the first step, finding more things to work on is quite easy. Python&#8217;s code and documentation are of relatively high quality, but as in any major software project, there&#8217;s room for improvement almost everywhere you look, even if the improvements are very minor (making the documentation more consistently formatted or clearer).</p>
<p>A few words on how I work on Python.</p>
<p>Although Python is well-supported on Windows and can be built on it without much trouble, Linux is the most convenient platform to use for development IMO. I&#8217;m using an <a class="reference external" href="http://eli.thegreenplace.net/2010/03/27/running-several-oses-in-one-using-virtualbox/">Ubuntu VM running on VirtualBox</a> on top of my Windows XP machine.</p>
<p>Python&#8217;s code is kept in a <a class="reference external" href="http://svn.python.org/view/">Subversion repository</a>, to which you can get read-only access when you&#8217;re not a core committer. This means you can&#8217;t really interact with the repository, and if you want to save your temporary work, you&#8217;re on your own.</p>
<p>Luckily, Python is in the process of moving to Mercurial, and already has a <a class="reference external" href="http://code.python.org/hg">functional mirror</a> set up. Mercurial is a much better SCM tool for this purpose, because it allows you to work locally with your repository, only pulling changes from the official one when necessary.</p>
<p>Here&#8217;s my workflow with the Mercurial mirror of Python:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/07/pythonrepos.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/07/pythonrepos.png" /></div>
<p>My local Mercurial repo is where I do all my hacking, occasionally backing up to my personal clone at <tt class="docutils literal"><span class="pre">code.google.com</span></tt>. This lets me explore various ideas and create temporary fixes, all with full version control. From time to time, I pull a fresh snapshot from Python&#8217;s official Mercurial mirror to get back on track, but I will always be able to get back to my own changes, because everything is safely stored in the history of my repo.</p>
<p>However, I still keep the SVN checkouts around, because:</p>
<ol class="arabic simple">
<li>I want to make sure my changes work on a clean check-out from Python&#8217;s official repository, which is still SVN.</li>
<li>I create patches against the SVN repo (with <tt class="docutils literal"><span class="pre">svn</span> <span class="pre">diff</span></tt>), because Mercurial creates slightly different diffs. Since committers actually commit into the SVN repo, this makes their lives easier.</li>
</ol>
<p>It&#8217;s easy to keep several versions of Python around. For example, I have the repositories for the 3.x development branch (both Mercurial for hacking and SVN for patches), plus the 2.7 and 2.6 maintenance branches. To get a new version/branch all one needs is:</p>
<ol class="arabic simple">
<li>Check it out from SVN or clone from Mercurial</li>
<li><tt class="docutils literal"><span class="pre">configure</span></tt> and then <tt class="docutils literal"><span class="pre">make</span></tt></li>
<li>Create a link somewhere on <tt class="docutils literal"><span class="pre">PATH</span></tt> to the relevant executable (for example I have in <tt class="docutils literal"><span class="pre">~/bin</span></tt> a link named <tt class="docutils literal"><span class="pre">py27</span></tt> for the 2.7 version, <tt class="docutils literal"><span class="pre">py3d</span></tt> for the debug build of the latest 3.x, and so on). The Python interpreter, once executed, knows where to find its own libraries, making it very simple to work with several versions of Python simultaneously.</li>
</ol>
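<p>When several builds sit side by side, it helps to ask the interpreter itself which binary is running and where it picked up its standard library. Here is a minimal sketch of such a check; the function name <tt class="docutils literal"><span class="pre">describe_interpreter</span></tt> is only for illustration, and it relies on nothing beyond the standard <tt class="docutils literal"><span class="pre">sys</span></tt> and <tt class="docutils literal"><span class="pre">os</span></tt> modules:</p>

```python
import os
import sys


def describe_interpreter():
    """Return where the running interpreter lives and where its stdlib was loaded from."""
    return {
        "executable": sys.executable,            # path of the binary being run
        "version": sys.version.split()[0],       # version string of this build
        "prefix": sys.prefix,                    # installation or build prefix
        "stdlib": os.path.dirname(os.__file__),  # directory the os module came from
    }


for key, value in describe_interpreter().items():
    print(key, value)
```

<p>Running the same script through each link (<tt class="docutils literal"><span class="pre">py27</span></tt>, <tt class="docutils literal"><span class="pre">py3d</span></tt> and so on) should show a different executable and library directory each time.</p>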
<p>To conclude, now you know what&#8217;s been keeping me busy in the past month or so. Contributing to Python is something I&#8217;ve long wanted to do, and I&#8217;m happy that I finally started. It turned out to be much less difficult than I originally expected, and I now firmly believe that any competent developer with the desire to help and some free time on his hands can become a contributor.</p>
<p><strong>P.S.</strong> I had the privilege of receiving useful guidance from Terry Reedy, and I&#8217;d like to thank him for that. We still cooperate on several issues, and I hope we&#8217;ll continue working together. &quot;Pair-contribution&quot; seems like an interesting model the Python community may want to look into. I also want to thank Alexander Belopolsky for getting my fixes for <tt class="docutils literal"><span class="pre">trace.py</span></tt> quickly committed.</p>

<p>Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/03/13/python-documentation-annoyance/' rel='bookmark' title='Permanent Link: Python documentation annoyance'>Python documentation annoyance</a> <small>Edit: I&#8217;ve actually started working on fixing this annoyance in...</small></li><li><a href='http://eli.thegreenplace.net/2010/05/22/migrating-my-personal-projects-to-mercurial/' rel='bookmark' title='Permanent Link: Migrating my personal projects to Mercurial'>Migrating my personal projects to Mercurial</a> <small> Introduction My first acquaintance with version control was soon...</small></li><li><a href='http://eli.thegreenplace.net/2008/06/27/creating-python-extension-modules-in-c/' rel='bookmark' title='Permanent Link: Creating Python extension modules in C'>Creating Python extension modules in C</a> <small>I&#8217;ve successfully created a C extension for Python, basically following...</small></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/07/23/contributing-to-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python internals: adding a new statement to Python</title>
		<link>http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/</link>
		<comments>http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 17:18:35 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2231</guid>
		<description><![CDATA[This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I&#8217;m taking a hands-on approach here: I&#8217;m going to add an until statement to Python.
All the coding for this article was done against the cutting-edge Py3k branch in the [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/' rel='bookmark' title='Permanent Link: Python internals: Working with Python ASTs'>Python internals: Working with Python ASTs</a> <small> Starting with Python 2.5, the Python compiler (the part...</small></li><li><a href='http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/' rel='bookmark' title='Permanent Link: Abstract vs. Concrete Syntax Trees'>Abstract vs. Concrete Syntax Trees</a> <small>CSTs &#8211; Concrete Syntax Trees (a.k.a. Parse Trees) and ASTs...</small></li><li><a href='http://eli.thegreenplace.net/2008/07/11/asts-for-analyzing-c/' rel='bookmark' title='Permanent Link: ASTs for analyzing C'>ASTs for analyzing C</a> <small>As I wrote here, I&#8217;ve commonly found myself in the...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I&#8217;m taking a hands-on approach here: I&#8217;m going to add an <tt class="docutils literal"><span class="pre">until</span></tt> statement to Python.</p>
<p>All the coding for this article was done against the cutting-edge Py3k branch in the <a class="reference external" href="http://code.python.org/hg/branches/py3k/">Python Mercurial repository mirror</a>.</p>
<div class="section" id="the-until-statement">
<h3>The <tt class="docutils literal"><span class="pre">until</span></tt> statement</h3>
<p>Some languages, like Ruby, have an <tt class="docutils literal"><span class="pre">until</span></tt> statement, which is the complement to <tt class="docutils literal"><span class="pre">while</span></tt> (<tt class="docutils literal"><span class="pre">until</span> <span class="pre">num</span> <span class="pre">==</span> <span class="pre">0</span></tt> is equivalent to <tt class="docutils literal"><span class="pre">while</span> <span class="pre">num</span> <span class="pre">!=</span> <span class="pre">0</span></tt>). In Ruby, I can write:</p>
<div class="highlight">
<pre>num = <span style="color: #007f7f">3</span>
<span style="color: #00007f; font-weight: bold">until</span> num == <span style="color: #007f7f">0</span> <span style="color: #00007f; font-weight: bold">do</span>
  <span style="color: #00007f">puts</span> num
  num -= <span style="color: #007f7f">1</span>
<span style="color: #00007f; font-weight: bold">end</span>
</pre>
</div>
<p>And it will print:</p>
<div class="highlight">
<pre>3
2
1
</pre>
</div>
<p>So, I want to add a similar capability to Python. That is, being able to write:</p>
<div class="highlight">
<pre>num = <span style="color: #007f7f">3</span>
until num == <span style="color: #007f7f">0</span>:
  <span style="color: #00007f; font-weight: bold">print</span>(num)
  num -= <span style="color: #007f7f">1</span>
</pre>
</div>
</div>
<div class="section" id="a-language-advocacy-digression">
<h3>A language-advocacy digression</h3>
<p>This article doesn&#8217;t attempt to suggest the addition of an <tt class="docutils literal"><span class="pre">until</span></tt> statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python&#8217;s philosophy of minimalism. All I&#8217;m trying to do here, really, is gain some insight into the inner workings of Python.</p>
</div>
<div class="section" id="modifying-the-grammar">
<h3>Modifying the grammar</h3>
<p>Python uses a custom parser generator named <tt class="docutils literal"><span class="pre">pgen</span></tt>. It generates an LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file <tt class="docutils literal"><span class="pre">Grammar/Grammar</span></tt> <a class="footnote-reference" href="#id4" id="id1">[1]</a>. This is a simple text file that specifies the grammar of Python.</p>
<p>Two modifications have to be made to the grammar file. The first is to add a definition for the <tt class="docutils literal"><span class="pre">until</span></tt> statement. I found where the <tt class="docutils literal"><span class="pre">while</span></tt> statement was defined (<tt class="docutils literal"><span class="pre">while_stmt</span></tt>), and added <tt class="docutils literal"><span class="pre">until_stmt</span></tt> below <a class="footnote-reference" href="#id5" id="id2">[2]</a>:</p>
<div class="highlight">
<pre>compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: &#39;if&#39; test &#39;:&#39; suite (&#39;elif&#39; test &#39;:&#39; suite)* [&#39;else&#39; &#39;:&#39; suite]
while_stmt: &#39;while&#39; test &#39;:&#39; suite [&#39;else&#39; &#39;:&#39; suite]
until_stmt: &#39;until&#39; test &#39;:&#39; suite
</pre>
</div>
<p>Note that I&#8217;ve decided to exclude the <tt class="docutils literal"><span class="pre">else</span></tt> clause from my definition of <tt class="docutils literal"><span class="pre">until</span></tt>, just to make it a little bit different (and because frankly I dislike the <tt class="docutils literal"><span class="pre">else</span></tt> clause of loops and don&#8217;t think it fits well with the Zen of Python).</p>
<p>The second change is to modify the rule for <tt class="docutils literal"><span class="pre">compound_stmt</span></tt> to include <tt class="docutils literal"><span class="pre">until_stmt</span></tt>, as you can see in the snippet above. It&#8217;s right after <tt class="docutils literal"><span class="pre">while_stmt</span></tt>, again.</p>
<p>When you run <tt class="docutils literal"><span class="pre">make</span></tt> after modifying <tt class="docutils literal"><span class="pre">Grammar/Grammar</span></tt>, notice that the <tt class="docutils literal"><span class="pre">pgen</span></tt> program is run to re-generate <tt class="docutils literal"><span class="pre">Include/graminit.h</span></tt> and <tt class="docutils literal"><span class="pre">Python/graminit.c</span></tt>, and then several files get re-compiled.</p>
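<p>A quick way to check whether a given build accepts the new syntax is to hand a snippet to the built-in <tt class="docutils literal"><span class="pre">compile</span></tt> function and see whether it raises <tt class="docutils literal"><span class="pre">SyntaxError</span></tt>. This is only a sketch; the helper name <tt class="docutils literal"><span class="pre">parses</span></tt> is made up for the example:</p>

```python
def parses(source):
    """Return True if this interpreter's grammar accepts the source."""
    try:
        compile(source, "until-demo", "exec")
    except SyntaxError:
        return False
    return True


# On an unmodified interpreter the first call gives True and the second False
print(parses("while num != 0:\n    num -= 1\n"))
print(parses("until num == 0:\n    num -= 1\n"))
```

<p>On a build with only the grammar change applied, the second snippet will probably still fail somewhere further down the pipeline, since the later stages do not know about <tt class="docutils literal"><span class="pre">until</span></tt> yet.</p>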
</div>
<div class="section" id="modifying-the-ast-generation-code">
<h3>Modifying the AST generation code</h3>
<p>After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are <a class="reference external" href="http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/">much simpler to work with</a> in subsequent stages of the compilation process.</p>
<p>So, we&#8217;re going to visit <tt class="docutils literal"><span class="pre">Parser/Python.asdl</span></tt> which defines the structure of Python&#8217;s ASTs and add an AST node for our new <tt class="docutils literal"><span class="pre">until</span></tt> statement, again right below the <tt class="docutils literal"><span class="pre">while</span></tt>:</p>
<div class="highlight">
<pre>| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)
</pre>
</div>
<p>If you now run <tt class="docutils literal"><span class="pre">make</span></tt>, notice that before compiling a bunch of files, <tt class="docutils literal"><span class="pre">Parser/asdl_c.py</span></tt> is run to generate C code from the AST definition file. This (like <tt class="docutils literal"><span class="pre">Grammar/Grammar</span></tt>) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since <tt class="docutils literal"><span class="pre">Parser/asdl_c.py</span></tt> is a Python script, this is a kind of <a class="reference external" href="http://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29">bootstrapping</a> &#8211; to build Python from scratch, Python already has to be available.</p>
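<p>The node definitions from the ASDL file are also visible from Python itself through the standard <tt class="docutils literal"><span class="pre">ast</span></tt> module, which is a convenient way to see what a definition such as <tt class="docutils literal"><span class="pre">While(expr</span> <span class="pre">test,</span> <span class="pre">stmt*</span> <span class="pre">body,</span> <span class="pre">stmt*</span> <span class="pre">orelse)</span></tt> turns into. A small sketch, run on a stock interpreter:</p>

```python
import ast

tree = ast.parse("while num != 0:\n    num -= 1\n")
loop = tree.body[0]

print(type(loop).__name__)   # While
print(loop._fields)          # ('test', 'body', 'orelse')
print(loop.orelse)           # [] since there is no else clause
```

<p>By analogy, I would expect an <tt class="docutils literal"><span class="pre">Until</span></tt> node on the patched build to expose only <tt class="docutils literal"><span class="pre">test</span></tt> and <tt class="docutils literal"><span class="pre">body</span></tt>, but that is an assumption I have not verified here.</p>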
<p>While <tt class="docutils literal"><span class="pre">Parser/asdl_c.py</span></tt> generated the code to manage our newly defined AST node (into the files <tt class="docutils literal"><span class="pre">Include/Python-ast.h</span></tt> and <tt class="docutils literal"><span class="pre">Python/Python-ast.c</span></tt>), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file <tt class="docutils literal"><span class="pre">Python/ast.c</span></tt>. There, a function named <tt class="docutils literal"><span class="pre">ast_for_stmt</span></tt> converts parse tree nodes for statements into AST nodes. Again, guided by our old friend <tt class="docutils literal"><span class="pre">while</span></tt>, we jump right into the big <tt class="docutils literal"><span class="pre">switch</span></tt> for handling compound statements and add a clause for <tt class="docutils literal"><span class="pre">until_stmt</span></tt>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">case</span> while_stmt:
    <span style="color: #00007f; font-weight: bold">return</span> ast_for_while_stmt(c, ch);
<span style="color: #00007f; font-weight: bold">case</span> until_stmt:
    <span style="color: #00007f; font-weight: bold">return</span> ast_for_until_stmt(c, ch);
</pre>
</div>
<p>Now we should implement <tt class="docutils literal"><span class="pre">ast_for_until_stmt</span></tt>. Here it is:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">static</span> stmt_ty
<span style="color: #00007f">ast_for_until_stmt</span>(<span style="color: #00007f; font-weight: bold">struct</span> compiling *c, <span style="color: #00007f; font-weight: bold">const</span> node *n)
{
    <span style="color: #007f00">/* until_stmt: &#39;until&#39; test &#39;:&#39; suite */</span>
    REQ(n, until_stmt);

    <span style="color: #00007f; font-weight: bold">if</span> (NCH(n) == <span style="color: #007f7f">4</span>) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, <span style="color: #007f7f">1</span>));
        <span style="color: #00007f; font-weight: bold">if</span> (!expression)
            <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #00007f">NULL</span>;
        suite_seq = ast_for_suite(c, CHILD(n, <span style="color: #007f7f">3</span>));
        <span style="color: #00007f; font-weight: bold">if</span> (!suite_seq)
            <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #00007f">NULL</span>;
        <span style="color: #00007f; font-weight: bold">return</span> Until(expression, suite_seq, LINENO(n), n-&gt;n_col_offset, c-&gt;c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 <span style="color: #7f007f">&quot;wrong number of tokens for &#39;until&#39; statement: %d&quot;</span>,
                 NCH(n));
    <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #00007f">NULL</span>;
}
</pre>
</div>
<p>Again, this was coded while closely looking at the equivalent <tt class="docutils literal"><span class="pre">ast_for_while_stmt</span></tt>, with the difference that for <tt class="docutils literal"><span class="pre">until</span></tt> I&#8217;ve decided not to support the <tt class="docutils literal"><span class="pre">else</span></tt> clause. As expected, the AST is created recursively, using other AST creating functions like <tt class="docutils literal"><span class="pre">ast_for_expr</span></tt> for the condition expression and <tt class="docutils literal"><span class="pre">ast_for_suite</span></tt> for the body of the <tt class="docutils literal"><span class="pre">until</span></tt> statement. Finally, a new node named <tt class="docutils literal"><span class="pre">Until</span></tt> is returned.</p>
<p>Note that we access the parse-tree node <tt class="docutils literal"><span class="pre">n</span></tt> using some macros like <tt class="docutils literal"><span class="pre">NCH</span></tt> and <tt class="docutils literal"><span class="pre">CHILD</span></tt>. These are worth understanding &#8211; their code is in <tt class="docutils literal"><span class="pre">Include/node.h</span></tt>.</p>
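<p>To see why the function checks for exactly four children and then picks children 1 and 3, here is a toy Python model of the parse-tree node for <tt class="docutils literal"><span class="pre">until_stmt</span></tt>. The <tt class="docutils literal"><span class="pre">Node</span></tt> class and the <tt class="docutils literal"><span class="pre">nch</span></tt> and <tt class="docutils literal"><span class="pre">child</span></tt> helpers are hypothetical stand-ins for the C structure and its macros, not CPython code:</p>

```python
class Node:
    """Toy stand-in for the C parse-tree node; not CPython's actual structure."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


def nch(n):
    """Number of children, playing the role of the NCH macro."""
    return len(n.children)


def child(n, i):
    """The i-th child, playing the role of the CHILD macro."""
    return n.children[i]


# until_stmt: 'until' test ':' suite
until = Node("until_stmt", [Node("'until'"), Node("test"), Node("':'"), Node("suite")])

print(nch(until))             # 4
print(child(until, 1).kind)   # test
print(child(until, 3).kind)   # suite
```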
</div>
<div class="section" id="digression-ast-composition">
<h3>Digression: AST composition</h3>
<p>I chose to create a new type of AST for the <tt class="docutils literal"><span class="pre">until</span></tt> statement, but actually this isn&#8217;t necessary. I could&#8217;ve saved some work and implemented the new functionality using composition of existing AST nodes, since:</p>
<div class="highlight">
<pre>until condition:
   <span style="color: #007f00"># do stuff</span>
</pre>
</div>
<p>Is functionally equivalent to:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">while</span> <span style="color: #0000aa">not</span> condition:
  <span style="color: #007f00"># do stuff</span>
</pre>
</div>
<p>Instead of creating the <tt class="docutils literal"><span class="pre">Until</span></tt> node in <tt class="docutils literal"><span class="pre">ast_for_until_stmt</span></tt>, I could have created a <tt class="docutils literal"><span class="pre">While</span></tt> node whose test is a <tt class="docutils literal"><span class="pre">Not</span></tt> node wrapping the condition. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.</p>
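<p>The composition idea can be tried from pure Python with the standard <tt class="docutils literal"><span class="pre">ast</span></tt> module, without touching the C sources. The sketch below builds a <tt class="docutils literal"><span class="pre">While</span></tt> node whose test is the negated condition and then compiles and runs it. The helper name <tt class="docutils literal"><span class="pre">until_to_while</span></tt> is mine, and the <tt class="docutils literal"><span class="pre">type_ignores</span></tt> argument assumes a reasonably recent Python 3:</p>

```python
import ast


def until_to_while(test_src, body_src):
    """Build the AST of 'while not <test>: <body>' from two source strings."""
    test = ast.parse(test_src, mode="eval").body      # the condition expression
    body = ast.parse(body_src).body                   # list of body statements
    negated = ast.UnaryOp(op=ast.Not(), operand=test)
    loop = ast.While(test=negated, body=body, orelse=[])
    module = ast.Module(body=[loop], type_ignores=[])
    return ast.fix_missing_locations(module)          # fill in line numbers


env = {"num": 3, "seen": []}
tree = until_to_while("num == 0", "seen.append(num)\nnum -= 1")
exec(compile(tree, "until-demo", "exec"), env)
print(env["seen"])   # [3, 2, 1]
```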
</div>
<div class="section" id="compiling-asts-into-bytecode">
<h3>Compiling ASTs into bytecode</h3>
<p>The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.</p>
<p>The code we will look at next is <tt class="docutils literal"><span class="pre">Python/compile.c</span></tt>. Following the lead of <tt class="docutils literal"><span class="pre">while</span></tt>, we find the function <tt class="docutils literal"><span class="pre">compiler_visit_stmt</span></tt>, which is responsible for compiling statements into bytecode. We add a clause for <tt class="docutils literal"><span class="pre">Until</span></tt>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">case</span> While_kind:
    <span style="color: #00007f; font-weight: bold">return</span> compiler_while(c, s);
<span style="color: #00007f; font-weight: bold">case</span> Until_kind:
    <span style="color: #00007f; font-weight: bold">return</span> compiler_until(c, s);
</pre>
</div>
<p>If you wonder what <tt class="docutils literal"><span class="pre">Until_kind</span></tt> is, it&#8217;s a constant (actually a value of the <tt class="docutils literal"><span class="pre">_stmt_kind</span></tt> enumeration) automatically generated from the AST definition file into <tt class="docutils literal"><span class="pre">Include/Python-ast.h</span></tt>. Anyway, we call <tt class="docutils literal"><span class="pre">compiler_until</span></tt> which, of course, still doesn&#8217;t exist. I&#8217;ll get to it in a moment.</p>
<p>If you&#8217;re curious like me, you&#8217;ll notice that <tt class="docutils literal"><span class="pre">compiler_visit_stmt</span></tt> is peculiar. No amount of <tt class="docutils literal"><span class="pre">grep</span></tt>-ping the source tree reveals where it is called. When this is the case, only one option remains &#8211; C macro-fu. Indeed, a short investigation leads us to the <tt class="docutils literal"><span class="pre">VISIT</span></tt> macro defined in <tt class="docutils literal"><span class="pre">Python/compile.c</span></tt>:</p>
<div class="highlight">
<pre><span style="color: #007f00">#define VISIT(C, TYPE, V) {\</span>
<span style="color: #007f00">    if (!compiler_visit_ ## TYPE((C), (V))) \</span>
<span style="color: #007f00">        return 0; \</span>
<span style="color: #007f00">}</span>
</pre>
</div>
<p>It&#8217;s used to invoke <tt class="docutils literal"><span class="pre">compiler_visit_stmt</span></tt> in <tt class="docutils literal"><span class="pre">compiler_body</span></tt>. Back to our business, however&#8230;</p>
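<p>The reason <tt class="docutils literal"><span class="pre">grep</span></tt> comes up empty is that the full function name never appears at the call site; it is glued together from two pieces. A rough Python analogue of the same pattern is dispatch through <tt class="docutils literal"><span class="pre">getattr</span></tt>. The class and method names below are illustrative only, and unlike the C macro, which pastes tokens in the preprocessor, Python assembles the name at run time:</p>

```python
class Compiler:
    """Toy dispatcher; the names are illustrative, not CPython's."""

    def visit(self, kind, value):
        # Comparable to compiler_visit_ ## TYPE: the method name is
        # assembled from a prefix and the kind, so it is never spelled out
        method = getattr(self, "visit_" + kind)
        return method(value)

    def visit_stmt(self, value):
        return "stmt: " + value

    def visit_expr(self, value):
        return "expr: " + value


c = Compiler()
print(c.visit("stmt", "until num == 0"))   # stmt: until num == 0
print(c.visit("expr", "num == 0"))         # expr: num == 0
```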
<p>As promised, here&#8217;s <tt class="docutils literal"><span class="pre">compiler_until</span></tt>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">static</span> <span style="color: #00007f; font-weight: bold">int</span>
<span style="color: #00007f">compiler_until</span>(<span style="color: #00007f; font-weight: bold">struct</span> compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = <span style="color: #00007f">NULL</span>;
    <span style="color: #00007f; font-weight: bold">int</span> constant = expr_constant(s-&gt;v.Until.test);

    <span style="color: #00007f; font-weight: bold">if</span> (constant == <span style="color: #007f7f">1</span>) {
        <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #007f7f">1</span>;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    <span style="color: #00007f; font-weight: bold">if</span> (constant == -<span style="color: #007f7f">1</span>) {
        anchor = compiler_new_block(c);
        <span style="color: #00007f; font-weight: bold">if</span> (anchor == <span style="color: #00007f">NULL</span>)
            <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #007f7f">0</span>;
    }
    <span style="color: #00007f; font-weight: bold">if</span> (loop == <span style="color: #00007f">NULL</span> || end == <span style="color: #00007f">NULL</span>)
        <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #007f7f">0</span>;

    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    <span style="color: #00007f; font-weight: bold">if</span> (!compiler_push_fblock(c, LOOP, loop))
        <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #007f7f">0</span>;
    <span style="color: #00007f; font-weight: bold">if</span> (constant == -<span style="color: #007f7f">1</span>) {
        VISIT(c, expr, s-&gt;v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    VISIT_SEQ(c, stmt, s-&gt;v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);

    <span style="color: #00007f; font-weight: bold">if</span> (constant == -<span style="color: #007f7f">1</span>) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #007f7f">1</span>;
}
</pre>
</div>
<p>I have a confession to make: this code wasn&#8217;t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of its kin, the <tt class="docutils literal"><span class="pre">compiler_while</span></tt> function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the <tt class="docutils literal"><span class="pre">dis</span></tt> module, which has <a class="reference external" href="http://docs.python.org/py3k/library/dis.html">a list of Python bytecodes</a> with descriptions, it&#8217;s possible to understand what&#8217;s going on.</p>
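<p>One way to build that understanding on a stock interpreter is to disassemble the equivalent <tt class="docutils literal"><span class="pre">while</span> <span class="pre">not</span></tt> loop and look at the jump instructions the compiler emits for it. This is a sketch only: the function name <tt class="docutils literal"><span class="pre">countdown</span></tt> is made up, and the exact opcode names differ between Python versions, so no particular output is shown:</p>

```python
import dis


def countdown(num):
    # Same semantics as "until num == 0", spelled with existing syntax
    while not num == 0:
        num -= 1
    return num


# Collect the opcode names and keep only the jump instructions
opnames = [ins.opname for ins in dis.get_instructions(countdown)]
jumps = [name for name in opnames if "JUMP" in name]
print(jumps)
```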
</div>
<div class="section" id="that-s-it-we-re-done-aren-t-we">
<h3>That&#8217;s it, we&#8217;re done&#8230; Aren&#8217;t we?</h3>
<p>After making all the changes and running <tt class="docutils literal"><span class="pre">make</span></tt>, we can run the newly compiled Python and try our new <tt class="docutils literal"><span class="pre">until</span></tt> statement:</p>
<div class="highlight">
<pre>&gt;&gt;&gt; num = <span style="color: #007f7f">3</span>
&gt;&gt;&gt; until num == <span style="color: #007f7f">0</span>:
...   <span style="color: #00007f; font-weight: bold">print</span>(num)
...   num -= <span style="color: #007f7f">1</span>
...
<span style="color: #007f7f">3</span>
<span style="color: #007f7f">2</span>
<span style="color: #007f7f">1</span>
</pre>
</div>
<p>Voila, it works! Let&#8217;s see the bytecode created for the new statement by using the <tt class="docutils literal"><span class="pre">dis</span></tt> module as follows:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">import</span> <span style="color: #00007f">dis</span>

<span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">myfoo</span>(num):
    until num == <span style="color: #007f7f">0</span>:
        <span style="color: #00007f; font-weight: bold">print</span>(num)
        num -= <span style="color: #007f7f">1</span>

dis.dis(myfoo)
</pre>
</div>
<p>Here&#8217;s the result:</p>
<div class="highlight">
<pre>4           0 SETUP_LOOP              36 (to 39)
      &gt;&gt;    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      &gt;&gt;   38 POP_BLOCK
      &gt;&gt;   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE
</pre>
</div>
<p>The most interesting operation is number 12: if the condition is true, we jump to after the loop. These are the correct semantics for <tt class="docutils literal"><span class="pre">until</span></tt>. If the jump isn&#8217;t taken, the loop body runs and then, at operation 35, jumps back to the condition at operation 3.</p>
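<p>For readers without a patched interpreter, here&#8217;s a minimal sketch that inspects the closest stock-Python equivalent, a <tt class="docutils literal"><span class="pre">while</span> <span class="pre">not</span></tt> loop. The function name <tt class="docutils literal"><span class="pre">countdown</span></tt> is just an illustration, and the opcode names and offsets you see will likely differ from the listing above, since they change between Python versions.</p>

```python
import dis


def countdown(num):
    """Stock-Python stand-in for the until loop: run the body while the test is false."""
    seen = []
    while not num == 0:
        seen.append(num)
        num -= 1
    return seen


# Opcode names vary between CPython versions, so inspect them rather than assume.
opnames = [ins.opname for ins in dis.get_instructions(countdown)]
jumps = [name for name in opnames if 'JUMP' in name]

print(jumps)           # the conditional exit and the jump back to the test
print(countdown(3))
```

<p>Whatever the exact opcodes turn out to be, the shape should match the listing: a conditional jump that leaves the loop and another jump that returns to the test.</p>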
<p>Feeling good about my change, I then tried running the function (executing <tt class="docutils literal"><span class="pre">myfoo(3)</span></tt>) instead of showing its bytecode. The result was less than encouraging:</p>
<div class="highlight">
<pre>Traceback (most recent call last):
  File &quot;zy.py&quot;, line 9, in &lt;module&gt;
    myfoo(3)
  File &quot;zy.py&quot;, line 5, in myfoo
    print(num)
SystemError: no locals when loading &#39;print&#39;
</pre>
</div>
<p>Whoa&#8230; this can&#8217;t be good. So what went wrong?</p>
</div>
<div class="section" id="the-case-of-the-missing-symbol-table">
<h3>The case of the missing symbol table</h3>
<p>One of the steps the Python compiler performs when compiling the AST is to create a symbol table for the code it compiles. The call to <tt class="docutils literal"><span class="pre">PySymtable_Build</span></tt> in <tt class="docutils literal"><span class="pre">PyAST_Compile</span></tt> calls into the symbol table module (<tt class="docutils literal"><span class="pre">Python/symtable.c</span></tt>), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.</p>
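<p>The kind of information a symbol table holds can be seen from Python itself through the standard <tt class="docutils literal"><span class="pre">symtable</span></tt> module. This is only a sketch on an unmodified interpreter, so the sample source uses a plain <tt class="docutils literal"><span class="pre">while</span></tt> loop, and the <tt class="docutils literal"><span class="pre">&lt;example&gt;</span></tt> file name is a placeholder.</p>

```python
import symtable

source = '''
def myfoo(num):
    while num != 0:
        print(num)
        num -= 1
'''

# Build the symbol tables for the module, then pick the scope of myfoo.
module_table = symtable.symtable(source, '<example>', 'exec')
func_table = module_table.get_children()[0]

num = func_table.lookup('num')      # a parameter, so local to the function
prt = func_table.lookup('print')    # never assigned in myfoo, so resolved globally

print(func_table.get_name(), num.is_local(), prt.is_global())
```

<p>This is the classification the code generator relies on when choosing how to load a name, which is why a statement the symbol table pass never visits ends up with the wrong lookup.</p>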
<p>To fix the problem, we have to modify the <tt class="docutils literal"><span class="pre">symtable_visit_stmt</span></tt> function in <tt class="docutils literal"><span class="pre">Python/symtable.c</span></tt>, adding code for handling <tt class="docutils literal"><span class="pre">until</span></tt> statements, after the similar code for <tt class="docutils literal"><span class="pre">while</span></tt> statements <a class="footnote-reference" href="#id6" id="id3">[3]</a>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">case</span> While_kind:
    VISIT(st, expr, s-&gt;v.While.test);
    VISIT_SEQ(st, stmt, s-&gt;v.While.body);
    <span style="color: #00007f; font-weight: bold">if</span> (s-&gt;v.While.orelse)
        VISIT_SEQ(st, stmt, s-&gt;v.While.orelse);
    <span style="color: #00007f; font-weight: bold">break</span>;
<span style="color: #00007f; font-weight: bold">case</span> Until_kind:
    VISIT(st, expr, s-&gt;v.Until.test);
    VISIT_SEQ(st, stmt, s-&gt;v.Until.body);
    <span style="color: #00007f; font-weight: bold">break</span>;
</pre>
</div>
<p>And now we really are done. Compiling the source after this change makes the execution of <tt class="docutils literal"><span class="pre">myfoo(3)</span></tt> work as expected.</p>
</div>
<div class="section" id="conclusion">
<h3>Conclusion</h3>
<p>In this article I&#8217;ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn&#8217;t difficult to implement, because I used a similar and existing statement as a guideline.</p>
<p>The Python compiler is a sophisticated chunk of software, and I don&#8217;t claim to be an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler&#8217;s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.</p>
</div>
<div class="section" id="references">
<h3>References</h3>
<p>I used a few excellent references for the construction of this article. Here they are, in no particular order:</p>
<ul class="simple">
<li><a class="reference external" href="http://www.python.org/dev/peps/pep-0339/">PEP 339: Design of the CPython compiler</a> &#8211; probably the most important and comprehensive piece of <em>official</em> documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.</li>
<li>&quot;Python Compiler Internals&quot; &#8211; an article by Thomas Lee</li>
<li>&quot;Python: Design and Implementation&quot; &#8211; a presentation by Guido van Rossum</li>
<li>Python (2.5) Virtual Machine, A guided tour &#8211; a presentation by Peter Tröger</li>
</ul>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" style="width: 320px; height: 5px;" /></div>
<table class="docutils footnote" frame="void" id="id4" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1">[1]</a></td>
<td>From here on, references to files in the Python source are given relative to the root of the source tree, which is the directory where you run <tt class="docutils literal"><span class="pre">configure</span></tt> and <tt class="docutils literal"><span class="pre">make</span></tt> to build Python.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2">[2]</a></td>
<td>This demonstrates a common technique I use when modifying source code I&#8217;m not familiar with: <em>work by similarity</em>. This principle won&#8217;t solve all your problems, but it can definitely ease the process. Since everything that has to be done for <tt class="docutils literal"><span class="pre">while</span></tt> also has to be done for <tt class="docutils literal"><span class="pre">until</span></tt>, it serves as a pretty good guideline.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3">[3]</a></td>
<td>By the way, without this code there&#8217;s a compiler warning for <tt class="docutils literal"><span class="pre">Python/symtable.c</span></tt>. The compiler notices that the <tt class="docutils literal"><span class="pre">Until_kind</span></tt> enumeration value isn&#8217;t handled in the switch statement of <tt class="docutils literal"><span class="pre">symtable_visit_stmt</span></tt> and complains. It&#8217;s always important to check for compiler warnings!</td>
</tr>
</tbody>
</table>
</div>

<p>Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/' rel='bookmark' title='Permanent Link: Python internals: Working with Python ASTs'>Python internals: Working with Python ASTs</a> <small> Starting with Python 2.5, the Python compiler (the part...</small></li><li><a href='http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees/' rel='bookmark' title='Permanent Link: Abstract vs. Concrete Syntax Trees'>Abstract vs. Concrete Syntax Trees</a> <small>CSTs &#8211; Concrete Syntax Trees (a.k.a. Parse Trees) and ASTs...</small></li><li><a href='http://eli.thegreenplace.net/2008/07/11/asts-for-analyzing-c/' rel='bookmark' title='Permanent Link: ASTs for analyzing C'>ASTs for analyzing C</a> <small>As I wrote here, I&#8217;ve commonly found myself in the...</small></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/06/30/python-internals-adding-a-new-statement-to-python/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>AES encryption of files in Python with PyCrypto</title>
		<link>http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto/</link>
		<comments>http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 16:26:49 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2227</guid>
		<description><![CDATA[The PyCrypto module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael [...]


No related posts.]]></description>
			<content:encoded><![CDATA[<p>The <a class="reference external" href="http://www.pycrypto.org/">PyCrypto</a> module seems to provide all one needs for employing strong cryptography in a program. It wraps a highly optimized C implementation of many popular encryption algorithms with a Python interface. PyCrypto can be built from source on Linux, and Windows binaries for various versions of Python 2.x were kindly made available by Michael Foord on <a class="reference external" href="http://www.voidspace.org.uk/python/modules.shtml">this page</a>.</p>
<p>My only gripe with PyCrypto is its documentation. The auto-generated <a class="reference external" href="http://www.dlitz.net/software/pycrypto/apidoc/">API doc</a> is next to useless, and <a class="reference external" href="http://www.dlitz.net/software/pycrypto/doc/">this overview</a> is somewhat dated and didn&#8217;t address the questions I had about the module. It isn&#8217;t surprising that a few modules were created just to provide simpler and better documented wrappers around PyCrypto.</p>
<p>In this article I want to present how to use PyCrypto for simple symmetric encryption and decryption of files using the AES algorithm.</p>
<div class="section" id="simple-aes-encryption">
<h3>Simple AES encryption</h3>
<p>Here&#8217;s how one can encrypt a string with AES:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">from</span> <span style="color: #00007f">Crypto.Cipher</span> <span style="color: #00007f; font-weight: bold">import</span> AES

key = <span style="color: #7f007f">&#39;0123456789abcdef&#39;</span>
mode = AES.MODE_CBC
encryptor = AES.new(key, mode)

text = <span style="color: #7f007f">&#39;j&#39;</span> * <span style="color: #007f7f">64</span> + <span style="color: #7f007f">&#39;i&#39;</span> * <span style="color: #007f7f">128</span>
ciphertext = encryptor.encrypt(text)
</pre>
</div>
<p>Since the PyCrypto block-level encryption API is very low-level, it expects your key to be either 16, 24 or 32 bytes long (for AES-128, AES-192 and AES-256, respectively). The longer the key, the stronger the encryption.</p>
<p>Having keys of exact length isn&#8217;t very convenient, as you sometimes want to use some mnemonic password for the key. In this case I recommend picking a password and then using the SHA-256 digest algorithm from <tt class="docutils literal"><span class="pre">hashlib</span></tt> to generate a 32-byte key from it. Just replace the assignment to <tt class="docutils literal"><span class="pre">key</span></tt> in the code above with:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">import</span> <span style="color: #00007f">hashlib</span>

password = <span style="color: #7f007f">&#39;kitty&#39;</span>
key = hashlib.sha256(password).digest()
</pre>
</div>
<p>Keep in mind that this 32-byte key only has as much entropy as your original password. So be wary of brute-force password guessing, and pick a relatively strong password (<em>kitty</em> probably won&#8217;t do). What&#8217;s useful about this technique is that you don&#8217;t have to worry about manually padding your password &#8211; SHA-256 will scramble a 32-byte block out of any password for you.</p>
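<p>Here&#8217;s a hedged sketch of the same derivation for Python 3, where the hash input has to be <tt class="docutils literal"><span class="pre">bytes</span></tt>. The second half is an addition of mine rather than part of the recipe above: <tt class="docutils literal"><span class="pre">hashlib.pbkdf2_hmac</span></tt> with a random salt makes each password guess more expensive. The password, salt length and iteration count are illustrative assumptions, not recommendations.</p>

```python
import hashlib
import os

password = 'correct horse battery staple'   # illustrative only

# Same idea as above, for Python 3: encode the text before hashing.
key = hashlib.sha256(password.encode('utf-8')).digest()

# Assumed alternative: a salted, deliberately slow derivation from the stdlib.
salt = os.urandom(16)                        # not secret; keep it with the ciphertext
slow_key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)

print(len(key), len(slow_key))
```

<p>Both results are 32 bytes long, so either one can be handed to the AES constructor as an AES-256 key.</p>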
<p>The next thing the code does is set the <a class="reference external" href="http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation">block mode</a> of AES. I won&#8217;t get into all the details, but unless you have some special requirements, CBC should be good enough for you.</p>
<p>We create a new AES encryptor object with <tt class="docutils literal"><span class="pre">Crypto.Cipher.AES.new</span></tt>, and give it the encryption key and the mode. Next comes the encryption itself. Again, since the API is low-level, the <tt class="docutils literal"><span class="pre">encrypt</span></tt> method expects your input to consist of an integral number of 16-byte blocks (16 is the size of the basic AES block).</p>
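<p>The block-size requirement can be met with a small helper. This is a minimal sketch, assuming space padding is acceptable for your data; <tt class="docutils literal"><span class="pre">pad_to_block</span></tt> is a name I made up, not something PyCrypto provides.</p>

```python
BLOCK = 16  # size of the basic AES block, in bytes


def pad_to_block(data, fill=b' '):
    """Pad data with fill bytes up to the next multiple of BLOCK (no-op if aligned)."""
    remainder = len(data) % BLOCK
    if remainder:
        data += fill * (BLOCK - remainder)
    return data


print(len(pad_to_block(b'x' * 20)))
```

<p>Note that this kind of padding can&#8217;t be told apart from real trailing spaces, so the original length has to be recorded somewhere if it matters.</p>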
<p>The <tt class="docutils literal"><span class="pre">encryptor</span></tt> object has an internal state when used in the CBC mode, so if you try to encrypt the same text with the same encryptor once again &#8211; you will get different results. So be careful to create a fresh AES encryptor object for any encryption/decryption job.</p>
</div>
<div class="section" id="decryption">
<h3>Decryption</h3>
<p>To decrypt the ciphertext, simply add:</p>
<div class="highlight">
<pre>decryptor = AES.new(key, mode)
plain = decryptor.decrypt(ciphertext)
</pre>
</div>
<p>And you get your plaintext back again.</p>
</div>
<div class="section" id="a-word-about-the-initialization-vector">
<h3>A word about the initialization vector</h3>
<p>The <a class="reference external" href="http://en.wikipedia.org/wiki/Initialization_vector">initialization vector</a> (IV) is an important part of block encryption algorithms that work in chained modes like CBC. For the simple example above I&#8217;ve ignored the IV, but for a more serious application this is a grave mistake. I don&#8217;t want to get too deep into cryptographic theory here, but it suffices to say that the IV is as important as the salt in hashed passwords, and the lack of correct IV usage led to the cracking of the <a class="reference external" href="http://en.wikipedia.org/wiki/Wired_Equivalent_Privacy">WEP encryption</a> for wireless LAN.</p>
<p><tt class="docutils literal"><span class="pre">PyCrypto</span></tt> allows one to pass an IV into the <tt class="docutils literal"><span class="pre">AES.new</span></tt> creator function. For maximal security, the IV should be randomly generated for every new encryption and can be stored together with the ciphertext. Knowledge of the IV won&#8217;t help the attacker crack your encryption. What can help him, however, is your reusing the same IV with the same encryption key for multiple encryptions.</p>
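<p>As a sketch of generating a fresh IV per encryption, the standard library&#8217;s <tt class="docutils literal"><span class="pre">os.urandom</span></tt> draws bytes from the operating system&#8217;s random source, which is generally considered a better fit for cryptographic use than the <tt class="docutils literal"><span class="pre">random</span></tt> module. The helper name <tt class="docutils literal"><span class="pre">new_iv</span></tt> is hypothetical.</p>

```python
import os


def new_iv(block_size=16):
    """Return a fresh random IV taken from the operating system's random source."""
    return os.urandom(block_size)


# Two encryptions under the same key should each get their own IV.
iv_a, iv_b = new_iv(), new_iv()
print(len(iv_a), iv_a != iv_b)
```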
</div>
<div class="section" id="encrypting-and-decrypting-files">
<h3>Encrypting and decrypting files</h3>
<p>The following function encrypts a file of any size. It makes sure to pad the file to a multiple of the AES block length, and also handles the random generation of the IV.</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">import</span> <span style="color: #00007f">os</span>, <span style="color: #00007f">random</span>, <span style="color: #00007f">struct</span>
<span style="color: #00007f; font-weight: bold">from</span> <span style="color: #00007f">Crypto.Cipher</span> <span style="color: #00007f; font-weight: bold">import</span> AES

<span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">encrypt_file</span>(key, in_filename, out_filename=<span style="color: #00007f">None</span>, chunksize=<span style="color: #007f7f">64</span>*<span style="color: #007f7f">1024</span>):
    <span style="color: #7f007f">&quot;&quot;&quot; Encrypts a file using AES (CBC mode) with the</span>
<span style="color: #7f007f">        given key.</span>

<span style="color: #7f007f">        key:</span>
<span style="color: #7f007f">            The encryption key - a string that must be</span>
<span style="color: #7f007f">            either 16, 24 or 32 bytes long. Longer keys</span>
<span style="color: #7f007f">            are more secure.</span>

<span style="color: #7f007f">        in_filename:</span>
<span style="color: #7f007f">            Name of the input file</span>

<span style="color: #7f007f">        out_filename:</span>
<span style="color: #7f007f">            If None, &#39;&lt;in_filename&gt;.enc&#39; will be used.</span>

<span style="color: #7f007f">        chunksize:</span>
<span style="color: #7f007f">            Sets the size of the chunk which the function</span>
<span style="color: #7f007f">            uses to read and encrypt the file. Larger chunk</span>
<span style="color: #7f007f">            sizes can be faster for some files and machines.</span>
<span style="color: #7f007f">            chunksize must be divisible by 16.</span>
<span style="color: #7f007f">    &quot;&quot;&quot;</span>
    <span style="color: #00007f; font-weight: bold">if</span> <span style="color: #0000aa">not</span> out_filename:
        out_filename = in_filename + <span style="color: #7f007f">&#39;.enc&#39;</span>

    iv = <span style="color: #7f007f">&#39;&#39;</span>.join(<span style="color: #00007f">chr</span>(random.randint(<span style="color: #007f7f">0</span>, <span style="color: #007f7f">0</span>xFF)) <span style="color: #00007f; font-weight: bold">for</span> i <span style="color: #0000aa">in</span> <span style="color: #00007f">range</span>(<span style="color: #007f7f">16</span>))
    encryptor = AES.new(key, AES.MODE_CBC, iv)
    filesize = os.path.getsize(in_filename)

    <span style="color: #00007f; font-weight: bold">with</span> <span style="color: #00007f">open</span>(in_filename, <span style="color: #7f007f">&#39;rb&#39;</span>) <span style="color: #00007f; font-weight: bold">as</span> infile:
        <span style="color: #00007f; font-weight: bold">with</span> <span style="color: #00007f">open</span>(out_filename, <span style="color: #7f007f">&#39;wb&#39;</span>) <span style="color: #00007f; font-weight: bold">as</span> outfile:
            outfile.write(struct.pack(<span style="color: #7f007f">&#39;&lt;Q&#39;</span>, filesize))
            outfile.write(iv)

            <span style="color: #00007f; font-weight: bold">while</span> <span style="color: #00007f">True</span>:
                chunk = infile.read(chunksize)
                <span style="color: #00007f; font-weight: bold">if</span> <span style="color: #00007f">len</span>(chunk) == <span style="color: #007f7f">0</span>:
                    <span style="color: #00007f; font-weight: bold">break</span>
                <span style="color: #00007f; font-weight: bold">elif</span> <span style="color: #00007f">len</span>(chunk) % <span style="color: #007f7f">16</span> != <span style="color: #007f7f">0</span>:
                    chunk += <span style="color: #7f007f">&#39; &#39;</span> * (<span style="color: #007f7f">16</span> - <span style="color: #00007f">len</span>(chunk) % <span style="color: #007f7f">16</span>)

                outfile.write(encryptor.encrypt(chunk))
</pre>
</div>
<p>Since it might have to pad the file to fit into a multiple of 16, the function saves the original file size in the first 8 bytes of the output file (more precisely, the first <tt class="docutils literal"><span class="pre">sizeof(long</span> <span class="pre">long)</span></tt> bytes). It randomly generates a 16-byte IV and stores it in the file as well. Then, it reads the input file chunk by chunk (with chunk size configurable), encrypts the chunk and writes it to the output. The last chunk is padded with spaces, if required.</p>
<p>Working in chunks makes sure that large files can be efficiently processed without reading them wholly into memory. For example, with the default chunk size it takes about 1.2 seconds on my computer to encrypt a 50MB file. PyCrypto is fast!</p>
<p>Decrypting the file can be done with:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">decrypt_file</span>(key, in_filename, out_filename=<span style="color: #00007f">None</span>, chunksize=<span style="color: #007f7f">24</span>*<span style="color: #007f7f">1024</span>):
    <span style="color: #7f007f">&quot;&quot;&quot; Decrypts a file using AES (CBC mode) with the</span>
<span style="color: #7f007f">        given key. Parameters are similar to encrypt_file,</span>
<span style="color: #7f007f">        with one difference: out_filename, if not supplied</span>
<span style="color: #7f007f">        will be in_filename without its last extension</span>
<span style="color: #7f007f">        (i.e. if in_filename is &#39;aaa.zip.enc&#39; then</span>
<span style="color: #7f007f">        out_filename will be &#39;aaa.zip&#39;)</span>
<span style="color: #7f007f">    &quot;&quot;&quot;</span>
    <span style="color: #00007f; font-weight: bold">if</span> <span style="color: #0000aa">not</span> out_filename:
        out_filename = os.path.splitext(in_filename)[<span style="color: #007f7f">0</span>]

    <span style="color: #00007f; font-weight: bold">with</span> <span style="color: #00007f">open</span>(in_filename, <span style="color: #7f007f">&#39;rb&#39;</span>) <span style="color: #00007f; font-weight: bold">as</span> infile:
        origsize = struct.unpack(<span style="color: #7f007f">&#39;&lt;Q&#39;</span>, infile.read(struct.calcsize(<span style="color: #7f007f">&#39;Q&#39;</span>)))[<span style="color: #007f7f">0</span>]
        iv = infile.read(<span style="color: #007f7f">16</span>)
        decryptor = AES.new(key, AES.MODE_CBC, iv)

        <span style="color: #00007f; font-weight: bold">with</span> <span style="color: #00007f">open</span>(out_filename, <span style="color: #7f007f">&#39;wb&#39;</span>) <span style="color: #00007f; font-weight: bold">as</span> outfile:
            <span style="color: #00007f; font-weight: bold">while</span> <span style="color: #00007f">True</span>:
                chunk = infile.read(chunksize)
                <span style="color: #00007f; font-weight: bold">if</span> <span style="color: #00007f">len</span>(chunk) == <span style="color: #007f7f">0</span>:
                    <span style="color: #00007f; font-weight: bold">break</span>
                outfile.write(decryptor.decrypt(chunk))

            outfile.truncate(origsize)
</pre>
</div>
<p>First the original size of the file is read from the first 8 bytes of the encrypted file. The IV is read next to correctly initialize the AES object. Then the file is decrypted in chunks, and finally it&#8217;s truncated to the original size, so the padding is thrown out.</p>
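<p>To make the file layout concrete, here&#8217;s a sketch that builds and parses the same container in memory: an 8-byte size header, a 16-byte IV, then the padded payload. It deliberately performs no encryption, so it runs without PyCrypto; <tt class="docutils literal"><span class="pre">write_container</span></tt> and <tt class="docutils literal"><span class="pre">read_container</span></tt> are hypothetical names.</p>

```python
import io
import os
import struct

HEADER = '<Q'   # little-endian unsigned 64-bit size, as in the functions above


def write_container(payload):
    """Lay out size header + IV + space-padded payload (no encryption in this sketch)."""
    iv = os.urandom(16)
    padded = payload + b' ' * (-len(payload) % 16)
    return struct.pack(HEADER, len(payload)) + iv + padded


def read_container(blob):
    """Recover (iv, payload) by reading the header and trimming the padding."""
    stream = io.BytesIO(blob)
    (origsize,) = struct.unpack(HEADER, stream.read(struct.calcsize(HEADER)))
    iv = stream.read(16)
    return iv, stream.read()[:origsize]


blob = write_container(b'hello world')
iv, payload = read_container(blob)
print(len(blob), payload)
```

<p>Slicing to the stored size plays the same role here as the <tt class="docutils literal"><span class="pre">truncate</span></tt> call does for the output file.</p>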
</div>

<p>No related posts.</p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/06/25/aes-encryption-of-files-in-python-with-pycrypto/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>The perils of unsigned iteration in C/C++</title>
		<link>http://eli.thegreenplace.net/2010/06/11/the-perils-of-unsigned-iteration-in-cc/</link>
		<comments>http://eli.thegreenplace.net/2010/06/11/the-perils-of-unsigned-iteration-in-cc/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 04:17:03 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[C / C++]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2206</guid>
		<description><![CDATA[C and C++ frequently coax you into using an unsigned type for iteration. Standard functions like strlen and the size method of containers (in C++) return size_t, which is an unsigned type, so to avoid conversion warnings you comply and iterate with a variable of the appropriate type. For example:

size_t len = strlen(some_c_str);
size_t i;
for (i [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2004/07/18/cc-annoynace-unsigned-iteration/' rel='bookmark' title='Permanent Link: c/c++ annoyance &#8211; unsigned iteration'>c/c++ annoyance &#8211; unsigned iteration</a> <small>I stumble on the following problem a lot: Consider iterating...</small></li><li><a href='http://eli.thegreenplace.net/2004/10/01/complying-with-wall-pedantic-ansi/' rel='bookmark' title='Permanent Link: complying with -Wall -pedantic -ansi'>complying with -Wall -pedantic -ansi</a> <small>Ah&#8230; the triple that are the enemy of every hacker...</small></li><li><a href='http://eli.thegreenplace.net/2003/07/23/allocating-multi-dimensional-arrays-in-c/' rel='bookmark' title='Permanent Link: Allocating multi-dimensional arrays in C++'>Allocating multi-dimensional arrays in C++</a> <small>Updated on 04.06.2010 Allocating multi-dimensional arrays in C++ (and C)...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>C and C++ frequently coax you into using an unsigned type for iteration. Standard functions like <tt class="docutils literal"><span class="pre">strlen</span></tt> and the <tt class="docutils literal"><span class="pre">size</span></tt> method of containers (in C++) return <tt class="docutils literal"><span class="pre">size_t</span></tt>, which is an unsigned type, so to avoid conversion warnings you comply and iterate with a variable of the appropriate type. For example:</p>
<div class="highlight">
<pre>size_t len = strlen(some_c_str);
size_t i;
<span style="color: #00007f; font-weight: bold">for</span> (i = <span style="color: #007f7f">0</span>; i &lt; len; ++i) {
  <span style="color: #007f00">/* Do stuff with each char of some_c_str</span>
<span style="color: #007f00">  */</span>
}
</pre>
</div>
<p>I&#8217;ve <a class="reference external" href="http://eli.thegreenplace.net/2004/07/18/cc-annoynace-unsigned-iteration/">long</a> been aware of one painful gotcha of using <tt class="docutils literal"><span class="pre">size_t</span></tt> for iteration &#8211; using it for iterating backwards. The following code will fail:</p>
<div class="highlight">
<pre><span style="color: #007f00">/* Warning: buggy code!</span>
<span style="color: #007f00">*/</span>
size_t len = strlen(some_c_str);
size_t i;
<span style="color: #00007f; font-weight: bold">for</span> (i = len - <span style="color: #007f7f">1</span>; i &gt;= <span style="color: #007f7f">0</span>; --i) {
  <span style="color: #007f00">/* Do stuff with each char of some_c_str, backwards</span>
<span style="color: #007f00">  */</span>
}
</pre>
</div>
<p>When <tt class="docutils literal"><span class="pre">i</span></tt> reaches 0 it&#8217;s still within bounds, so it will be decremented and become a huge positive number (probably <tt class="docutils literal"><span class="pre">2^(sizeof(size_t)*8)</span> <span class="pre">-</span> <span class="pre">1</span></tt>). Congratulations, we have an infinite loop.</p>
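<p>The wrapped value is easy to check without writing any C. Here&#8217;s a quick sketch using Python&#8217;s <tt class="docutils literal"><span class="pre">ctypes</span></tt>, whose <tt class="docutils literal"><span class="pre">c_size_t</span></tt> wraps the same way on the platform it runs on (assuming 8-bit bytes):</p>

```python
import ctypes

bits = 8 * ctypes.sizeof(ctypes.c_size_t)

# What an unsigned i holds after --i is applied at zero.
wrapped = ctypes.c_size_t(0 - 1).value

print(bits, wrapped == 2 ** bits - 1)
```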
<p>Today I ran into another manifestation of this problem. This one is more insidious, because it happens only for some kinds of input. I wrote the following code because the operation had to consider each character in the string and the character after it:</p>
<div class="highlight">
<pre><span style="color: #007f00">/* Warning: buggy code!</span>
<span style="color: #007f00">*/</span>
size_t len = strlen(some_c_str);
size_t i;
<span style="color: #00007f; font-weight: bold">for</span> (i = <span style="color: #007f7f">0</span>; i &lt; len - <span style="color: #007f7f">1</span>; ++i) {
  <span style="color: #007f00">/* Do stuff with some_c_str[i] and some_c_str[i+1].</span>
<span style="color: #007f00">  */</span>
}
</pre>
</div>
<p>Can you spot the bug?</p>
<p>When <tt class="docutils literal"><span class="pre">some_c_str</span></tt> is empty, <tt class="docutils literal"><span class="pre">len</span></tt> is 0. Therefore, <tt class="docutils literal"><span class="pre">i</span></tt> is compared with the unsigned version of -1, which is that huge positive number again. What chance does poor <tt class="docutils literal"><span class="pre">i</span></tt> have against such a giant? It will just keep chugging along, well beyond the length of my string.</p>
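<p>The arithmetic can be modelled in a few lines of Python. This is only a sketch, assuming a 64-bit <tt class="docutils literal"><span class="pre">size_t</span></tt>; it also models one commonly used rearrangement of the condition, <tt class="docutils literal"><span class="pre">i</span> <span class="pre">+</span> <span class="pre">1</span> <span class="pre">&lt;</span> <span class="pre">len</span></tt>, which avoids subtracting from the length.</p>

```python
SIZE_BITS = 64            # assumed width of size_t
MOD = 2 ** SIZE_BITS


def buggy_bound_holds(i, length):
    """Model of `i < len - 1` with unsigned wraparound on the subtraction."""
    return i < (length - 1) % MOD


def rearranged_bound_holds(i, length):
    """Model of `i + 1 < len`, which never subtracts from len."""
    return (i + 1) % MOD < length


# Empty string: the buggy test lets the loop run, the rearranged one does not.
print(buggy_bound_holds(0, 0), rearranged_bound_holds(0, 0))
```

<p>For non-empty strings the two conditions agree, which is exactly why the bug only shows up for some inputs.</p>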
<p>As I see it, to avoid the problem we can either:</p>
<ol class="arabic simple">
<li>Use an <tt class="docutils literal"><span class="pre">int</span></tt> variable and cast the return value of <tt class="docutils literal"><span class="pre">strlen</span></tt> to <tt class="docutils literal"><span class="pre">int</span></tt>. This feels a bit dirty, especially in C++ where you&#8217;d have to use <tt class="docutils literal"><span class="pre">static_cast&lt;int&gt;</span></tt>.</li>
<li>Just keep using unsigned types for iteration, but be extra careful and use various hacks to avoid the problematic corner cases.</li>
</ol>
<p>None of these options is ideal, so if you have a better idea, let me know.</p>
<p><strong>Edit 12.06.2010:</strong> Thanks everyone for the excellent comments! It&#8217;s obvious creative ways exist to overcome this problem for unsigned types. Still, it remains a gotcha even seasoned programmers stumble upon from time to time. It&#8217;s not surprising that many C/C++ style guides recommend keeping unsigned types for bitfields only, using plain ints for everything else.</p>

<p>Related posts:<ol><li><a href='http://eli.thegreenplace.net/2004/07/18/cc-annoynace-unsigned-iteration/' rel='bookmark' title='Permanent Link: c/c++ annoyance &#8211; unsigned iteration'>c/c++ annoyance &#8211; unsigned iteration</a> <small>I stumble on the following problem a lot: Consider iterating...</small></li><li><a href='http://eli.thegreenplace.net/2004/10/01/complying-with-wall-pedantic-ansi/' rel='bookmark' title='Permanent Link: complying with -Wall -pedantic -ansi'>complying with -Wall -pedantic -ansi</a> <small>Ah&#8230; the triple that are the enemy of every hacker...</small></li><li><a href='http://eli.thegreenplace.net/2003/07/23/allocating-multi-dimensional-arrays-in-c/' rel='bookmark' title='Permanent Link: Allocating multi-dimensional arrays in C++'>Allocating multi-dimensional arrays in C++</a> <small>Updated on 04.06.2010 Allocating multi-dimensional arrays in C++ (and C)...</small></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/06/11/the-perils-of-unsigned-iteration-in-cc/feed/</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>The intuition behind Fisher-Yates shuffling</title>
		<link>http://eli.thegreenplace.net/2010/05/28/the-intuition-behind-fisher-yates-shuffling/</link>
		<comments>http://eli.thegreenplace.net/2010/05/28/the-intuition-behind-fisher-yates-shuffling/#comments</comments>
		<pubDate>Fri, 28 May 2010 06:05:52 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2194</guid>
		<description><![CDATA[One common programming question is how to randomly shuffle an array of numbers in-place. There are a few wrong answers to this question &#8211; some simple shuffles people tend to think of immediately turn out to be inadequate. In particular, the most common naive algorithm that comes up is [1]:

naive_shuffle(arr):
  if len(arr) &#62; 1:
 [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/' rel='bookmark' title='Permanent Link: Initializing an array in constant time'>Initializing an array in constant time</a> <small>The problem We want to use a very large array...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>One common programming question is how to randomly shuffle an array of numbers in-place. There are a few wrong answers to this question &#8211; some simple shuffles people tend to think of immediately turn out to be inadequate. In particular, the most common naive algorithm that comes up is <a class="footnote-reference" href="#id5" id="id1">[1]</a>:</p>
<div class="highlight">
<pre>naive_shuffle(arr):
  if len(arr) &gt; 1:
    for i in 0 .. len(arr) - 1:
      s = random from inclusive range [0:len(arr)-1]
      swap arr[s] with arr[i]
</pre>
</div>
<p>This algorithm produces results that are badly skewed. For more information, consult <a class="reference external" href="http://www.codinghorror.com/blog/2007/12/the-danger-of-naivete.html">this post by Jeff Atwood</a>, and <a class="reference external" href="http://stackoverflow.com/questions/859253/why-does-this-simple-shuffle-algorithm-produce-biased-results-what-is-a-simple%20%282nd%20answer%29">this SO discussion</a>.</p>
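<p>For a quick feel of the skew, here is a minimal sketch that replaces the random picks with an exhaustive enumeration of every possible pick sequence for a 3-element array. The helper names <tt class="docutils literal"><span class="pre">naive_shuffle_with_choices</span></tt> and <tt class="docutils literal"><span class="pre">count_outcomes</span></tt> are made up for this illustration. There are 27 equally likely pick sequences but only 6 permutations, and 27 is not a multiple of 6, so the counts cannot all be equal:</p>

```python
from collections import Counter
from itertools import product


def naive_shuffle_with_choices(arr, choices):
    """Run the naive shuffle, using choices[i] in place of the random index."""
    arr = list(arr)
    for i, s in enumerate(choices):
        arr[s], arr[i] = arr[i], arr[s]
    return tuple(arr)


def count_outcomes(n):
    """Count how often each permutation appears over all n**n pick sequences."""
    return Counter(
        naive_shuffle_with_choices(range(n), choices)
        for choices in product(range(n), repeat=n)
    )


counts = count_outcomes(3)
for perm, hits in sorted(counts.items()):
    print(perm, hits)
```

<p>Some permutations are reached by more pick sequences than others, which is exactly the bias the links above discuss.</p>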
<p>The <em>correct</em> answer is to use the <a class="reference external" href="http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle">Fisher-Yates shuffle</a> algorithm:</p>
<div class="highlight">
<pre>fisher_yates_shuffle(arr):
  if len(arr) &gt; 1:
    i = len(arr) - 1
    while i &gt; 0:
      s = random from inclusive range [0:i]
      swap arr[s] with arr[i]
      i--
</pre>
</div>
<p>It was first invented as a paper-and-pencil method back in 1938, and was later popularized by Donald Knuth in Volume II of TAOCP. For this reason it&#8217;s also sometimes called the Fisher-Yates-Knuth algorithm. In this article I don&#8217;t aim to compare Fisher-Yates to the naive algorithm. Nor do I plan to explain why the naive shuffle doesn&#8217;t work. Others have done it before me; see the references to Jeff&#8217;s post and the SO discussion above.</p>
<p>What I do plan to do, however, is to explain <em>why</em> the Fisher-Yates algorithm works. To put it more formally: why, given a good random-number generator, the Fisher-Yates shuffle produces a uniform shuffle of an array in which every permutation is equally likely. And my plan is not to prove the shuffle&#8217;s correctness mathematically, but rather to explain it intuitively. I personally find it much simpler to remember an algorithm once I understand the intuition behind it.</p>
<div class="section" id="an-analogy">
<h3>An analogy</h3>
<p>Imagine a magician&#8217;s hat:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/magician_hat.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/magician_hat.png" /></div>
<p>And a bunch of distinct balls. Let&#8217;s take pool balls for the example:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/poolballs.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/poolballs.png" /></div>
<p>Suppose you place all those balls into the hat <a class="footnote-reference" href="#id6" id="id2">[2]</a> and stir them really well. Now, you look away and start taking balls randomly out of the hat and placing them in a line. Assuming the hat stir was random and you can&#8217;t distinguish the balls by touch alone, once the hat is empty, the resulting line is a random permutation of the balls. No ball had a larger chance of being the first in line than any other ball. After that, all the remaining balls in the hat had an equal chance of being the second in line, and so on. Again, this isn&#8217;t a rigorous proof, but the point of this article is intuition.</p>
<p>If you understand why this procedure produces a random shuffle of the balls, you can understand Fisher-Yates, because it is just a variation on the same theme.</p>
</div>
<div class="section" id="the-intuition-behind-fisher-yates-shuffling">
<h3>The intuition behind Fisher-Yates shuffling</h3>
<p>The Fisher-Yates shuffle performs a procedure similar to pulling balls at random from a hat. Here&#8217;s the algorithm once again, this time in my favorite pseudo-code format, Python <a class="footnote-reference" href="#id7" id="id3">[3]</a>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">fisher_yates</span>(arr):
    <span style="color: #00007f; font-weight: bold">if</span> <span style="color: #00007f">len</span>(arr) &gt; <span style="color: #007f7f">1</span>:
        i = <span style="color: #00007f">len</span>(arr) - <span style="color: #007f7f">1</span>
        <span style="color: #00007f; font-weight: bold">while</span> i &gt; <span style="color: #007f7f">0</span>:
            s = randint(<span style="color: #007f7f">0</span>, i)
            arr[i], arr[s] = arr[s], arr[i]
            i -= <span style="color: #007f7f">1</span>
</pre>
</div>
<p>The trick is doing it in-place with no extra memory. The following step-by-step illustration should explain what&#8217;s going on. Let&#8217;s start with an array of 4 elements:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/initial4.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/initial4.png" /></div>
<p>The array contains the letters <tt class="docutils literal"><span class="pre">a,</span> <span class="pre">b,</span> <span class="pre">c,</span> <span class="pre">d</span></tt> at indices <tt class="docutils literal"><span class="pre">[0:3]</span></tt>. The red arrow shows where <tt class="docutils literal"><span class="pre">i</span></tt> points initially. Now, the initial step in the loop picks a random index in the range <tt class="docutils literal"><span class="pre">[0:i]</span></tt>, which is <tt class="docutils literal"><span class="pre">[0:3]</span></tt> in the first iteration. Suppose the index 1 was picked, and the code swaps element 1 with element 3 (which is the initial <tt class="docutils literal"><span class="pre">i</span></tt>). So after the first iteration the array looks like this:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after1.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after1.png" /></div>
<p>Notice that I colored the part of the array to the right of <tt class="docutils literal"><span class="pre">i</span></tt> differently. Here&#8217;s a spoiler: <strong>The blue part of the array is the hat, and the orange part is the line where the random permutation is being built</strong>. Let&#8217;s make one more step of the loop. A random number in the range <tt class="docutils literal"><span class="pre">[0:2]</span></tt> has to be picked, so suppose 2 is picked. Therefore, the swap just leaves the element at index 2 in its original place:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after2.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after2.png" /></div>
<p>We make one more step. Suppose 0 is picked at random from <tt class="docutils literal"><span class="pre">[0:1]</span></tt> so elements at indices 0 and 1 are swapped:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after3.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/after3.png" /></div>
<p>At this point we&#8217;re done. There&#8217;s only one ball left in the hat, so it will surely be picked next. This is why the loop of the algorithm runs <tt class="docutils literal"><span class="pre">while</span> <span class="pre">i</span> <span class="pre">&gt;</span> <span class="pre">0</span></tt> &#8211; once <tt class="docutils literal"><span class="pre">i</span></tt> reaches 0, the algorithm finishes:</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/final4.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/final4.png" /></div>
<p>So, to understand why the Fisher-Yates shuffling algorithm works, keep in mind the following: the algorithm makes a &quot;virtual&quot; division of the array it shuffles into two parts. The part at indices <tt class="docutils literal"><span class="pre">[0:i]</span></tt> is the <em>hat</em>, from which elements will be picked at random. The part to the right of <tt class="docutils literal"><span class="pre">i</span></tt> (that is, <tt class="docutils literal"><span class="pre">[i+1:len(arr)-1]</span></tt>) is the final line where the random permutation is being formed. In each step of the algorithm, it picks one element from the hat and adds it to the line, removing it from the hat.</p>
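<p>The hat-and-line picture can also be checked by brute force. The sketch below is only an illustration, with helper names of my own choosing: it feeds the algorithm every possible sequence of picks instead of random numbers, and also replays the picks 1, 2, 0 from the walkthrough above. If each pick sequence lands on a different permutation, then equally likely picks give equally likely permutations:</p>

```python
from collections import Counter
from itertools import product


def fisher_yates_with_choices(arr, choices):
    """Fisher-Yates where choices supplies s for i = len(arr)-1 down to 1."""
    arr = list(arr)
    i = len(arr) - 1
    for s in choices:
        arr[i], arr[s] = arr[s], arr[i]
        i -= 1
    return tuple(arr)


def count_outcomes(n):
    # The pick for position i comes from range(i + 1), i.e. the hat [0:i].
    ranges = [range(i + 1) for i in range(n - 1, 0, -1)]
    return Counter(
        fisher_yates_with_choices(range(n), choices)
        for choices in product(*ranges)
    )


# Replay the walkthrough: picks 1, then 2, then 0 on the array a, b, c, d.
walkthrough = fisher_yates_with_choices("abcd", (1, 2, 0))
print(walkthrough)

counts = count_outcomes(4)
print(len(counts), set(counts.values()))
```

<p>For 4 elements there are 4 * 3 * 2 = 24 pick sequences and 24 permutations, and every permutation shows up exactly once.</p>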
<p>Some final notes:</p>
<ul class="simple">
<li>Since all the indices <tt class="docutils literal"><span class="pre">[0:i]</span></tt> are in the hat, the selection can pick <tt class="docutils literal"><span class="pre">i</span></tt> itself. In such a case there&#8217;s no real swapping being done, but the element at index <tt class="docutils literal"><span class="pre">i</span></tt> moves from the hat to the line. Having the selection from range <tt class="docutils literal"><span class="pre">[0:i]</span></tt> is crucial to the correctness of the algorithm. A common implementation mistake is to make this range <tt class="docutils literal"><span class="pre">[0:i-1]</span></tt>, which causes the shuffle to be non-uniform.</li>
<li>The vast majority of implementations you&#8217;ll see online run the algorithm from the end of the array down. But this isn&#8217;t set in stone &#8211; it&#8217;s just a convention. The algorithm will work equally well with <tt class="docutils literal"><span class="pre">i</span></tt> starting at 0 and running until the end of the array, picking items in the range <tt class="docutils literal"><span class="pre">[i:len(arr)-1]</span></tt> at each step.</li>
</ul>
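<p>As a sketch of the second note, here is one way the forward-running variant might look. The function name <tt class="docutils literal"><span class="pre">fisher_yates_forward</span></tt> is made up for this example; the hat is now the right part of the array and the line grows from the left:</p>

```python
import random


def fisher_yates_forward(arr):
    """Shuffle arr in place, growing the finished line from the left."""
    n = len(arr)
    for i in range(n - 1):
        # Pick from the hat [i:n-1]; randint includes both endpoints,
        # so i itself can be picked, as the first note requires.
        s = random.randint(i, n - 1)
        arr[i], arr[s] = arr[s], arr[i]


deck = list(range(10))
fisher_yates_forward(deck)
print(deck)
```

<p>The loop stops one short of the last index for the same reason the original stops at <tt class="docutils literal"><span class="pre">i</span> <span class="pre">&gt;</span> <span class="pre">0</span></tt>: a hat with a single ball leaves no choice to make.</p>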
</div>
<div class="section" id="conclusion">
<h3>Conclusion</h3>
<p>Random shuffling is important for many applications. Although it&#8217;s a seemingly simple operation, it&#8217;s easy to do wrong. The Internet abounds with stories of gambling companies losing money because their shuffles weren&#8217;t random enough.</p>
<p>The Fisher-Yates algorithm produces a uniform shuffling of an array. It&#8217;s optimally efficient both in runtime (running in <tt class="docutils literal"><span class="pre">O(len(arr))</span></tt>) and space (the shuffle is done in-place, using only <tt class="docutils literal"><span class="pre">O(1)</span></tt> extra memory).</p>
<p>In this article I aimed to explain the intuition behind the algorithm, firmly believing that a real, deep understanding of something <a class="footnote-reference" href="#id8" id="id4">[4]</a> is both intellectually rewarding and useful.</p>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" style="width: 320px; height: 5px;" /></div>
<table class="docutils footnote" frame="void" id="id5" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1">[1]</a></td>
<td>Here, like in the rest of the article, assume that all arrays are 0-based, i.e. their first element is at index 0.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id6" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2">[2]</a></td>
<td>Yes, I know it will become heavy. Actually, if you&#8217;ve noticed it you probably have a slight case of <a class="reference external" href="http://en.wikipedia.org/wiki/Attention-deficit_hyperactivity_disorder">ADHD</a>. Stay <em>focused</em> on the algorithm, OK? If you still can&#8217;t, imagine that these are mini-pool balls.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3">[3]</a></td>
<td>This implementation is very similar to the code of <tt class="docutils literal"><span class="pre">random.shuffle</span></tt> from the standard library.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4">[4]</a></td>
<td>In other words, <a class="reference external" href="http://en.wikipedia.org/wiki/Grok">grokking it</a>.</td>
</tr>
</tbody>
</table>
</div>

<p>Related posts:<ol><li><a href='http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/' rel='bookmark' title='Permanent Link: Initializing an array in constant time'>Initializing an array in constant time</a> <small>The problem We want to use a very large array...</small></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/05/28/the-intuition-behind-fisher-yates-shuffling/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Migrating my personal projects to Mercurial</title>
		<link>http://eli.thegreenplace.net/2010/05/22/migrating-my-personal-projects-to-mercurial/</link>
		<comments>http://eli.thegreenplace.net/2010/05/22/migrating-my-personal-projects-to-mercurial/#comments</comments>
		<pubDate>Sat, 22 May 2010 05:48:27 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software & Tools]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2191</guid>
		<description><![CDATA[
Introduction
My first acquaintance with version control was soon after the beginning of my professional career, at IBM in 2000. We were using RCS at that time, and later moved to CVS. Three years ago, I started using Subversion for my personal projects at home, and since then I can&#8217;t imagine not having my code safely [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2007/04/14/subversion-repository-on-sourceforge/' rel='bookmark' title='Permanent Link: Subversion repository on SourceForge'>Subversion repository on SourceForge</a> <small>It comes with experience &#8211; the uncomfortable, nagging feeling every...</small></li><li><a href='http://eli.thegreenplace.net/2010/07/23/contributing-to-python/' rel='bookmark' title='Permanent Link: Contributing to Python'>Contributing to Python</a> <small>I&#8217;ve been involved in open-source projects almost since the first...</small></li><li><a href='http://eli.thegreenplace.net/2006/04/12/migrating-to-wordpress/' rel='bookmark' title='Permanent Link: Migrating to Wordpress'>Migrating to Wordpress</a> <small>My blog has just recently migrated from use.perl to Blogger,...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<div class="section" id="introduction">
<h3>Introduction</h3>
<p>My first acquaintance with version control was soon after the beginning of my professional career, at IBM in 2000. We were using RCS at that time, and later moved to CVS. <a class="reference external" href="http://eli.thegreenplace.net/2007/04/14/subversion-repository-on-sourceforge/">Three years ago</a>, I started using Subversion for my personal projects at home, and since then I can&#8217;t imagine not having my code safely tucked in source control for any prolonged amount of time. Lately, the excellent <a class="reference external" href="http://code.google.com/projecthosting/">source code hosting service of Google</a> has been my online repository of choice.</p>
<p>Staying married to a single technology or tool isn&#8217;t a good strategy, however. The world of software advances quickly, and better solutions for old problems get invented all the time. Distributed version control is one such solution. It has gained a lot of popularity in the past few years and is, slowly but surely, taking over the world of source control. In this post I want to show how I discovered that Subversion is no longer good enough for my needs, and began using Mercurial in its place for managing all my personal projects and code.</p>
</div>
<div class="section" id="the-need">
<h3>The need</h3>
<p>This week I was planning to do some self-educational hacking on the source code of Python <a class="footnote-reference" href="#id7" id="id1">[1]</a>, and it occurred to me that I&#8217;m going to have a problem keeping my explorations safe in a source-control system. Here&#8217;s why:</p>
<p>Python has an official Subversion <a class="reference external" href="http://svn.python.org/">repository at python.org</a> &#8211; you can check out a read-only copy from it, but there your benefit from source control ends. Since I don&#8217;t have Python commit rights, my checked-out sandbox is just a local snapshot &#8211; I can&#8217;t create branches or commit my changes anywhere.</p>
<p>What I could do is create a personal SVN repository, import Python into it and play around. But how to keep up with advances in Python itself? Subversion doesn&#8217;t support such merging between two repositories in a convenient way.</p>
<p>Another, unrelated qualm with Subversion came up with my own personal repositories. It&#8217;s not new &#8211; it&#8217;s a sorrow that has been accumulating over a long time. The problem with SVN is that the local copy only contains the latest revision &#8211; it can show you the differences between that and your local changes quickly. For anything else, you must turn to the repository itself over the network. And that&#8217;s really slow.</p>
<p>Unfortunately, high Internet connection speeds aren&#8217;t of much help here. The bandwidth may be sufficient, but latency is the culprit. A simple <tt class="docutils literal"><span class="pre">ping</span></tt> roundtrip to <tt class="docutils literal"><span class="pre">code.google.com</span></tt> from my PC (located in Israel) takes about 100 ms. I&#8217;m sure that the time that it takes Google to dispatch my request to an SVN server, and that server to parse and understand my request isn&#8217;t negligible either. Subversion has a protocol that has to send and receive multiple commands to do simple operations like see the project log, diff between older revisions and so on. These latencies add up, making me constantly stare at a frozen screen. Even a simple and commonly needed operation like viewing the repository log takes a few seconds, and diffing old revisions much longer than that. This can quickly become <em>really</em> annoying.</p>
</div>
<div class="section" id="mercurial-is-the-answer">
<h3>Mercurial is the answer</h3>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/2010/05/mercurial-logo-droplets-200.png" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/mercurial-logo-droplets-200.png" /></div>
<p>As it turns out, the first problem I mentioned bothered the Python core developers quite a bit, so about a year ago they decided to switch Python itself to Mercurial <a class="footnote-reference" href="#id8" id="id2">[2]</a>. The official repository hasn&#8217;t switched yet, but a <a class="reference external" href="http://code.python.org/hg">Mercurial mirror exists</a>, reflecting everything going on in the SVN repository practically in real time.</p>
<p>This made my decision much easier. A DVCS (Distributed Version Control System) addresses both my needs:</p>
<ol class="arabic simple">
<li>It allows each developer to have a full snapshot of the repository locally. Updates from the official repository are done by <em>pulling</em>, but local changes can be made with full source-control. You only really have to merge when you plan to push into the official repository. This is very convenient for people without commit privileges, because they can experiment with the source, incrementally tweaking stuff and saving it in the local repository.</li>
<li>By having a local repository, everything becomes fast &#8211; you mostly work with a local copy, and only access the network to push and pull changes. Now I can leisurely explore the history of my project, diffing old revisions, all at the speed of a local hard-drive access.</li>
</ol>
<p>But what about all the disk space? Aren&#8217;t repositories huge? Isn&#8217;t keeping them on every computer wasteful? Far from it, as it turns out. My local Python source directory (the py3k branch, last pulled today) is about 100 MB in size. The repository part (the <tt class="docutils literal"><span class="pre">.hg</span></tt> directory) &#8211; with all the history (thousands of revisions), takes less than half of this space &#8211; about 46 MB. This is due to Mercurial&#8217;s highly optimized storage system, which is both diff-based and efficiently compressed. Is this a high price to pay for all the convenience? Hardly, with a 1 TB hard-drive available for less than $100 these days.</p>
<p>Mercurial has a lot of tricks in its bag when it comes to saving space. If you create a clone of a local repository, Mercurial uses hard links (even on Windows!) to bring its overhead in the new clone to almost 0. Having multiple local clones is handy if you want to explore a separate line of development, or have both a maintenance branch and a development trunk easily available for your project.</p>
<p>Windows users used to TortoiseSVN won&#8217;t be disappointed &#8211; <a class="reference external" href="http://tortoisehg.bitbucket.org/">TortoiseHg</a> is a similar tool, and it works just as well.</p>
<p>Overall, Mercurial has been quick and fun to learn and start using. When a tool fits your mental model, has the solution of your problem as its goal, and performs its job well, it&#8217;s a smooth, seamless experience. For me, there&#8217;s only one thing left that feels funny, and this is the need to remember to push after I&#8217;ve committed. One of my uses for the online repository is to synchronize the same code between multiple computers. With SVN I got used to just committing on one machine, and having my changes available on the other with a simple update. With Mercurial, a couple more steps are required: after committing I must push, and then at the other machine pull and update <a class="footnote-reference" href="#id10" id="id3">[3]</a>. I&#8217;m confident that this isn&#8217;t a big issue, however, and I&#8217;ll get used to it quickly.</p>
</div>
<div class="section" id="why-mercurial-and-not-another-dvcs">
<h3>Why Mercurial and not another DVCS</h3>
<p>This question just had to surface, and the Python devs have struggled with the same dilemma. They&#8217;ve actually done most of the work with a great comparison of the options in <a class="reference external" href="http://www.python.org/dev/peps/pep-0374/">PEP 374</a>, so all I have left is to reiterate their conclusions:</p>
<ul class="simple">
<li>I prefer a Python-based system because, well&#8230; because I like Python! It&#8217;s fun reading about Mercurial&#8217;s internals and then being able to peruse the Python source code that implements it. So this throws Git out of the window <a class="footnote-reference" href="#id11" id="id4">[4]</a>.</li>
<li>As for Bazaar, I don&#8217;t have a strong preference so I go with the crowd. Mercurial is more popular. It&#8217;s used by huge projects like Mozilla, Vim, XEmacs, and Python. The last, in particular, seals the deal. If I want to hack on Python, Mercurial is the natural choice.</li>
</ul>
</div>
<div class="section" id="resources">
<h3>Resources</h3>
<p>Here are some resources I&#8217;ve found very useful in the transition, in no particular order:</p>
<ul class="simple">
<li><a class="reference external" href="http://hginit.com/">Hg Init</a>: an amazing Mercurial tutorial by Joel Spolsky. Highly recommended, to understand both the <em>how</em> and the <em>why</em> of Mercurial.</li>
<li><a class="reference external" href="http://hgbook.red-bean.com/read/">Mercurial: The Definitive Guide</a>: A complete book, available freely online</li>
<li>Python PEPs <a class="reference external" href="http://www.python.org/dev/peps/pep-0374/">374</a> and <a class="reference external" href="http://www.python.org/dev/peps/pep-0385/">385</a></li>
<li><a class="reference external" href="http://www.selenic.com/mercurial/hgrc.5.html">hgrc</a>: Documents the Mercurial configuration file</li>
<li>Official <a class="reference external" href="http://mercurial.selenic.com/wiki/FAQ">Mercurial FAQ</a></li>
<li><a class="reference external" href="http://code.google.com/p/support/wiki/MercurialFAQ">The Google project hosting Mercurial FAQ</a></li>
</ul>
<div align="center" class="align-center"><img alt="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" class="align-center" src="http://eli.thegreenplace.net/wp-content/uploads/hline.jpg" style="width: 320px; height: 5px;" /></div>
<table class="docutils footnote" frame="void" id="id7" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id1">[1]</a></td>
<td>More specifically CPython, the &quot;official&quot; implementation.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id8" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id2">[2]</a></td>
<td>The reasons for the switch, with various considerations of the competing SCMs, are described in detail in <a class="reference external" href="http://www.python.org/dev/peps/pep-0374/">PEP 374</a>.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id10" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id3">[3]</a></td>
<td>Pulling and updating can be done in a single step by issuing <tt class="docutils literal"><span class="pre">hg</span> <span class="pre">pull</span> <span class="pre">-u</span></tt>.</td>
</tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="id11" rules="none">
<colgroup>
<col class="label" />
<col /></colgroup>
<tbody valign="top">
<tr>
<td class="label"><a class="fn-backref" href="#id4">[4]</a></td>
<td>For the sake of fairness I must note that the C source code of Git is pretty good. I dug into it a while ago <a class="reference external" href="http://eli.thegreenplace.net/2009/10/30/handling-out-of-memory-conditions-in-c/">for other purposes</a> and was pleased by its readability and overall quality.</td>
</tr>
</tbody>
</table>
</div>

<p>Related posts:<ol><li><a href='http://eli.thegreenplace.net/2007/04/14/subversion-repository-on-sourceforge/' rel='bookmark' title='Permanent Link: Subversion repository on SourceForge'>Subversion repository on SourceForge</a> <small>It comes with experience &#8211; the uncomfortable, nagging feeling every...</small></li><li><a href='http://eli.thegreenplace.net/2010/07/23/contributing-to-python/' rel='bookmark' title='Permanent Link: Contributing to Python'>Contributing to Python</a> <small>I&#8217;ve been involved in open-source projects almost since the first...</small></li><li><a href='http://eli.thegreenplace.net/2006/04/12/migrating-to-wordpress/' rel='bookmark' title='Permanent Link: Migrating to Wordpress'>Migrating to Wordpress</a> <small>My blog has just recently migrated from use.perl to Blogger,...</small></li></ol></p>]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/05/22/migrating-my-personal-projects-to-mercurial/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Making code compatible with Python 2 and 3</title>
		<link>http://eli.thegreenplace.net/2010/05/19/making-code-compatible-with-python-2-and-3/</link>
		<comments>http://eli.thegreenplace.net/2010/05/19/making-code-compatible-with-python-2-and-3/#comments</comments>
		<pubDate>Wed, 19 May 2010 06:51:17 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2186</guid>
		<description><![CDATA[Update: Thanks for the great comments! To new readers of this post &#8211; make sure to skim the comments after you finish reading. There is some great advice there for making the change simpler &#8211; especially when you need to be compatible only with 2.6 and not the earlier versions (2.6 was especially designed to [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/11/28/python-internals-working-with-python-asts/' rel='bookmark' title='Permanent Link: Python internals: Working with Python ASTs'>Python internals: Working with Python ASTs</a> <small> Starting with Python 2.5, the Python compiler (the part...</small></li><li><a href='http://eli.thegreenplace.net/2008/06/27/creating-python-extension-modules-in-c/' rel='bookmark' title='Permanent Link: Creating Python extension modules in C'>Creating Python extension modules in C</a> <small>I&#8217;ve successfully created a C extension for Python, basically following...</small></li><li><a href='http://eli.thegreenplace.net/2008/08/31/ctypes-calling-cc-code-from-python/' rel='bookmark' title='Permanent Link: ctypes &#8211; calling C/C++ code from Python'>ctypes &#8211; calling C/C++ code from Python</a> <small> Introduction A couple of years ago, I wrote about...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p><em><strong>Update:</strong> Thanks for the great comments! To new readers of this post &#8211; make sure to skim the comments after you finish reading. There is some great advice there for making the change simpler &#8211; especially when you need to be compatible only with 2.6 and not the earlier versions (2.6 was especially designed to make future transition to 3K simpler).</em></p>
<p>Python 3 has been available for a long time already, but the migration of modules to it is going slower than many Python aficionados would have hoped. Once code is ported to Py3k, it cannot run on 2.x. This is the reason many library authors are afraid to make the step and port their code &#8211; they rightfully refuse to maintain two code bases. So we have a &quot;lack of critical mass&quot; problem.</p>
<p>In my opinion, to make the migration easier, it makes sense to write code that can run on both Python 2 and 3, at least for some time. Yes, this can make some parts of the code a bit ugly (although most of it can be hidden) but it will allow porting without actually having to maintain two code-bases. Once the critical mass assembles, the compatibility to 2.x can be dropped.</p>
<p>To contribute my share to the effort, I&#8217;ve successfully transformed two of my major code-bases to run on both Python 2.6 and 3.1:</p>
<ul class="simple">
<li><a class="reference external" href="http://code.google.com/p/pycparser/">pycparser</a> &#8211; the ANSI C parser in pure Python: the new version (1.07) can run on both versions of Python (other than that, it isn&#8217;t different from 1.06)</li>
<li><a class="reference external" href="http://code.google.com/p/luz-cpu/">Luz</a> &#8211; the assembler/linker/CPU simulator suite has also been ported.</li>
</ul>
<p>The porting was easier than I had hoped. Since this is the first time I&#8217;ve touched Python 3, I had to use a few resources for help in the transition. Some of the best ones: <a class="reference external" href="http://diveintopython3.org/porting-code-to-python-3-with-2to3.html">Dive Into Python 3</a>, <a class="reference external" href="http://www.rmi.net/~lutz/lp3e-updates-notes-python.html">Mark Lutz&#8217;s site</a> and <a class="reference external" href="http://nedbatchelder.com/blog/200910/running_the_same_code_on_python_2x_and_3x.html">Ned&#8217;s post</a>.</p>
<p>Here&#8217;s a list of some tricks I had to use, in no particular order. First and foremost, I created a <tt class="docutils literal"><span class="pre">portability.py</span></tt> file to encapsulate the differences as much as possible. Sometimes I had to use the following check:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">if</span> sys.hexversion &gt; <span style="color: #007f7f">0x03000000</span>:
</pre>
</div>
<p>This check differentiates between Python versions. Luckily, all such checks could be confined to <tt class="docutils literal"><span class="pre">portability.py</span></tt>.</p>
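<p>As a rough sketch of how such a check might be centralized, a single flag can be computed once and reused everywhere else. The names <tt class="docutils literal"><span class="pre">PY3</span></tt>, <tt class="docutils literal"><span class="pre">string_types</span></tt> and <tt class="docutils literal"><span class="pre">is_string</span></tt> are hypothetical and not taken from the actual <tt class="docutils literal"><span class="pre">portability.py</span></tt>:</p>

```python
import sys

# Hypothetical flag, computed once so other modules never inspect sys directly.
PY3 = sys.version_info[0] >= 3

# Base class of text strings. basestring exists only on Python 2; the
# conditional expression ensures the name is looked up only there.
string_types = str if PY3 else basestring  # noqa: F821


def is_string(obj):
    """Return True if obj is a text string on the running interpreter."""
    return isinstance(obj, string_types)
```

<p>Other modules would then import the flag or the helper instead of repeating the version test.</p>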
<p>Here&#8217;s an example of a couple of functions from <tt class="docutils literal"><span class="pre">portability.py</span></tt>:</p>
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">import</span> sys

<span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">printme</span>(s):
    <span style="color: #007f00"># Works on both versions; unlike print, no newline is appended</span>
    sys.stdout.write(<span style="color: #00007f">str</span>(s))

<span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">get_input</span>(prompt):
    <span style="color: #007f00"># raw_input was renamed to input in Python 3</span>
    <span style="color: #00007f; font-weight: bold">if</span> sys.hexversion &gt; <span style="color: #007f7f">0</span>x03000000:
        <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #00007f">input</span>(prompt)
    <span style="color: #00007f; font-weight: bold">else</span>:
        <span style="color: #00007f; font-weight: bold">return</span> <span style="color: #00007f">raw_input</span>(prompt)
</pre>
</div>
<p>Python 3 made <tt class="docutils literal"><span class="pre">print</span></tt> into a function, so as a statement it doesn&#8217;t even parse. <tt class="docutils literal"><span class="pre">printme</span></tt> is a function which can be called by both versions of Python. It&#8217;s not as versatile as <tt class="docutils literal"><span class="pre">print</span></tt> itself (it takes a single argument and doesn&#8217;t append a newline), but that&#8217;s a minor inconvenience since I mostly used <tt class="docutils literal"><span class="pre">print</span></tt> for debugging, testing and some trivial output.</p>
<p><tt class="docutils literal"><span class="pre">get_input</span></tt> encapsulates the lack of <tt class="docutils literal"><span class="pre">raw_input</span></tt> in Python 3.</p>
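<p>If more of <tt class="docutils literal"><span class="pre">print</span></tt>&#8217;s behavior is needed, a wrapper along the following lines might do. This is only a sketch: <tt class="docutils literal"><span class="pre">print_compat</span></tt> is a hypothetical name, and it mimics just the <tt class="docutils literal"><span class="pre">sep</span></tt>, <tt class="docutils literal"><span class="pre">end</span></tt> and <tt class="docutils literal"><span class="pre">file</span></tt> keywords of the Python 3 function:</p>

```python
import sys


def print_compat(*args, **kwargs):
    """Write args roughly the way Python 3's print() would, on either version."""
    sep = kwargs.get("sep", " ")
    end = kwargs.get("end", "\n")
    stream = kwargs.get("file", sys.stdout)
    # Convert every argument to text, join with the separator, add the terminator.
    stream.write(sep.join(str(arg) for arg in args) + end)
```

<p>When only 2.6 and later have to be supported, <tt class="docutils literal"><span class="pre">from</span> <span class="pre">__future__</span> <span class="pre">import</span> <span class="pre">print_function</span></tt> is probably the simpler route, since it turns <tt class="docutils literal"><span class="pre">print</span></tt> itself into a function.</p>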
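<p>Another problem I commonly had to tackle is catching exceptions. Since the syntax for binding the caught exception changed in Python 3 (<tt class="docutils literal"><span class="pre">except</span> <span class="pre">TypeError,</span> <span class="pre">err</span></tt> became <tt class="docutils literal"><span class="pre">except</span> <span class="pre">TypeError</span> <span class="pre">as</span> <span class="pre">err</span></tt>), I had to resort to this for portability:</p>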
<div class="highlight">
<pre><span style="color: #00007f; font-weight: bold">except</span> TypeError:
    err = sys.exc_info()[<span style="color: #007f7f">1</span>]
</pre>
</div>
<p>This code runs in both versions and places the exception object in <tt class="docutils literal"><span class="pre">err</span></tt>.</p>
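<p>For illustration, here is the same pattern inside a complete function. The helper <tt class="docutils literal"><span class="pre">to_int</span></tt> is a made-up example, not code from either project:</p>

```python
import sys


def to_int(text):
    """Return (value, error); exactly one of the two is None."""
    try:
        return int(text), None
    except (TypeError, ValueError):
        # Same spelling on Python 2 and 3: fetch the active exception object.
        err = sys.exc_info()[1]
        return None, err
```

<p>The caller can inspect <tt class="docutils literal"><span class="pre">err</span></tt> like any other exception instance, for example with <tt class="docutils literal"><span class="pre">isinstance</span></tt> or <tt class="docutils literal"><span class="pre">str</span></tt>.</p>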
<p>Some differences were very easy to handle. For example, Python 3 removed <tt class="docutils literal"><span class="pre">xrange</span></tt>, so I&#8217;ve just used <tt class="docutils literal"><span class="pre">list(range(...))</span></tt>. Had performance really mattered, I would have had to use something more complex. Also, <tt class="docutils literal"><span class="pre">itertools.imap</span></tt> was removed, so I replaced it with <tt class="docutils literal"><span class="pre">iter(map(...))</span></tt>. Dictionaries lost their <tt class="docutils literal"><span class="pre">has_key</span></tt> method, but <tt class="docutils literal"><span class="pre">key</span> <span class="pre">in</span> <span class="pre">dict</span></tt> works well on both versions of Python, so this is another easy change.</p>
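<p>One possible shape for the &quot;something more complex&quot; is sketched below; it keeps iteration lazy on both versions. The names <tt class="docutils literal"><span class="pre">lazy_range</span></tt>, <tt class="docutils literal"><span class="pre">squares_up_to</span></tt> and <tt class="docutils literal"><span class="pre">count_words</span></tt> are invented for the example:</p>

```python
try:
    lazy_range = xrange  # Python 2: xrange is the lazy variant
except NameError:
    lazy_range = range   # Python 3: range is already lazy


def squares_up_to(n):
    """Lazily yield the squares of 0..n-1; a generator expression stands in for imap."""
    return (i * i for i in lazy_range(n))


def count_words(words):
    """Count occurrences using the 'in' operator instead of dict.has_key."""
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts
```

<p>The <tt class="docutils literal"><span class="pre">try</span></tt>/<tt class="docutils literal"><span class="pre">except</span> <span class="pre">NameError</span></tt> probe avoids a version check altogether, which some people find cleaner for renamed built-ins.</p>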
<p>Luz is a relatively large project, sub-divided into packages and many modules, so relative vs. absolute imports gave me some trouble. Luckily, the 2.x version I wanted to be compatible with is 2.6, so I could just use explicit relative imports everywhere, and they work well on both versions.</p>
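<p>To make the import style concrete, the sketch below builds a tiny throwaway package on disk and imports it. Everything in it (<tt class="docutils literal"><span class="pre">demo_pkg</span></tt>, <tt class="docutils literal"><span class="pre">helpers</span></tt>, <tt class="docutils literal"><span class="pre">greet</span></tt>) is hypothetical and unrelated to the real layout of Luz; the point is only the leading dot in the <tt class="docutils literal"><span class="pre">from</span> <span class="pre">.helpers</span> <span class="pre">import</span> <span class="pre">greet</span></tt> line:</p>

```python
import os
import sys
import tempfile


def build_demo_package(root):
    """Create demo_pkg/ with two modules; __init__ uses an explicit relative import."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "helpers.py"), "w") as f:
        f.write("def greet(name):\n    return 'hello ' + name\n")
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # The leading dot means "this package"; the form is valid on 2.5+ and 3.x.
        f.write("from .helpers import greet\n")
    return pkg_dir


# Build the package in a temporary directory and make it importable.
root = tempfile.mkdtemp()
build_demo_package(root)
sys.path.insert(0, root)
demo_pkg = __import__("demo_pkg")
```

<p>The implicit form (<tt class="docutils literal"><span class="pre">import</span> <span class="pre">helpers</span></tt> from inside the package) works only on Python 2, which is why the explicit spelling is the portable one.</p>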
<p>The full-test running capabilities in Luz gave me some trouble because I&#8217;m using dynamic Python code loading there. The <tt class="docutils literal"><span class="pre">new</span></tt> module disappeared in Python 3, but happily <tt class="docutils literal"><span class="pre">imp.new_module</span></tt> replaces it and works in 2.6 as well. Also, I had to use a trick borrowed from <a class="reference external" href="http://nedbatchelder.com/blog/200910/running_the_same_code_on_python_2x_and_3x.html">Ned</a> to replace <tt class="docutils literal"><span class="pre">exec</span></tt> with this monstrosity:</p>
<div class="highlight">
<pre><span style="color: #007f00"># Borrowed from Ned Batchelder</span>
<span style="color: #00007f; font-weight: bold">if</span> sys.hexversion &gt; <span style="color: #007f7f">0</span>x03000000:
    <span style="color: #00007f; font-weight: bold">def</span> <span style="color: #00007f">exec_function</span>(source, filename, global_map):
        <span style="color: #00007f; font-weight: bold">exec</span>(<span style="color: #00007f">compile</span>(source, filename, <span style="color: #7f007f">&quot;exec&quot;</span>), global_map)
<span style="color: #00007f; font-weight: bold">else</span>:
    <span style="color: #00007f">eval</span>(<span style="color: #00007f">compile</span>(<span style="color: #7f007f">&quot;&quot;&quot;\</span>
<span style="color: #7f007f">def exec_function(source, filename, global_map):</span>
<span style="color: #7f007f">    exec compile(source, filename, &quot;exec&quot;) in global_map</span>
<span style="color: #7f007f">&quot;&quot;&quot;</span>,
    <span style="color: #7f007f">&quot;&lt;exec_function&gt;&quot;</span>, <span style="color: #7f007f">&quot;exec&quot;</span>))
</pre>
</div>
<p>As with catching exceptions, <tt class="docutils literal"><span class="pre">exec</span></tt> is <em>syntax</em> in Python 2, so you just can&#8217;t nicely hide it behind a version check. The parser chokes on it even if that code section never gets executed. Therefore, a brute-force approach using <tt class="docutils literal"><span class="pre">eval(compile(...))</span></tt> is called for: the source string is only compiled at runtime, when only the relevant interpreter sees it.</p>
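<p>A helper like <tt class="docutils literal"><span class="pre">exec_function</span></tt> might be combined with a fresh module object roughly as follows. This is a sketch under assumptions: it repeats only the Python 3 branch of the snippet above, <tt class="docutils literal"><span class="pre">load_module_from_source</span></tt> is a hypothetical name, and it swaps in <tt class="docutils literal"><span class="pre">types.ModuleType</span></tt> for <tt class="docutils literal"><span class="pre">imp.new_module</span></tt>, because the <tt class="docutils literal"><span class="pre">imp</span></tt> module is no longer present in recent Python 3 releases:</p>

```python
import types


def exec_function(source, filename, global_map):
    # Python 3 branch only; the Python 2 branch is shown in the snippet above.
    exec(compile(source, filename, "exec"), global_map)


def load_module_from_source(name, source, filename="<dynamic>"):
    """Create an empty module object and run source inside its namespace."""
    module = types.ModuleType(name)
    exec_function(source, filename, module.__dict__)
    return module
```

<p>Anything defined at the top level of <tt class="docutils literal"><span class="pre">source</span></tt> then shows up as an attribute of the returned module.</p>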
<p>That&#8217;s about it. From now on I plan to keep both <tt class="docutils literal"><span class="pre">pycparser</span></tt> and Luz functional on both versions of Python &#8211; it shouldn&#8217;t be too hard. In the future, when I feel the time is right to make the switch to Py3k, it will be trivial &#8211; I&#8217;ll just clean up all the ugly portability code.</p>
<p><strong>P.S.:</strong> To complete such a task you really need good unit tests. I can&#8217;t imagine making it and staying sane without the extensive tests both code-bases have.</p>
]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/05/19/making-code-compatible-with-python-2-and-3/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>scons instead of make</title>
		<link>http://eli.thegreenplace.net/2010/05/14/scons-instead-of-make/</link>
		<comments>http://eli.thegreenplace.net/2010/05/14/scons-instead-of-make/#comments</comments>
		<pubDate>Fri, 14 May 2010 05:11:04 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2177</guid>
		<description><![CDATA[I&#8217;ve always disliked make. Recently, I decided to give scons a try for compiling C and C++ projects. So far I like it &#8211; although I&#8217;m not yet familiar with all of it, it seems like a much nicer way to achieve the same results. 
I&#8217;m not sure how popular scons is now &#8211; its [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2008/08/02/rant-about-mailing-lists/' rel='bookmark' title='Permanent Link: rant about mailing lists'>rant about mailing lists</a> <small>This post is a rant. Take it with a grain...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve always disliked <code>make</code>. Recently, I decided to give <code>scons</code> a try for compiling C and C++ projects. So far I like it &#8211; although I&#8217;m not yet familiar with all of it, it seems like a much nicer way to achieve the same results. </p>
<p>I&#8217;m not sure how popular <code>scons</code> is now &#8211; its mailing list seems half-deserted. Maybe there are better options these days? Anyway, so far I&#8217;m happy I&#8217;m using it instead of <code>make</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/05/14/scons-instead-of-make/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Introducing Luz</title>
		<link>http://eli.thegreenplace.net/2010/05/05/introducing-luz/</link>
		<comments>http://eli.thegreenplace.net/2010/05/05/introducing-luz/#comments</comments>
		<pubDate>Wed, 05 May 2010 17:43:38 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Assembly]]></category>
		<category><![CDATA[EE / Embedded]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2164</guid>
		<description><![CDATA[OK, so the documentation still isn&#8217;t complete, but I can&#8217;t wait to introduce my newest concoction &#8211; Luz. Luz is a pure-Python implementation of a MIPS-like CPU (as a simulator, of course). This CPU is programmable in an assembly language, a complete assembler for which has been implemented, along with a linker that takes together [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2009/03/13/python-documentation-annoyance/' rel='bookmark' title='Permanent Link: Python documentation annoyance'>Python documentation annoyance</a> <small>Edit: I&#8217;ve actually started working on fixing this annoyance in...</small></li><li><a href='http://eli.thegreenplace.net/2005/02/20/mix-implementation-in-perl-completed/' rel='bookmark' title='Permanent Link: MIX implementation in Perl completed !'>MIX implementation in Perl completed !</a> <small>I&#8217;ve recently completed my Perl implementation of Knuth&#8217;s MIX/MIXAL (Perlmix)....</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>OK, so the documentation still isn&#8217;t complete, but I can&#8217;t wait to introduce my newest concoction &#8211; <a href="http://code.google.com/p/luz-cpu/">Luz</a>. Luz is a pure-Python implementation of a MIPS-like CPU (as a simulator, of course). This CPU is programmable in an assembly language, a complete assembler for which has been implemented, along with a linker that combines several object files and creates an executable image to run on the simulator. Oh, and did I mention that it also includes a rudimentary debugger and disassembler? All of this is Luz:</p>
<p><img src="http://eli.thegreenplace.net/wp-content/uploads/2010/05/luz_proj_toplevel.png" alt="" title="luz_proj_toplevel" width="437" height="952" class="aligncenter size-full wp-image-2165" /></p>
<p>To call Luz new is a bit of a stretch, because I started working on it more than two years ago. It has been a jagged road, with occasional spurts of productivity, but now Luz is finally in a presentable form.</p>
<p>I&#8217;ll paste from its &#8220;getting started guide&#8221;:</p>
<blockquote><p>
<strong>What is Luz useful for?</strong></p>
<p>I don&#8217;t know yet. It&#8217;s a self-educational project of mine, and I learned a lot by working on it. I suppose that Luz&#8217;s main value is as an educational tool. Its implementation focuses on simplicity and modularity, and is done in Python, which is a portable and very readable high-level language.</p>
<p>Luz can serve as a sample of implementing a complete assembler, a complete linker, and a complete CPU simulator. Other such tools exist, but usually not in the clean and self-contained form offered by Luz. In any case, if you&#8217;ve found Luz useful, I&#8217;d love to receive feedback.</p></blockquote>
<p>This summarizes it, really. Not much more to add, except that Luz is available from its <a href="http://code.google.com/p/luz-cpu/">Google Code project</a> in source-only form for now, so you&#8217;ll have to check it out from SVN or just look at the sources in the online browser. Checking the source out is recommended because it allows one to view the documentation in nice HTML format. A few example programs in Luz assembly are available. Luz requires Python 2.6 or higher and the PLY module installed. I tested it on Windows XP and Ubuntu.</p>
<p>I&#8217;ve written <a href="http://eli.thegreenplace.net/2005/02/20/mix-implementation-in-perl-completed/">an assembler and a CPU simulator before</a>, but that was for a very weird architecture (Knuth&#8217;s MIX from TAOCP). Luz is a much more useful beast &#8211; the CPU is not far from real modern CPUs (the embedded kind, mostly), the assembly language is familiar and best of all, Luz also includes a linker, which will make it much easier to compile C for it in the future.</p>
<p>I&#8217;ll write more about Luz sometime later, when I find the time to work on its documentation.</p>
]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/05/05/introducing-luz/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>pycparser v1.06 released</title>
		<link>http://eli.thegreenplace.net/2010/04/10/pycparser-v1-06-released/</link>
		<comments>http://eli.thegreenplace.net/2010/04/10/pycparser-v1-06-released/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 13:46:58 +0000</pubDate>
		<dc:creator>eliben</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://eli.thegreenplace.net/?p=2152</guid>
		<description><![CDATA[I&#8217;ve just released version 1.06 of my pycparser project (a complete pure-Python parser for the ANSI C language). It&#8217;s not a major release &#8211; a few minor bugs were fixed and the compatibility of unit tests and examples with Linux was improved.  
In the past few months people contacted me with questions or just [...]


Related posts:<ol><li><a href='http://eli.thegreenplace.net/2008/11/15/pycparser-v10-is-out/' rel='bookmark' title='Permanent Link: pycparser v1.0 is out!'>pycparser v1.0 is out!</a> <small>I&#8217;m happy to announce that I&#8217;ve finally released version 1.0...</small></li><li><a href='http://eli.thegreenplace.net/2009/05/22/faking-standard-c-header-files-for-pycparser/' rel='bookmark' title='Permanent Link: Faking standard C header files for pycparser'>Faking standard C header files for pycparser</a> <small>My Python-based parser and AST generator for ANSI C &#8211;...</small></li><li><a href='http://eli.thegreenplace.net/2008/10/18/implementing-cdecl-with-pycparser/' rel='bookmark' title='Permanent Link: Implementing cdecl with pycparser'>Implementing cdecl with pycparser</a> <small>cdecl is a tool for decoding C type declarations. It...</small></li></ol>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve just released version 1.06 of my <a href="http://code.google.com/p/pycparser/">pycparser project</a> (a complete pure-Python parser for the ANSI C language). It&#8217;s not a major release &#8211; a few minor bugs were fixed and the compatibility of unit tests and examples with Linux was improved.  </p>
<p>In the past few months, people have contacted me with questions or just feedback about <code>pycparser</code>, and told me about their uses for the library. It&#8217;s great to hear how many cool projects are utilizing it. Some examples:</p>
<ul>
<li>C code obfuscator</li>
<li>Front-end for various specialized C compilers</li>
<li>Static code checker</li>
<li>Automatic unit-test discovery</li>
<li>Adding specialized extensions to the C language</li>
</ul>
<p>If you&#8217;re using <code>pycparser</code>, I would really love to hear about your experience with it. </p>
]]></content:encoded>
			<wfw:commentRss>http://eli.thegreenplace.net/2010/04/10/pycparser-v1-06-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
