Welcome


... to my place on the web - a personal website where I keep articles, freeware programs and code snippets I've written, and a weblog where I share my ideas and insights on programming, technology, books, and life in general.

To browse the website, use the site map on the right-hand side of the screen. Below, you will find the most recent posts.

Keeping persistent history in bash

June 11th, 2013 at 7:27 pm

For someone spending most of his time in front of a Linux terminal, history is very important. But traditional bash history has a number of limitations, especially when multiple terminals are involved (I sometimes have dozens open). Also it’s not very good at preserving just the history you’re interested in across reboots.

There are many ways to improve the situation; here I want to discuss one I’ve been using very successfully for the past few months – a simple "persistent history" that keeps track of history across terminal instances, saving it into a dot-file in my home directory (~/.persistent_history). All commands, from all terminal instances, are saved there, forever. I found this tremendously useful in my work – it saves me time almost every day.

Why does it go into a separate history and not the main one, which is accessible by all the existing history manipulation tools? Because IMHO the latter is still worth keeping separate for the simple need of bringing up recent commands in a single terminal, without mixing in commands from other terminals. While the terminal is open, I want to press "Up" and get the previous command, even if I’ve executed 1000 other commands in other terminal instances in the meantime.

Persistent history is very easy to set up. Here’s the relevant portion of my ~/.bashrc:

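# The regex below expects "history 1" to print a date and a time before the
# command, which requires HISTTIMEFORMAT to be set (assumed format: %F %T).
export HISTTIMEFORMAT="%F %T  "

log_bash_persistent_history()
{
  # Split the last history entry into its timestamp and the command itself.
  [[
    $(history 1) =~ ^\ *[0-9]+\ +([^\ ]+\ [^\ ]+)\ +(.*)$
  ]]
  local date_part="${BASH_REMATCH[1]}"
  local command_part="${BASH_REMATCH[2]}"
  # Skip the entry if it repeats the previously logged command.
  if [ "$command_part" != "$PERSISTENT_HISTORY_LAST" ]
  then
    echo "$date_part" "|" "$command_part" >> ~/.persistent_history
    export PERSISTENT_HISTORY_LAST="$command_part"
  fi
}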

# Stuff to do on PROMPT_COMMAND
run_on_prompt_command()
{
    log_bash_persistent_history
}

PROMPT_COMMAND="run_on_prompt_command"

The format of the history file created by this is:

2013-06-09 17:48:11 | cat ~/.persistent_history
2013-06-09 17:49:17 | vi /home/eliben/.bashrc
2013-06-09 17:49:23 | ls

Note that an environment variable is used to avoid useless duplication (i.e. if I run ls twenty times in a row, it will only be recorded once).

OK, so we have ~/.persistent_history, how do we use it? First, I should say that it’s not used very often, which kind of connects to the point I made earlier about separating it from the much higher-use regular command history. Sometimes I just look into the file with vi or tail, but mostly this alias does the trick for me:

alias phgrep='cat ~/.persistent_history|grep --color'

The alias name mirrors another alias I’ve been using for ages:

alias hgrep='history|grep --color'

Another tool for managing persistent history is a trimmer. I said earlier this file keeps the history "forever", which is a scary word – what if it grows too large? Well, first of all – worry not. At work my history file grew to about 2 MB after 3 months of heavy usage, and 2 MB is pretty small these days. Appending to the end of a file is very, very quick (I’m pretty sure it’s a constant-time operation) so the size doesn’t matter much. But trimming is easy:

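# Write to a temporary file first: piping into tee on the same file can truncate it before tail has read it.
tail -n 20000 ~/.persistent_history > ~/.persistent_history.tmp && mv ~/.persistent_history.tmp ~/.persistent_history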

This trims the file to its last 20000 lines. That should be sufficient for at least a couple of months of history, and your workflow should not really rely on more than that :-)

Finally, what’s the use of having a tool like this without employing it to collect some useless statistics? Here’s a histogram of the 15 most common commands I’ve used on my home machine’s terminal over the past 3 months:

ls        : 865
vi        : 863
hg        : 741
cd        : 512
ll        : 289
pss       : 245
hst       : 200
python    : 168
make      : 167
git       : 148
time      : 94
python3   : 88
./python  : 88
hpu       : 82
cat       : 80

Some explanation: hst is an alias for hg st. hpu is an alias for hg pull -u. pss is my awesome pss tool, and is the reason why you don’t see any calls to grep and find in the list. The proportion of Mercurial vs. git commands is likely to change in the very near future due to this.
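
A histogram like this takes only a few lines to compute. The following is a sketch in JavaScript (Node.js), not necessarily how the numbers above were produced: commandHistogram is a hypothetical helper, the last sample line is made up, and the only thing taken from the post is the "date | command" line format.

```javascript
// Sketch: count the first word of each command in persistent-history text.
// commandHistogram is a hypothetical helper; the "date | command" layout
// is the history file format shown earlier.
function commandHistogram(text, topN) {
  const counts = new Map();
  for (const line of text.split("\n")) {
    const sep = line.indexOf(" | ");
    if (sep === -1) continue;               // skip lines that don't match the format
    const command = line.slice(sep + 3).trim();
    const name = command.split(/\s+/)[0];   // first word, e.g. "vi" or "ls"
    if (name === "") continue;
    counts.set(name, (counts.get(name) || 0) + 1);
  }
  // Sort by descending count and keep the topN most common commands.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, topN);
}

// Sample input; the last line is invented for illustration.
const sample = [
  "2013-06-09 17:48:11 | cat ~/.persistent_history",
  "2013-06-09 17:49:17 | vi /home/eliben/.bashrc",
  "2013-06-09 17:49:23 | ls",
  "2013-06-09 17:50:02 | vi notes.txt",
].join("\n");

for (const [name, count] of commandHistogram(sample, 15)) {
  console.log(name.padEnd(10) + ": " + count);
}
```

To run it on the real file, the text could be read with fs.readFileSync, passing the path to ~/.persistent_history and the "utf8" encoding.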

Switching my open-source projects from Bitbucket to Github

June 9th, 2013 at 4:49 pm

I’m switching my public open-source projects from Bitbucket to Github, and in the process also from Mercurial to git. But the switch has little to do with git vs. Mercurial; it’s mostly about Github winning the platform war vs. Bitbucket. I like Mercurial, and it was a natural choice when switching from SVN a few years back. Back then I wasn’t familiar with git, and I am quite a bit more familiar with it now, but still the actual SCM played little role in the decision.

Well, to be precise there’s one thing I find slightly more convenient about git. I really like its throwaway-branch mode of development. I want to be able to create quick local branches for hacking, throw some away, merge others and keep my history clean. I want to commit every single comma if I feel like it, without worrying about polluting the "official" history. So git merge --squash or squashing with interactive rebasing are workflows I appreciate. It’s not that these aren’t possible with Mercurial, which recently gave up a bit on the pedantry and allows similar workflows via extensions. But it’s not the natural or the familiar way of working with Mercurial. As proof of that, I kept receiving pull requests with dozens of useless small commits just to implement some feature, and kept asking the contributors to find a way to send me a single commit, or just leave the pull request machinery behind and send a good old patch file.

But I’m getting sidetracked. Github is just way, way more popular these days, especially for open source projects. I was very disappointed seeing many contributors say they’re reluctant to contribute because that would require creating a Bitbucket account and forking my projects there. Would I please just switch to Github? sigh… I can understand that – Github managed to give coding a nice social aspect – your Github profile is part of your online "rep". You want your contributions to other projects to be seen through it, so people felt that a fork on Bitbucket is like a hidden place no one will ever see and attribute to them.

A curious anecdote about the relative popularity is something I noticed when I started the switch. I had more followers on Github than on Bitbucket! Oh my, and that despite having a number of moderately popular open-source projects I was furiously hacking on at Bitbucket, vs. a bunch of half-neglected forks and hacks on Github.

So here it is; not an overly coherent set of thoughts, I’ll admit, but I hope it makes sense. I don’t have anything against Mercurial or Bitbucket – I’m still a user of both. But the higher-profile open-source projects are now on Github. Happy hacking.

How require loads modules in Node.js

May 27th, 2013 at 9:22 am

One of the useful tools Node.js adds on top of standard ECMAScript is a notation for defining and using modules. A "module" exports objects and functions by adding them to exports, and another module can import it by using require. The semantics are explained well in the official documentation.

While the documentation does a good job describing how require finds the module to import, it doesn’t say much about how the importing itself happens, and how the exports and module objects are magically visible and usable in the module’s code. Here I want to provide a lower-level view of this missing link, gleaned from the source code of Node.js v0.10.8 (lib/module.js).

As the documentation linked above explains, there are a few places where require can find modules. There are also a number of different imports require can perform – from folders, from JSON files, from compiled Node modules (C++), and so on. Here I’ll focus on importing from a regular JavaScript source file (.js).

Code from .js files is simply read into a string. Next, the code string is wrapped with a function:

(function (exports, require, module, __filename, __dirname) {
   // <-- MODULE CONTENTS HERE -->
});

These wrapped contents are evaluated by the JavaScript VM, and the result is a function object; let’s call it module_func. This function is invoked as follows:

module_func.apply(module.exports,
                  [module.exports, require, module, filename, dirname]);

And the return value of require is then module.exports.
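
The steps above can be simulated in a few lines. This is only a sketch, not the actual code in lib/module.js: loadFromSource and the /tmp/greeter.js path are made-up names for illustration, and the sketch uses Node's vm module to evaluate the wrapped string.

```javascript
// Sketch of the wrapping step; NOT the real code from lib/module.js.
// loadFromSource and the file name passed to it are hypothetical.
const vm = require("vm");
const path = require("path");

function loadFromSource(source, filename) {
  // The object that represents the module being loaded.
  const module = { exports: {}, filename: filename };

  // Wrap the module's code string in a function.
  const wrapped =
    "(function (exports, require, module, __filename, __dirname) {\n" +
    source +
    "\n});";

  // Evaluating the wrapped string yields a function object.
  const moduleFunc = vm.runInThisContext(wrapped, { filename: filename });

  // Invoke it with `this` and `exports` both bound to module.exports.
  moduleFunc.apply(module.exports, [
    module.exports, require, module, filename, path.dirname(filename),
  ]);

  // This is what require would hand back to the caller.
  return module.exports;
}

const greeter = loadFromSource(
  "var secret = 1;\n" +
  "exports.say_hi = function () { return 'hi'; };\n" +
  "module.exports.say_bye = function () { return 'bye'; };",
  "/tmp/greeter.js"
);

console.log(Object.keys(greeter)); // say_hi and say_bye; secret stays local
```

The real loader does considerably more (caching, resolving paths, handling other file types); the sketch covers only the wrap, evaluate and apply sequence.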

What’s module? It’s the special module object the require mechanism has built for loading our module. It’s actually described quite well in the modules documentation I mentioned before. That page says:

In each module, the module free variable is a reference to the object representing the current module. In particular module.exports is accessible via the exports module-global. module isn’t actually a global but rather local to each module.

Now it should be obvious how this comes to be. The wrapper function created for our code has the arguments module and exports, which become visible in the code. The apply invocation above shows what they get bound to. What it also does is set this in the global scope of our code to the exports property of the module. So the following are all equivalent ways to add stuff to a module’s exports:

exports.say_hi = function () {console.log('hi');}
module.exports.say_bye = function () {console.log('bye');}
this.say_farewell = function () {console.log('farewell');}

The last way looks suspicious. We know that by default, variables in the code don’t get exported. In other words, in:

var foo = 1;
exports.bar = 2;

While bar is exported to require, foo is not. But how can this be, if this is bound to exports? Doesn’t var foo = 1 add foo to this, being the global object?

This is only puzzling if you think of your code as stand-alone JavaScript, in which the global scope is truly global. But recall, from a few paragraphs above, that our code is wrapped in a function. So the "global" scope in the module is actually function scope. In function scope, variables don’t get auto-bound to this. Mystery solved.

One final note: Node.js has a few useful flags you can set as environment variables for debugging. In particular, I’ve found setting:

$ NODE_DEBUG=module node <file.js>

very handy for following the module loading process that require performs.

Python will have enums in 3.4!

May 10th, 2013 at 6:06 am

After months of intensive discussion (more than 1000 emails in dozens of threads spread over two mailing lists, and a couple of hundred additional private emails), PEP 435 has been accepted and Python will finally have an enumeration type in 3.4!

The discussion and decision process has been long and arduous, but eventually very positive. A collective brain is certainly better than any single one; the final proposal is better in a number of ways than the initial one, and the vast majority of Python core developers now feel good about it (give or take a couple of very minor issues).

I’ve been told enums have been debated on Python development lists for years. There’s at least one earlier PEP (354) that was discussed and rejected in 2005.

I think part of the success of the current attempt can be attributed to the advances in metaclasses that have been made in the past few releases (3.x). These advances allow a very nice syntax for enum definitions that provides a lot of convenient features for free. I tried to find interesting examples of metaclasses in the standard library in 2011; once the enum gets pushed I’ll have a much better example :-)

Ten years of blogging

May 6th, 2013 at 5:32 am

It has occurred to me that the first post in this blog was 10 years ago, so I decided to dig up a bit of history. 10 years is a long time! As the post linked above shows, I even called it "weblog" then. The term "blog" seems to have been coined in 1999, but probably wasn’t popular enough in 2003 yet. That post also closes with a hope that the blog will last; it sure did.

In these 10 years the blog went from an obscure journal on use.perl.org that only a few people knew about to 66,000 unique visitors with half a million page views a month on average (stats for the end of 2012).

While it looks very different from the initial "weblog", it actually went through relatively few incarnations for its age. Starting somewhere in 2005 I became unhappy with use.perl.org and eventually moved the blog to Blogger. But Blogger itself was quite limited for serious programming-related blogging at the time, so it took me only two weeks to figure out it wasn’t quite what I wanted and move to a WordPress-based blog on my own domain and shared hosting (Bluehost). In the seven years that have passed since, the blog changed very little visually. I stretched the main viewing area bit by bit as high-resolution monitors became more common, installed a couple of minor plugins, but that’s it.

What did change was the quality of content. As I wrote here, the blog became much more technical with time, leaving the shorter and more personal posts to outlets like G+. The amount of technical content is also the reason the blog’s popularity grew rapidly in the past few years. Apparently more people want to know how debuggers work and about the internals of Python and Linux loaders than about some random cool library I found or how I spent the last week at work being bored. Go figure.

So what’s next?

Dedicated readers will notice that the rate of in-depth posts has slowed down in the past few months. This was to be expected, having moved to a different continent and started a much more demanding job. But I still hope to continue posting articles from time to time. This blog is far from being done! As for infrastructure, I was actually pondering making a large change after yet another break-your-site WordPress upgrade. But I still haven’t found time to do this. Maybe at some point I will.

So thanks for reading, and wishing myself at least another 10 years of blogging. It has been an incredibly positive experience for me in many respects, and I have full intention of keeping it up as long as I can.