Eli Bendersky's website - Javascript
https://eli.thegreenplace.net/
Updated: 2023-10-24T03:57:52-07:00

ES Module imports in Node.js and the browser
Published: 2023-10-23T20:58:00-07:00 | Updated: 2023-10-24T03:57:52-07:00 | Author: Eli Bendersky
tag:eli.thegreenplace.net,2023-10-23:/2023/es-module-imports-in-nodejs-and-the-browser/
<p>For a <a class="reference external" href="https://eli.thegreenplace.net/2023/cubic-spline-interpolation/">recent project</a>, I wanted to
have some JS code (in multiple files) available for testing from the command-line
with Node.js, but also to be able to load the same code into a web page to be
invoked directly from a browser.</p>
<p>I've encountered this same issue before for my <a class="reference external" href="https://eliben.org/js8080/">in-browser 8080 assembler and
simulator project</a>, and used a combination of
CommonJS <tt class="docutils literal">require</tt>s with a <a class="reference external" href="https://browserify.org/">bundler tool</a> to make it work. But
we're in 2023 now, and CommonJS is supposed to be phasing out. So my goal for
the new project was to do this using ES modules (ESM) and without any separate
tooling.</p>
<p>Let's see how it works.</p>
<div class="section" id="project-structure">
<h2>Project structure</h2>
<p>Here's the structure of the <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2023/js-gauss-spline">js-gauss-spline</a> project
serving as our demo:</p>
<div class="highlight"><pre><span></span>$ tree
.
├── eqsolve.js
├── package.json
├── plot.html
├── README.md
├── spline.js
└── test
    └── test.js
</pre></div>
<p>The files <tt class="docutils literal">eqsolve.js</tt> and <tt class="docutils literal">spline.js</tt> implement the functionality we want
to both test on the command-line and import in the browser.
Their functionality is exposed via <tt class="docutils literal">export</tt>ed functions.</p>
</div>
<div class="section" id="testing-in-node-js">
<h2>Testing in Node.js</h2>
<p>The test code lives in <tt class="docutils literal">test/test.js</tt>, and it starts like this:</p>
<div class="highlight"><pre><span></span><span class="k">import</span><span class="w"> </span><span class="nx">assert</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'node:assert/strict'</span><span class="p">;</span><span class="w"></span>
<span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">solve</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'../eqsolve.js'</span><span class="p">;</span><span class="w"></span>
<span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">buildSplineEquations</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'../spline.js'</span><span class="p">;</span><span class="w"></span>
</pre></div>
<p>The exported functions we want to test are imported with relative paths. The
file also imports Node's built-in <tt class="docutils literal">assert</tt> functionality; it uses <tt class="docutils literal">assert</tt>s
directly, without any unit-testing framework.</p>
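<p>To run the tests, simply invoke <tt class="docutils literal">node</tt> (which has to be a recent version
that properly supports ES modules). Keep in mind that Node.js treats a <tt class="docutils literal">.js</tt> file
as an ES module only when the nearest <tt class="docutils literal">package.json</tt> sets <tt class="docutils literal">"type": "module"</tt>
(or when the file uses the <tt class="docutils literal">.mjs</tt> extension):</p>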
<div class="highlight"><pre><span></span>$ node --version
v20.5.0
$ node test/test.js
success
</pre></div>
</div>
<div class="section" id="running-in-the-browser">
<h2>Running in the browser</h2>
<p>So far so good. Now let's import these files into a web application running in the
browser; in our project, the main entry point is <tt class="docutils literal">plot.html</tt>. It has more
custom JS code, along with whatever HTML elements are needed. Here's the
JS part that imports functions from <tt class="docutils literal">eqsolve.js</tt> and <tt class="docutils literal">spline.js</tt>:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">script</span> <span class="na">type</span><span class="o">=</span><span class="s">"module"</span><span class="p">></span><span class="w"></span>
<span class="w"> </span><span class="k">import</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">d3</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"https://cdn.jsdelivr.net/npm/d3@7/+esm"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">buildSplineEquations</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"./spline.js"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">import</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">solve</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s2">"./eqsolve.js"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ... more web-app JS code here</span><span class="w"></span>
<span class="p"></</span><span class="nt">script</span><span class="p">></span>
</pre></div>
<p><a class="reference external" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Modules">According to MDN</a>, browsers have supported
ES modules for quite a while now, so if you have a reasonably recent browser it should
support <tt class="docutils literal">script <span class="pre">type="module"</span></tt> and <tt class="docutils literal">import</tt> statements.</p>
<p>Note that our code here imports an additional JS library using ESM - <tt class="docutils literal">d3</tt>,
directly from a URL.</p>
<p>When testing this web page locally, opening it as a file with the <tt class="docutils literal"><span class="pre">file:///</span></tt>
scheme won't work; you'll get CORS errors in the browser console, because
module scripts are fetched under CORS rules and browsers typically refuse to
<tt class="docutils literal">import</tt> local files from a page opened this way. We'll need to serve
the directory locally using a file server.</p>
<p>Luckily this is very easy to do with my <a class="reference external" href="https://github.com/eliben/static-server/">static-server project</a> <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>:</p>
<div class="highlight"><pre><span></span>$ static-server .
2023/10/21 07:07:44.168573 Serving directory "." on http://127.0.0.1:8080
</pre></div>
<p>With this server running, opening <a class="reference external" href="http://127.0.0.1:8080/plot.html">http://127.0.0.1:8080/plot.html</a> should
successfully load everything.</p>
<p>Direct support of JS imports in the browser is a big step forward for the
ecosystem; now it's easy to properly structure non-trivial web applications
without requiring a separate build step with external tooling.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>If you don't have <tt class="docutils literal">go</tt> installed,
the NPM <a class="reference external" href="https://www.npmjs.com/package/http-server">http-server</a>
package will work just as well.</td></tr>
</tbody>
</table>
</div>
Cubic spline interpolation
Published: 2023-10-12T05:57:00-07:00 | Updated: 2023-10-24T03:57:52-07:00 | Author: Eli Bendersky
tag:eli.thegreenplace.net,2023-10-12:/2023/cubic-spline-interpolation/
<p>This post explains how cubic spline interpolation works, and presents a full
implementation in JavaScript, hooked up to an SVG-based visualization.
As a side effect, it also covers Gaussian elimination and presents a JavaScript
implementation of that as well.</p>
<p>I love topics that mix math and programming in a meaningful way, and cubic
spline interpolation is an excellent example of such a topic. There's a bunch
of linear algebra here and some calculus, all connected with code to create
a useful tool.</p>
<div class="section" id="motivation">
<h2>Motivation</h2>
<p>In an <em>interpolation</em> problem, we're given a set of points (we'll be using
2D points <em>X,Y</em> throughout this post) and are asked to estimate Y values for
Xs not in this original set, specifically for Xs that lie between Xs of the
original set (estimation for Xs outside the bounds of the original set
is called <em>extrapolation</em>).</p>
<p>As a concrete example, consider the set of points (0, 1), (1, 3), (2, 2); here
they are plotted in the usual coordinate system:</p>
<img alt="Three points on a 2D plot" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-3points.png" />
<p>Interpolation is estimating the value of Y for Xs between 0 and 2, given just
this data set. The more complex the underlying function/phenomenon, and the
fewer original points we're given, the more difficult it becomes to interpolate
accurately.</p>
<p>There are many techniques to interpolate between a given set of points.
<a class="reference external" href="https://en.wikipedia.org/wiki/Polynomial_interpolation">Polynomial interpolation</a> can perfectly fit N
points with an N-1 degree polynomial, but this approach can be problematic for
a large N; high-degree polynomials tend to overfit their data, and suffer from
other numerical issues like <a class="reference external" href="https://en.wikipedia.org/wiki/Runge's_phenomenon">Runge's phenomenon</a>.</p>
<p>Instead of interpolating all the points with a single function, a very popular
alternative is using <a class="reference external" href="https://en.wikipedia.org/wiki/Spline_(mathematics)">Splines</a>, which are piece-wise
polynomials. The idea is to fit a low-degree polynomial between every pair of
adjacent points in the original data set; for N points, we get N-1 different
polynomials. The simplest and best known variant of this technique is linear
interpolation:</p>
<img alt="Three points on a 2D plot with linear interpolation connecting them" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-linear.png" />
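<p>As a point of reference, linear interpolation between adjacent points can be
sketched in a few lines. This is a minimal illustration rather than code from the
project; the function name <tt class="docutils literal">linearInterpolate</tt> and its parameters are
assumptions made for this example, and the X values are assumed to be sorted.</p>

```javascript
// Piecewise-linear interpolation sketch. `xs` must be sorted in increasing
// order; `ys[i]` is the value at `xs[i]`. All names are illustrative only.
function linearInterpolate(xs, ys, x) {
  if (x < xs[0] || x > xs[xs.length - 1]) {
    throw new RangeError('x lies outside the data (that would be extrapolation)');
  }
  // Find the segment [xs[i], xs[i+1]] that contains x.
  let i = 0;
  while (i < xs.length - 2 && x > xs[i + 1]) {
    i++;
  }
  // Fraction of the way from xs[i] to xs[i+1].
  const t = (x - xs[i]) / (xs[i + 1] - xs[i]);
  return ys[i] + t * (ys[i + 1] - ys[i]);
}

// The three sample points (0, 1), (1, 3), (2, 2):
console.log(linearInterpolate([0, 1, 2], [1, 3, 2], 0.5)); // 2
console.log(linearInterpolate([0, 1, 2], [1, 3, 2], 1.5)); // 2.5
```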
<p>Linear interpolation has clear benefits: it's very fast, and when N is large
it produces reasonable results. However, for small Ns the result isn't great,
and the approximation is very crude. Here's the linear spline interpolation of
the <a class="reference external" href="https://en.wikipedia.org/wiki/Sinc_function">Sinc function</a> sampled
at 7 points:</p>
<img alt="Sinc function with linear interpolation" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-sinc-linear.png" />
<p>We can certainly do much better.</p>
<p>How about higher-degree splines? We can try second degree polynomials, but it's
better to jump straight to cubic (third degree). Here's why: to make our
interpolation realistic and aesthetically pleasing, we want the neighboring
polynomials not only to touch at the original points (the linear splines already
do this), but to actually look like they're part of the same curve. For this
purpose, we want the <em>slope</em> of the polynomials to be continuous, meaning that
if two polynomials meet at point P, their first derivatives at this point are
equal. Moreover, to ensure smoothness and to minimize needless bending <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>, we
also want the second derivatives of the two polynomials to be equal at P. The
lowest degree of polynomial that gives us this level of control is 3 (since the
second derivative of a quadratic polynomial is constant); hence cubic splines.</p>
<p>Here's a cubic spline interpolating between the three points of the original
example:</p>
<img alt="Three points on a 2D plot with cubic spline interpolation connecting them" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-cubic.png" />
<p>And the <em>Sinc</em> function:</p>
<img alt="Sinc function with cubic spline interpolation connecting them" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-sinc-cubic.png" />
<p>Because of the continuity of first and second derivatives, cubic splines look
very natural; on the other hand, since the degree of each polynomial remains
at most 3, they don't overfit too much. This is why they're such a popular tool for
interpolation and design/graphics.</p>
<p>All the plots in this post have been produced by <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2023/js-gauss-spline">JavaScript code</a>
that implements cubic spline interpolation from scratch. Let's move on to learn
how it works.</p>
</div>
<div class="section" id="setting-up-equations-for-cubic-spline-interpolation">
<h2>Setting up equations for cubic spline interpolation</h2>
<p>Given a set of N points, we want to produce N-1 cubic polynomials between these
points. While these are distinct polynomials, they are connected through mutual
constraints on the original points, as we'll see soon.</p>
<p>More formally, we're going to define N-1 polynomials in the inclusive range
<object class="valign-m5" data="https://eli.thegreenplace.net/images/math/9b9b2d7bb5787b4e4c6c897fdedd72630c5bb16a.svg" style="height: 19px;" type="image/svg+xml">i \in\{0 ...N-2\}</object>:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/2ac0b93cd5715f7041d230d1d057cd4797bdb458.svg" style="height: 22px;" type="image/svg+xml">\[p_i(x)=a_ix^3+b_ix^2+c_ix+d_i\]</object>
<p>For each polynomial, we have to find 4 coefficients: <em>a</em>, <em>b</em>, <em>c</em> and <em>d</em>;
in total, for N-1 polynomials we'll need 4N-4 coefficients. We're going to
find these coefficients by expressing the constraints we have as linear
equations, and then solving a system of linear equations. We'll need 4N-4
equations to ensure we can find a unique solution for 4N-4 unknowns.</p>
<p>Let's use our sample set of three original points to demonstrate how this
calculation works: (0, 1), (1, 3), (2, 2). Since N is 3, we'll be looking for
two polynomials and a total of 8 coefficients.</p>
<p>The first set of constraints is obvious - each polynomial has to pass through
the two points it's interpolating between. The first polynomial passes through
the points (0, 1) and (1, 3), so we can write the equations:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/00090cc844b4a8b06c6ed084b0a814a86c13c148.svg" style="height: 46px;" type="image/svg+xml">\[\begin{align*}
p_0(0)&=0a_0 + 0b_0 + 0c_0 + d_0=1\\
p_0(1)&=a_0+b_0+c_0+d_0=3
\end{align*}\]</object>
<p>The second polynomial passes through the points (1, 3) and (2, 2), resulting
in the equations:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/20edf7b504d79094c9fdcabe836664bf10b42fc6.svg" style="height: 46px;" type="image/svg+xml">\[\begin{align*}
p_1(1)&=a_1+b_1+c_1+d_1=3\\
p_1(2)&=8a_1 + 4b_1 + 2c_1 + d_1=2
\end{align*}\]</object>
<p>We have 4 equations, and need 4 more.</p>
<p>We constrain the first and second derivatives of the polynomials to be equal at
the points where they meet. In our example, there are only two polynomials that
meet at a single point, so we'll get two equations: one equating their first
derivatives and one equating their second derivatives at the point (1, 3).</p>
<p>Recall that the first and second derivatives of a cubic polynomial are:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/d26d85730cd1985330ab76d27cb48bb03a9278ae.svg" style="height: 48px;" type="image/svg+xml">\[\begin{align*}
p_i'(x)&=3a_ix^2+2b_ix+c_i\\
p_i''(x)&=6a_ix+2b_i
\end{align*}\]</object>
<p>The equation we get from equating the first derivatives is:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/49c0538e2adf8ce5aae6a0e1fcfb48e9d8ccb946.svg" style="height: 21px;" type="image/svg+xml">\[p_0'(1)=3a_0+2b_0+c_0=p_1'(1)=3a_1+2b_1+c_1\]</object>
<p>Or, expressed as a linear equation of all coefficients:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/d6d310b5eebfe69c6b3adc5f8fb5c33758077afe.svg" style="height: 14px;" type="image/svg+xml">\[3a_0+2b_0+c_0-3a_1-2b_1-c_1=0\]</object>
<p>Similarly, the equation we get from equating the second derivatives is:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/a7803f84d767481ca8cc77b667b511592722d149.svg" style="height: 21px;" type="image/svg+xml">\[p_0''(1)=6a_0+2b_0=p_1''(1)=6a_1+2b_1\]</object>
<p>Expressed as a linear equation of all coefficients:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/1adf5f1b5b352afa02fab154f8f14488c580cfb9.svg" style="height: 14px;" type="image/svg+xml">\[6a_0+2b_0-6a_1-2b_1=0\]</object>
<p>This brings us to a total of 6 equations. The last two equations will come from
<em>boundary conditions</em>. Notice that - so far - we didn't say much about how our
interpolating polynomials behave at the end points, except that they pass
through them. Boundary conditions are constraints we create to define how our
polynomials behave at these end points.
There are several approaches to this,
but here we'll just discuss the most commonly-used one: a <em>natural</em> spline.
Mathematically it says that the first polynomial has a second derivative of 0
at the first original point, and the last polynomial has a second derivative of
0 at the last original point. In our example:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/378d9e0421bc85fe06ade8ad93ffd620e47274db.svg" style="height: 48px;" type="image/svg+xml">\[\begin{align*}
p_0''(0)=0\\
p_1''(2)=0
\end{align*}\]</object>
<p>Substituting the second derivative equations:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/6cc066ad9122be577dacad8a36b8e7ae1ce387b3.svg" style="height: 48px;" type="image/svg+xml">\[\begin{align*}
p_0''(0)&=2b_0=0\\
p_1''(2)&=12a_1+2b_1=0
\end{align*}\]</object>
<p>We have 8 equations now:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/a9200a1d69c62aa7c22ce23b2d29d3f4f891f22b.svg" style="height: 203px;" type="image/svg+xml">\[\begin{align*}
d_0&=1\\
a_0+b_0+c_0+d_0&=3\\
a_1+b_1+c_1+d_1&=3\\
8a_1 + 4b_1 + 2c_1 + d_1&=2\\
3a_0+2b_0+c_0-3a_1-2b_1-c_1&=0\\
6a_0+2b_0-6a_1-2b_1&=0\\
2b_0&=0\\
12a_1+2b_1&=0
\end{align*}\]</object>
<p>To restate the obvious - while our example only uses 2 polynomials, this
approach generalizes to any number. For N original points, we'll interpolate
with N-1 polynomials, resulting in 4N-4 coefficients. We'll get:</p>
<ul class="simple">
<li>2N-2 equations from setting the points these polynomials pass through</li>
<li>N-2 equations from equating first derivatives at internal points</li>
<li>N-2 equations from equating second derivatives at internal points</li>
<li>2 equations from boundary conditions</li>
</ul>
<p>For a total of 4N-4 equations.</p>
<p>The code that constructs these equations from a given set of points is available
<a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2023/js-gauss-spline/spline.js">in this file</a>.</p>
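<p>The four groups of equations listed above can be assembled mechanically. The
following is a rough sketch under stated assumptions, not the code from
<tt class="docutils literal">spline.js</tt>: the name <tt class="docutils literal">buildNaturalSplineSystem</tt>, its
parameters and its return shape are all hypothetical, and the unknowns are assumed
to be ordered as a<sub>0</sub>, b<sub>0</sub>, c<sub>0</sub>, d<sub>0</sub>, a<sub>1</sub>, b<sub>1</sub>, and so on.</p>

```javascript
// Sketch: build A and b (for A·x = b) for a natural cubic spline through the
// points (xs[i], ys[i]). Unknowns are ordered [a0, b0, c0, d0, a1, b1, ...].
// Function and parameter names are hypothetical, not taken from spline.js.
function buildNaturalSplineSystem(xs, ys) {
  const numPolys = xs.length - 1;
  const size = 4 * numPolys;
  const A = [];
  const b = [];

  // Appends one equation; `terms` is a list of [column, coefficient] pairs.
  const addEquation = (terms, rhs) => {
    const row = new Array(size).fill(0);
    for (const [col, coeff] of terms) {
      row[col] = coeff;
    }
    A.push(row);
    b.push(rhs);
  };

  // Terms of p_i(x), p_i'(x) and p_i''(x), each scaled by `sign`.
  const value = (i, x, sign = 1) => [
    [4 * i, sign * x ** 3], [4 * i + 1, sign * x ** 2],
    [4 * i + 2, sign * x], [4 * i + 3, sign],
  ];
  const first = (i, x, sign = 1) => [
    [4 * i, sign * 3 * x ** 2], [4 * i + 1, sign * 2 * x], [4 * i + 2, sign],
  ];
  const second = (i, x, sign = 1) => [
    [4 * i, sign * 6 * x], [4 * i + 1, sign * 2],
  ];

  // 2N-2 equations: each polynomial passes through its two end points.
  for (let i = 0; i < numPolys; i++) {
    addEquation(value(i, xs[i]), ys[i]);
    addEquation(value(i, xs[i + 1]), ys[i + 1]);
  }
  // N-2 equations: first derivatives agree at internal points.
  for (let i = 0; i < numPolys - 1; i++) {
    addEquation([...first(i, xs[i + 1]), ...first(i + 1, xs[i + 1], -1)], 0);
  }
  // N-2 equations: second derivatives agree at internal points.
  for (let i = 0; i < numPolys - 1; i++) {
    addEquation([...second(i, xs[i + 1]), ...second(i + 1, xs[i + 1], -1)], 0);
  }
  // 2 equations: natural boundary conditions (zero second derivative).
  addEquation(second(0, xs[0]), 0);
  addEquation(second(numPolys - 1, xs[numPolys]), 0);

  return { A, b };
}
```

<p>For the sample points, calling <tt class="docutils literal">buildNaturalSplineSystem([0, 1, 2], [1, 3, 2])</tt>
should reproduce the 8 equations of the worked example, in the same order.</p>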
</div>
<div class="section" id="solving-the-equations">
<h2>Solving the equations</h2>
<p>We now have 8 equations with 8 variables. Some of them are trivial, so it's
tempting to just solve the system by hand, and indeed one can do it very easily.
In the general case, however, it would be quite difficult - imagine
interpolating 10 points, which results in 36 equations!</p>
<p>Fortunately, the full power of linear algebra is now at our disposal. We can
express this set of linear equations as a matrix multiplication problem
<object class="valign-0" data="https://eli.thegreenplace.net/images/math/e7d3683a610f89a991289fc2c2c64ba38eb6a004.svg" style="height: 13px;" type="image/svg+xml">Ax=b</object>, where <em>A</em> is a matrix of coefficients, <em>x</em> is a vector of
unknowns and <em>b</em> is the vector of right-hand side constants:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/5243a06db4c849ae927dff870e2db1f3ab2c217d.svg" style="height: 170px;" type="image/svg+xml">\[Ax=b\Rightarrow \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 1 & 1 & 1\\
0 & 0 & 0 & 0 & 8 & 4 & 2 & 1\\
3 & 2 & 1 & 0 & -3 & -2 & -1 & 0\\
6 & 2 & 0 & 0 & -6 & -2 & 0 & 0\\
0 & 2 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 12 & 2 & 0 & 0\\
\end{pmatrix}\begin{pmatrix}
a_0 \\
b_0 \\
c_0 \\
d_0 \\
a_1 \\
b_1 \\
c_1 \\
d_1\end{pmatrix}=\begin{pmatrix}
1\\
3\\
3\\
2\\
0\\
0\\
0\\
0
\end{pmatrix}\]</object>
<p>Solving this system is straightforward using <a class="reference external" href="https://en.wikipedia.org/wiki/Gaussian_elimination">Gaussian elimination</a>.
<a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2023/js-gauss-spline/eqsolve.js">Our JavaScript implementation</a>
does this in a few steps:</p>
<ul class="simple">
<li>Performs Gaussian elimination to bring <em>A</em> into row-echelon form, using the
<a class="reference external" href="https://en.wikipedia.org/wiki/Gaussian_elimination#Pseudocode">algorithm outlined on Wikipedia</a>. This
approach tries to preserve numerical stability by choosing, for each column,
the pivot row whose entry in that column is largest in absolute value (partial pivoting) <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>.</li>
<li>Further transforms the resulting matrix into <em>reduced</em> row-echelon form
(a.k.a. Gauss-Jordan elimination)</li>
<li>Extracts the solution.</li>
</ul>
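<p>To make these steps concrete, here is a compact sketch of Gauss-Jordan
elimination with partial pivoting. It is not the code from <tt class="docutils literal">eqsolve.js</tt>:
the name <tt class="docutils literal">solveLinearSystem</tt> is hypothetical, and unlike the two-phase
description above, this variant folds the reduction to reduced row-echelon form
into a single pass by clearing each pivot column both above and below the pivot.</p>

```javascript
// Sketch: solve A·x = b by Gauss-Jordan elimination with partial pivoting.
// Not the code from eqsolve.js; the names here are hypothetical.
function solveLinearSystem(A, b) {
  const n = A.length;
  // Work on an augmented copy [A | b] so the inputs are left untouched.
  const M = A.map((row, i) => [...row, b[i]]);

  for (let col = 0; col < n; col++) {
    // Partial pivoting: pick the row at or below `col` whose entry in this
    // column has the largest absolute value.
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(M[r][col]) > Math.abs(M[pivot][col])) {
        pivot = r;
      }
    }
    if (Math.abs(M[pivot][col]) < 1e-12) {
      throw new Error('matrix is singular (or nearly so)');
    }
    [M[col], M[pivot]] = [M[pivot], M[col]];

    // Scale the pivot row so the pivot becomes 1.
    const p = M[col][col];
    for (let c = col; c <= n; c++) {
      M[col][c] /= p;
    }
    // Eliminate this column from every other row (above and below), which
    // yields reduced row-echelon form directly.
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const factor = M[r][col];
      if (factor === 0) continue;
      for (let c = col; c <= n; c++) {
        M[r][c] -= factor * M[col][c];
      }
    }
  }
  // With A reduced to the identity, the last column holds the solution.
  return M.map((row) => row[n]);
}
```

<p>The singularity threshold of <tt class="docutils literal">1e-12</tt> is an arbitrary choice for this sketch;
a production solver would likely scale it to the magnitude of the matrix entries.</p>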
<p>In our example, the solution ends up being the vector (-0.75, 0, 2.75, 1, 0.75,
-4.5, 7.25, -0.5); therefore, the interpolating polynomials are:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/04c71afa0281eb99662ce3d99d4c4546ef08af89.svg" style="height: 50px;" type="image/svg+xml">\[\begin{align*}
p_0(x)&=-0.75x^3+2.75x+1\\
p_1(x)&=0.75x^3-4.5x^2+7.25x-0.5
\end{align*}\]</object>
</div>
<div class="section" id="performing-the-interpolation-itself">
<h2>Performing the interpolation itself</h2>
<p>Now that we have the interpolating polynomials, we can generate any number of
interpolated points. For all <em>x</em> between 0 and 1 we use <object class="valign-m5" data="https://eli.thegreenplace.net/images/math/ad5cb52cf88277ad5a1880722c8ae8b3a6edfd42.svg" style="height: 19px;" type="image/svg+xml">p_0(x)</object>,
and for <em>x</em> between 1 and 2 we use <object class="valign-m5" data="https://eli.thegreenplace.net/images/math/7c2338e3575da884f060665a36a3503d970957a5.svg" style="height: 19px;" type="image/svg+xml">p_1(x)</object>. In our JavaScript
code this is done by the <tt class="docutils literal">doInterpolate</tt> function. We've already seen
the result:</p>
<img alt="Three points on a 2D plot with cubic spline interpolation connecting them" class="align-center" src="https://eli.thegreenplace.net/images/2023/interp-cubic.png" />
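<p>Evaluating the spline at a given <em>x</em> boils down to picking the right polynomial
and plugging <em>x</em> into it. The sketch below illustrates the idea; it is not the
project's <tt class="docutils literal">doInterpolate</tt> function, and the name <tt class="docutils literal">evalSpline</tt> as well as
the flat coefficient layout are assumptions made for this example.</p>

```javascript
// Sketch: evaluate a piecewise-cubic spline at x. `xs` are the original X
// values (sorted); `coeffs` is the flat solution vector
// [a0, b0, c0, d0, a1, b1, c1, d1, ...]. All names are hypothetical.
function evalSpline(xs, coeffs, x) {
  // Pick the polynomial whose interval contains x; the last polynomial also
  // covers the right end point.
  let i = 0;
  while (i < xs.length - 2 && x > xs[i + 1]) {
    i++;
  }
  const [a, b, c, d] = coeffs.slice(4 * i, 4 * i + 4);
  // Horner's rule for a*x^3 + b*x^2 + c*x + d.
  return ((a * x + b) * x + c) * x + d;
}

// Coefficients of p_0 and p_1 from the worked example:
const splineCoeffs = [-0.75, 0, 2.75, 1, 0.75, -4.5, 7.25, -0.5];
console.log(evalSpline([0, 1, 2], splineCoeffs, 0.5)); // 2.28125
console.log(evalSpline([0, 1, 2], splineCoeffs, 1.5)); // 2.78125
```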
</div>
<div class="section" id="code">
<h2>Code</h2>
<p>The complete code sample for this post <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2023/js-gauss-spline">is available on GitHub</a>.
It includes functions for constructing equations for cubic splines from an
original set of points, code for solving linear equations with Gauss-Jordan
elimination, and a demo HTML page that plots the points and linear/spline
interpolations.</p>
<p>The code is readable, heavily-commented JavaScript with no dependencies (except
D3 for the plotting).</p>
<p>An additional demo that uses similar functionality is <a class="reference external" href="https://eliben.github.io/line-plotting/">line-plotting</a>; it plots arbitrary mathematical
functions with optional interpolation (when the number of sampled points is
low).</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>This requirement actually has neat historical roots. In the days before
computers, "splines" were elastic rulers engineers and drafters would
use to interpolate between points by hand. These rulers would bend and
connect at the original points, and it was considered best practice to
minimize bending.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>This helps avoid division by very small numbers, which may cause issues
when using finite-precision floating point.</td></tr>
</tbody>
</table>
</div>
Demystifying Tupper's formula
Published: 2023-05-22T19:45:00-07:00 | Updated: 2023-06-03T22:14:53-07:00 | Author: Eli Bendersky
tag:eli.thegreenplace.net,2023-05-22:/2023/demystifying-tuppers-formula/
<p>A <a class="reference external" href="https://makeanddo4d.com/">book I was recently reading</a> mentioned a
mathematical curiosity I hadn't seen before - <a class="reference external" href="https://en.wikipedia.org/wiki/Tupper%27s_self-referential_formula">Tupper's self-referential
formula</a>.
There are some resources about it online, but this post is <em>my</em> attempt to
explain how it works - along with an interactive implementation you can try
in the browser.</p>
<div class="section" id="tupper-s-formula">
<h2>Tupper's formula</h2>
<p>Here is the formula:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/a4f48d2debe2ad234574a9cf2a0c2d1b327963c7.svg" style="height: 36px;" type="image/svg+xml">\[\frac{1}{2}< \left \lfloor mod\left ( \left \lfloor \frac{y}{17}\right \rfloor 2^{-17\lfloor x \rfloor - mod(\lfloor y \rfloor, 17)}, 2 \right ) \right \rfloor\]</object>
<p>We want to plot this formula, but how?</p>
<p>For this purpose, it's more useful to think of Tupper's formula not as a
function but as a <em>relation</em>, in the mathematical sense. In Tupper's paper
this is a relation on <img alt="\mathbb{R}" class="valign-0" src="https://eli.thegreenplace.net/images/math/0ed839b111fe0e3ca2b2f618b940893eaea88a57.png" style="height: 12px;" />, meaning that it's a set of pairs
in <object class="valign-0" data="https://eli.thegreenplace.net/images/math/6d731263787f024f927178eb8fc44f5e91a79bde.svg" style="height: 12px;" type="image/svg+xml">\mathbb{R} \times \mathbb{R}</object> that satisfy the inequality.</p>
<p>For our task we'll use discrete indices for <em>x</em> and <em>y</em>, so the relation is
on <object class="valign-0" data="https://eli.thegreenplace.net/images/math/536c886d7863df5a4e250a73547be5d968c290c7.svg" style="height: 12px;" type="image/svg+xml">\mathbb{N}</object>. We'll plot the relation by using a dark pixel (or
square) for an <tt class="docutils literal">x,y</tt> coordinate where the inequality holds and a light pixel
for a coordinate where it doesn't hold.</p>
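<p>With discrete, non-negative <em>x</em> and <em>y</em>, the inequality can be checked using
integer arithmetic alone. The sketch below is one possible way to do so, assuming
<tt class="docutils literal">BigInt</tt> inputs; the name <tt class="docutils literal">tupperHolds</tt> is hypothetical and this is not
necessarily how the demo implements it. It relies on the observation that, for
non-negative integers, multiplying by a negative power of two and flooring is a
right shift, and taking the result modulo 2 keeps only the lowest bit.</p>

```javascript
// Sketch: test Tupper's inequality at non-negative integer coordinates.
// Both x and y must be BigInt values. The name is hypothetical.
function tupperHolds(x, y) {
  // Exponent in the formula (without the minus sign): 17*x + mod(y, 17).
  const shift = 17n * x + (y % 17n);
  // floor(y / 17) * 2^(-shift), floored, is a right shift by `shift` bits;
  // "mod 2" keeps the lowest bit, which is 1 exactly when 1/2 < (result).
  return (((y / 17n) >> shift) & 1n) === 1n;
}

console.log(tupperHolds(0n, 17n)); // true
console.log(tupperHolds(0n, 18n)); // false
```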
<p>The "mind-blowing" fact about Tupper's formula is that when plotted for
a certain range of <em>x</em> and <em>y</em>, it produces this:</p>
<img alt="Tupper's formula own plot" class="align-center" src="https://eli.thegreenplace.net/images/2023/tupper-plot.png" />
<p>Note that while <em>x</em> runs in the inclusive range of 0-105 on the plot, <em>y</em> starts
at a mysterious <em>K</em> and ends at <em>K+16</em>. For the plot above, <em>K</em> needs to be:</p>
<div class="highlight"><pre><span></span>4858450636189713423582095962494202044581400587983244549483
0930850619347047088099284506447698655243648499972470249151
1911041160573917740785691975432657185544205721044573588368
1829823754139634338225199452191651284348332905131193199953
5024137587652392648746133949068701305622958132194811136853
3953556529085002387509285689269455597428154638651073004910
6723058933586052544096664351265349363643957125565695936815
1843348576052669401612512669514215505395545191537854575257
5659074054015792900176596796548006442782913148854825991472
1248506352686630476300
</pre></div>
<p>The amazement subsides slightly when we discover that for a different <em>K</em> <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>,
we get a different plot:</p>
<img alt="Tupper's formula producing a pacman plot" class="align-center" src="https://eli.thegreenplace.net/images/2023/tupper-pacman.png" />
<p>And, in fact, this formula can produce any 2D grid of 106x17 pixels, given the
right coordinates. Since the formula itself is so simple, it is quite apparent
that the value of <em>K</em> is the key here; these are huge numbers with hundreds of
digits, so clearly they encode the image information somehow. Read on to see
how this actually works.</p>
</div>
<div class="section" id="a-javascript-demo">
<h2>A JavaScript demo</h2>
<p>I've implemented a simple online demo of plotting the Tupper formula - available
at <a class="reference external" href="https://eliben.github.io/tupperformula/">https://eliben.github.io/tupperformula/</a> (with <a class="reference external" href="https://github.com/eliben/tupperformula">source code on GitHub</a>). It was used to produce the images
shown above. The code is fairly straightforward, so I'll just focus on the
interesting part.</p>
<p>The core of the code is a 2D grid that's plotted for <em>x</em> running from 0 to
105 and <em>y</em> from <em>K</em> to <em>K+16</em> (both ranges inclusive). The grid is populated
every time the number changes:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">GridWidth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">106</span><span class="p">;</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">GridHeight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">17</span><span class="p">;</span><span class="w"></span>
<span class="kd">let</span><span class="w"> </span><span class="nx">K</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">Knum</span><span class="p">.</span><span class="nx">value</span><span class="p">);</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">GridWidth</span><span class="p">;</span><span class="w"> </span><span class="nx">x</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">GridHeight</span><span class="p">;</span><span class="w"> </span><span class="nx">y</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">Grid</span><span class="p">.</span><span class="nx">setCell</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="p">,</span><span class="w"> </span><span class="nx">tupperFormula</span><span class="p">(</span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">x</span><span class="p">),</span><span class="w"> </span><span class="nx">K</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">y</span><span class="p">)));</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>Note the use of JavaScript's <tt class="docutils literal">BigInt</tt> type here - very handy when dealing
with such huge numbers. Here is <tt class="docutils literal">tupperFormula</tt>:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">tupperFormula</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">y</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">17n</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">(</span><span class="mi">17n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">17n</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">2n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1n</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>It looks quite different from the mathematical formula at the top of this post;
why? Because - as mentioned before - while Tupper's original formula works on
real numbers, our program only needs the discrete integer range of
<tt class="docutils literal">x in [0, 105]</tt> and <tt class="docutils literal">y in [K, K+16]</tt>. When we deal with discrete numbers,
the formula can be simplified greatly.</p>
<p>Let's start with the original formula and simplify it step by step:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/a4f48d2debe2ad234574a9cf2a0c2d1b327963c7.svg" style="height: 36px;" type="image/svg+xml">\[\frac{1}{2}< \left \lfloor mod\left ( \left \lfloor \frac{y}{17}\right \rfloor 2^{-17\lfloor x \rfloor - mod(\lfloor y \rfloor, 17)}, 2 \right ) \right \rfloor\]</object>
<p>First of all, since <em>x</em> and <em>y</em> are natural numbers, the floor operations on
them don't do anything, so we can drop them (including on the division by
17, if we just assume integer division that rounds down by default):</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/f8e9f298cf8e514787979eaa4d7cc6b2b2489cb0.svg" style="height: 36px;" type="image/svg+xml">\[\frac{1}{2}< \left \lfloor mod\left ( \left ( \frac{y}{17}\right ) 2^{-17x - mod(y, 17)}, 2 \right ) \right \rfloor\]</object>
<p>Next, since the result of the <object class="valign-m5" data="https://eli.thegreenplace.net/images/math/6bfbbf950c2eba80fdd316385a8c430702ef839f.svg" style="height: 19px;" type="image/svg+xml">mod(N,2)</object> operation for a natural <em>N</em> is
either 0 or 1, the comparison to half is just a fancy way of saying "equals 1";
we can replace the inequality by:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/math/b0e581251daff5ef47433e7e7c50bfc94ad4051a.svg" style="height: 33px;" type="image/svg+xml">\[mod\left ( \left ( \frac{y}{17}\right ) 2^{-17x - mod(y, 17)}, 2 \right )=1\]</object>
<p>Note the negative power of 2; multiplying by it is the same as dividing by its
positive counterpart. Another way to express division by <object class="valign-0" data="https://eli.thegreenplace.net/images/math/339f03051f685e4ffbec605928020a75cc9c05d1.svg" style="height: 12px;" type="image/svg+xml">2^p</object> for natural
numbers is a bit shift right by <em>p</em> bits. So we get the code of the
<tt class="docutils literal">tupperFormula</tt> function shown above:</p>
<div class="highlight"><pre><span></span><span class="kd">let</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">y</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">17n</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">(</span><span class="mi">17n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">17n</span><span class="p">);</span><span class="w"></span>
<span class="k">return</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mi">2n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1n</span><span class="p">;</span><span class="w"></span>
</pre></div>
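<p>As a quick sanity check of that last step, here's a minimal sketch showing that
integer division by a power of 2 and a right shift agree for non-negative
<tt class="docutils literal">BigInt</tt> values. The helper names <tt class="docutils literal">divByPow2</tt> and <tt class="docutils literal">shiftRight</tt> are made up
for illustration and are not part of the demo:</p>

```javascript
// Hypothetical helpers for illustration; not part of the demo's code.

// Integer division of n by 2^p. BigInt division truncates toward zero,
// which is the same as rounding down for non-negative n.
function divByPow2(n, p) {
  return n / (2n ** p);
}

// The same operation expressed as a right shift by p bits.
function shiftRight(n, p) {
  return n >> p;
}

// Compare both on a few non-negative values, including a very large one.
const samples = [0n, 1n, 17n, 12345678901234567890n, 2n ** 1802n - 1n];
const allAgree = samples.every((n) =>
  [0n, 1n, 16n, 17n, 1801n].every((p) => divByPow2(n, p) === shiftRight(n, p))
);
console.log(allAgree);
```

<p>Note that this equivalence is only claimed for non-negative numbers, which is
all the formula needs here.</p>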
</div>
<div class="section" id="how-the-tupper-formula-works">
<h2>How the Tupper formula works</h2>
<p>The distillation of the Tupper formula to JS code already peels off a few layers of
mystery. Let's now remove the rest of the curtain on its inner workings.</p>
<p>I'll start by explaining how to take an image we want the formula
to produce and encode it into <em>K</em>. Here are the first three columns of the
Tupper formula plot:</p>
<img alt="Closeup of tupper plot with encoding of pixels" class="align-center" src="https://eli.thegreenplace.net/images/2023/tupper-closeup.png" />
<p>Each pixel in the plot is converted to a bit (0 for light, 1 for dark). We
start at the bottom left corner (<em>x=0</em> and <em>y=K</em>), which is the LSB
(least-significant bit) and move up through the first column; when we reach the
top (<em>x=0</em> and <em>y=K+16</em>), we continue from the bottom of the next column
(<em>x=1</em> and <em>y=K</em>). In the example above, the first bits (from lowest to highest)
of the number are:</p>
<div class="highlight"><pre><span></span>00110010101000100 00101010101111100 ...
</pre></div>
<p>Once we're done with the whole number (106x17 = 1802 bits), we convert it to
decimal (let's call this number <em>IMG</em>) and multiply it by 17. The result is <em>K</em>.</p>
<p>Now back to <tt class="docutils literal">tupperFormula</tt>, looking at how it decodes the image back from
<em>x</em> and <em>y</em> (recall that <em>y</em> runs from <em>K</em> to <em>K+16</em>). Let's work through
the first coordinate in detail:</p>
<p>For <em>x=0</em> and <em>y=K</em>, in <tt class="docutils literal">tupperFormula</tt> we get:</p>
<div class="highlight"><pre><span></span>d = (y/17) >> (17x + y%17)
...
substitute x=0, y=K (and recall that K = IMG * 17)
...
d = IMG >> 0
</pre></div>
<p>In other words, <em>d</em> is the lowest bit of <em>IMG</em> - the lowest bit of our image!
We can continue for <em>x=0</em> and <em>y=K+1</em>:</p>
<div class="highlight"><pre><span></span>d = (y/17) >> (17x + y%17)
...
substitute x=0, y=K+1 (and recall that K = IMG * 17)
...
d = IMG >> 1
</pre></div>
<p>Here <em>d</em> is the second lowest bit of <em>IMG</em>. The pattern should be clear by now.</p>
<div class="highlight"><pre><span></span>d = (y/17) >> (17x + y%17)
...
x=0 y=K+2: IMG >> (0 + 2)
x=0 y=K+3: IMG >> (0 + 3)
...
x=0 y=K+16 IMG >> (0 + 16)
x=1 y=K: IMG >> (17 + 0)
x=1 y=K+1: IMG >> (17 + 1)
x=1 y=K+2: IMG >> (17 + 2)
</pre></div>
<p>The formula simply calculates the correct bit of <em>IMG</em> given <em>x</em> and <em>y</em>, using
a modular arithmetic trick to "fold" the 2D <em>x</em> and <em>y</em> into a 1D
sequence (this is just customary <a class="reference external" href="https://eli.thegreenplace.net/2015/memory-layout-of-multi-dimensional-arrays">column-major layout</a>).</p>
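<p>The claim can be checked mechanically. The following sketch compares
<tt class="docutils literal">tupperFormula</tt> against reading bit <tt class="docutils literal">17*x + row</tt> of <em>IMG</em> directly. The
helper <tt class="docutils literal">bitOfImage</tt> and the sample value of <tt class="docutils literal">IMG</tt> are assumptions made up for
this example:</p>

```javascript
// tupperFormula as in the demo: x is the column, y runs from K to K+16.
function tupperFormula(x, y) {
  const d = (y / 17n) >> (17n * x + y % 17n);
  return d % 2n == 1n;
}

// Hypothetical helper: read bit number (17*x + row) of img directly, which is
// where column-major layout places the pixel at column x, row `row`.
function bitOfImage(img, x, row) {
  return ((img >> (17n * x + row)) & 1n) === 1n;
}

// A small made-up image value; any non-negative BigInt works.
const IMG = 0b1011001110001111000011111000001111110000001111111n;
const K = IMG * 17n;

// Count the cells where the two ways of reading a pixel disagree.
let mismatches = 0;
for (let x = 0n; x < 106n; x++) {
  for (let row = 0n; row < 17n; row++) {
    if (tupperFormula(x, K + row) !== bitOfImage(IMG, x, row)) {
      mismatches++;
    }
  }
}
console.log(mismatches);
```

<p>Since <em>K</em> is a multiple of 17, <tt class="docutils literal">(K + row) / 17n</tt> is always <em>IMG</em> and
<tt class="docutils literal">(K + row) % 17n</tt> is always <tt class="docutils literal">row</tt>, so the two functions compute the same bit.</p>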
<p>This is why the formula can plot any 106x17 grid, given the right <em>K</em>. In the
formula, 17 is not some piece of magic - it's just the height of the grid. As an
exercise, you can modify the formula and code to plot larger or smaller grids.</p>
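<p>One possible take on that exercise is sketched below, under the assumption that
we simply replace every 17 with a height parameter <tt class="docutils literal">h</tt>. The name
<tt class="docutils literal">tupperFormulaH</tt> and the tiny 4x5 test grid are made up for illustration:</p>

```javascript
// Sketch of the formula generalized to a grid of height h (the demo fixes
// h = 17). The function name and the extra parameter are assumptions.
function tupperFormulaH(x, y, h) {
  const d = (y / h) >> (h * x + y % h);
  return d % 2n == 1n;
}

// A 4x5 grid given as rows of '0'/'1' characters, top row first.
const rows = [
  "1001",
  "1001",
  "1111",
  "1001",
  "1001",
];
const h = BigInt(rows.length);
const w = BigInt(rows[0].length);

// Encode: bit (h*x + y) of img holds the pixel at column x, with y counted
// from the bottom row upward; then K = img * h.
let img = 0n;
for (let x = 0n; x < w; x++) {
  for (let y = 0n; y < h; y++) {
    const rowFromTop = rows.length - 1 - Number(y);
    if (rows[rowFromTop][Number(x)] === "1") {
      img |= 1n << (h * x + y);
    }
  }
}
const k = img * h;

// Decode with the generalized formula and print the grid, top row first.
const decoded = [];
for (let y = h - 1n; y >= 0n; y--) {
  let line = "";
  for (let x = 0n; x < w; x++) {
    line += tupperFormulaH(x, k + y, h) ? "1" : "0";
  }
  decoded.push(line);
}
console.log(decoded.join("\n"));
```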
<p>As a bonus, the JavaScript demo can also encode a grid back to its
representative <em>K</em>; here's the code for it:</p>
<div class="highlight"><pre><span></span><span class="c1">// Calculate K value from the grid.</span><span class="w"></span>
<span class="kd">function</span><span class="w"> </span><span class="nx">encodeGridToK</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">kval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Build up K from MSB to LSB, scanning from the top-right corner down and</span><span class="w"></span>
<span class="w"> </span><span class="c1">// then moving left by column.</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">GridWidth</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">x</span><span class="o">--</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">GridHeight</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span><span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">y</span><span class="o">--</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">kval</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">2n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">kval</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">Grid</span><span class="p">.</span><span class="nx">getCell</span><span class="p">(</span><span class="nx">x</span><span class="p">,</span><span class="w"> </span><span class="nx">y</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">kval</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">17n</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>It constructs <em>K</em> starting with the MSB, but otherwise the code is
straightforward to follow.</p>
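<p>To see encoding and decoding agree, here's a self-contained round-trip sketch.
It uses a plain 2D array <tt class="docutils literal">cells</tt> as a hypothetical stand-in for the demo's
<tt class="docutils literal">Grid</tt> object (assuming <tt class="docutils literal">y = 0</tt> is the row plotted at <em>K</em>), and an arbitrary
made-up pixel pattern:</p>

```javascript
const GridWidth = 106;
const GridHeight = 17;

// Hypothetical stand-in for the demo's Grid object: cells[x][y] is 0 or 1,
// with y = 0 being the row plotted at y = K. The pattern is arbitrary.
const cells = Array.from({ length: GridWidth }, (_, x) =>
  Array.from({ length: GridHeight }, (_, y) => ((x * 7 + y * 3) % 5 === 0 ? 1 : 0))
);

// Same scan order as encodeGridToK: MSB first, from the last column's top
// cell down, then moving left by column.
function encodeCellsToK(cells) {
  let kval = 0n;
  for (let x = GridWidth - 1; x >= 0; x--) {
    for (let y = GridHeight - 1; y >= 0; y--) {
      kval = 2n * kval + BigInt(cells[x][y]);
    }
  }
  return kval * 17n;
}

function tupperFormula(x, y) {
  const d = (y / 17n) >> (17n * x + y % 17n);
  return d % 2n == 1n;
}

// Decode K cell by cell and compare with the original array.
const K = encodeCellsToK(cells);
let roundTripOk = true;
for (let x = 0; x < GridWidth; x++) {
  for (let y = 0; y < GridHeight; y++) {
    const bit = tupperFormula(BigInt(x), K + BigInt(y)) ? 1 : 0;
    if (bit !== cells[x][y]) roundTripOk = false;
  }
}
console.log(roundTripOk);
```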
</div>
<div class="section" id="background">
<h2>Background</h2>
<p>The formula was first described by Jeff Tupper in a 2001 paper titled
"Reliable Two-Dimensional Graphing Methods for Mathematical Formulae with Two
Free Variables". The paper itself focuses on methods of precisely graphing
relations and presents several algorithms to do so. This formula is described
in passing in section 12, and presented as follows:</p>
<img alt="Screenshot from Tupper's paper describing the formula" class="align-center" src="https://eli.thegreenplace.net/images/2023/tupper-paper-crop1.png" />
<p>And Figure 13 is:</p>
<img alt="Screenshot from Tupper's paper showing the formula itself" class="align-center" src="https://eli.thegreenplace.net/images/2023/tupper-paper-crop2.png" />
<p>Interestingly, the <em>K</em> provided by Tupper's paper renders the formula flipped
on both the <em>x</em> and <em>y</em> axes using the standard grid used in this post <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>.
This is why my JavaScript demo has flip toggles that let you flip the axes of any
plot.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>This would be</td></tr>
</tbody>
</table>
<div class="highlight"><pre><span></span>1445202489708975828479425373371945674812777822151507024797
1881396854908873568298734888825132090576643817888323197692
3440016667764749242125128995265907053708020473915320841631
7920255490054180047686572016997304663833949016013743197155
2099618114524978194501906835950051065780432564080119786755
6863142280259694206254096081665642417367403946384170774537
4273196064438999230103793989386750257869294552344763192918
6095761834543224800492172803334941981620674985447203819393
9738513848960476759782673313437697051994580681869819330446
336774047268864
</pre></div>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>I can totally see why the <em>y</em> axis would be flipped: in computer programs
the concept of the <em>y</em> axis is represented as <em>rows</em> in a grid which
typically count from 0 on top and downwards. It's less clear to me how
the inversion on the <em>x</em> axis came to be.</td></tr>
</tbody>
</table>
</div>
Playing with indirect calls in WebAssembly2023-02-16T05:54:00-08:002023-02-16T14:06:04-08:00Eli Benderskytag:eli.thegreenplace.net,2023-02-16:/2023/playing-with-indirect-calls-in-webassembly/<p>I've recently started exploring WebAssembly, focusing on the language itself, by
writing and testing small snippets of handwritten WASM text (WAT). This post
describes what I've learned about using indirect calls via function tables.</p>
<p>It shows how to invoke WASM-defined and imported functions via indirect calls,
and discusses some related nuances of the WASM value stack.</p>
<p>The full code sample is <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2023/wasm-call-indirect">on GitHub</a>,
but it's also short enough for me to reproduce here. This is the entire WAT file
for this post:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">module</span>
<span class="c1">;; The common type we use throughout the sample.</span>
<span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span> <span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">param</span> <span class="kt">i32</span><span class="p">)</span> <span class="p">(</span><span class="k">result</span> <span class="kt">i32</span><span class="p">)))</span>
<span class="c1">;; Import a function named jstimes3 from the environment and call it</span>
<span class="c1">;; $jstimes3 here.</span>
<span class="p">(</span><span class="k">import</span> <span class="s2">"env"</span> <span class="s2">"jstimes3"</span> <span class="p">(</span><span class="k">func</span> <span class="nv">$jstimes3</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)))</span>
<span class="c1">;; Simple function that adds its parameter to itself and returns the sum.</span>
<span class="p">(</span><span class="k">func</span> <span class="nv">$wasmtimes2</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
<span class="p">(</span><span class="nb">i32.add</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">))</span>
<span class="p">)</span>
<span class="c1">;; Declare the dispatch function table to have 32 slots, and populate slots</span>
<span class="c1">;; 16 and 17 with functions.</span>
<span class="c1">;; This uses the WASMv1 default table 0.</span>
<span class="p">(</span><span class="k">table</span> <span class="mi">32</span> <span class="k">funcref</span><span class="p">)</span>
<span class="p">(</span><span class="k">elem</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">16</span><span class="p">)</span> <span class="nv">$wasmtimes2</span> <span class="nv">$jstimes3</span><span class="p">)</span>
<span class="c1">;; The following two functions are exported to JS; when JS calls them, they</span>
<span class="c1">;; invoke functions from the table.</span>
<span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"times2"</span><span class="p">)</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
<span class="c1">;; Place the value of the first parameter on the stack for the function</span>
<span class="c1">;; call_indirect will invoke.</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="c1">;; This call_indirect invokes a function of the given type from table at</span>
<span class="c1">;; offset 16. The parameters to this function are expected to be on</span>
<span class="c1">;; the stack.</span>
<span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">16</span><span class="p">))</span>
<span class="p">)</span>
<span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"times3"</span><span class="p">)</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
<span class="c1">;; This is the same as times2, except it takes the function to call from</span>
<span class="c1">;; offset 17 in the table.</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">))</span>
<span class="p">)</span>
<span class="p">)</span>
</pre></div>
<p>It starts by declaring a type that all functions in this sample use: a function
with a single <tt class="docutils literal">i32</tt> parameter and an <tt class="docutils literal">i32</tt> return type.</p>
<p>Then, it defines two functions: one (<tt class="docutils literal">$jstimes3</tt>) is imported from the
environment; we'll see the actual function shortly. The other is a simple WASM
function that adds its parameter to itself and returns the result.</p>
<p>Next, it adds these functions to a <em>table</em>, which in WASM parlance is a dispatch
table for functions. This is what WASM uses to perform <em>indirect</em> function
calls; when you think about function pointers, or references to functions, or
more generally first-class functions - this is how they work in WASM.</p>
<p>In <a class="reference external" href="https://www.w3.org/TR/wasm-core-1/">WASM v1</a>, there's only a single table
in a module, at implicit index 0. With v2 there can be multiple tables and the
index has to be specified explicitly, but we'll focus on v1 here. The following
code:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">table</span> <span class="mi">32</span> <span class="k">funcref</span><span class="p">)</span>
<span class="p">(</span><span class="k">elem</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">16</span><span class="p">)</span> <span class="nv">$wasmtimes2</span> <span class="nv">$jstimes3</span><span class="p">)</span>
</pre></div>
<p>First declares the table to have 32 slots of function references <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>. Then,
it populates the table (starting at offset 16) with the two functions previously
defined in the module. I'm using these indices (and not just 0 and 1) to help
weed out potential value confusion errors; you can replace them by any offset
you wish, as long as it's consistent across the code.</p>
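<p>For comparison, a table of the same shape can also be created from the JS side
with the <tt class="docutils literal">WebAssembly.Table</tt> API. This is not what the sample does (its table is
declared in the WAT file); it's only a sketch to show what a freshly declared
32-slot table looks like before <tt class="docutils literal">elem</tt> populates it:</p>

```javascript
// A table with 32 function-reference slots, mirroring (table 32 funcref).
// "anyfunc" is the JS API's name for the function-reference element type.
const table = new WebAssembly.Table({ initial: 32, element: "anyfunc" });

console.log(table.length);   // number of slots in the table
console.log(table.get(16));  // slots that were never populated hold null
```

<p>Calling through an unpopulated slot is an error, which is why the offsets used
by <tt class="docutils literal">elem</tt> and by the calls have to match.</p>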
<p>Next, we export two functions to the embedding environment. These perform the
actual dynamic call through the table. We'll get back to these functions later;
first let's see how to run the example.</p>
<div class="section" id="compiling-and-running-the-wat-sample">
<h2>Compiling and running the WAT sample</h2>
<p>This WAT file is saved in <tt class="docutils literal">table.wat</tt> in my sample. To compile it to the
binary WASM format (which can be directly loaded by browsers and other embedding
environments), we'll run the <tt class="docutils literal">wat2wasm</tt> tool from the
<a class="reference external" href="https://github.com/WebAssembly/wabt">WebAssembly Binary Toolkit</a>:</p>
<div class="highlight"><pre><span></span>$ wat2wasm table.wat
</pre></div>
<p>This creates a <tt class="docutils literal">table.wasm</tt> file in the same directory. Now we're ready to
embed it and see it run; there are many embedding environments supporting
WASM these days, but the easiest to work with from the command-line is probably
Node.js, because it emulates the browser embedding environment very well <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>.
We'll write the following JS and save it in <tt class="docutils literal">table.js</tt>:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">wasmfile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">__dirname</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'/table.wasm'</span><span class="p">);</span><span class="w"></span>
<span class="c1">// This object is imported into wasm.</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">importObject</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">env</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">jstimes3</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="mf">3</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="nx">WebAssembly</span><span class="p">.</span><span class="nx">instantiate</span><span class="p">(</span><span class="ow">new</span><span class="w"> </span><span class="nb">Uint8Array</span><span class="p">(</span><span class="nx">wasmfile</span><span class="p">),</span><span class="w"> </span><span class="nx">importObject</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">obj</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Get two exported functions from wasm.</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">times2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">obj</span><span class="p">.</span><span class="nx">instance</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">times2</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">times3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">obj</span><span class="p">.</span><span class="nx">instance</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">times3</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'times2(12) =>'</span><span class="p">,</span><span class="w"> </span><span class="nx">times2</span><span class="p">(</span><span class="mf">12</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'times3(12) =>'</span><span class="p">,</span><span class="w"> </span><span class="nx">times3</span><span class="p">(</span><span class="mf">12</span><span class="p">));</span><span class="w"></span>
<span class="p">});</span><span class="w"></span>
</pre></div>
<p>If you've compiled the WAT file to WASM as instructed and it's in the same
directory as <tt class="docutils literal">table.js</tt>, this should work:</p>
<div class="highlight"><pre><span></span>$ node table.js
times2(12) => 24
times3(12) => 36
</pre></div>
<p>Let's trace what happens when Node invokes <tt class="docutils literal">times3(12)</tt>:</p>
<ol class="arabic simple">
<li><tt class="docutils literal">times3</tt> in the JS code is taken from the exports of the loaded WASM object.</li>
<li>In the WAT code, the <tt class="docutils literal">times3</tt> function performs an indirect call through
the function table, calling the function at offset 17 and forwarding it the
parameter of the call.</li>
<li>What's at offset 17? Looking at the <tt class="docutils literal">elem</tt> command, it's the function
<tt class="docutils literal">$jstimes3</tt>.</li>
<li>Looking further up in the WAT code, <tt class="docutils literal">$jstimes3</tt> identifies a function
imported from the embedding environment's <tt class="docutils literal">env</tt> object, named <tt class="docutils literal">jstimes3</tt>.</li>
<li>Now looking in the JS again, <tt class="docutils literal">jstimes3</tt> is defined as <tt class="docutils literal">(n) => 3 * n</tt> in
the <tt class="docutils literal">env</tt> key of the import object: a function that multiplies its
parameter by 3.</li>
</ol>
<p>Phew! Quite a journey - from JS into WASM, stored in a dispatch table, and
called from JS again. This sample kicks the tires rather thoroughly.</p>
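<p>The same journey can be modeled in plain JS, with an array playing the role of
the dispatch table. This is only a mental model, assuming the layout described
above; the real table lives inside the WASM module, and <tt class="docutils literal">wasmtimes2</tt> here is a
JS stand-in for the WAT function:</p>

```javascript
// A plain-JS model of the sample's dispatch table; the real table lives
// inside the WASM module, so this only mirrors its layout.
const jstimes3 = (n) => 3 * n;    // the function imported from "env"
const wasmtimes2 = (n) => n + n;  // JS stand-in for $wasmtimes2

// 32 empty slots, with slots 16 and 17 populated as in the elem segment.
const table = new Array(32).fill(null);
table[16] = wasmtimes2;
table[17] = jstimes3;

// The exported functions just forward their parameter through the table.
const times2 = (n) => table[16](n);
const times3 = (n) => table[17](n);

console.log('times2(12) =>', times2(12));
console.log('times3(12) =>', times3(12));
```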
</div>
<div class="section" id="call-indirect-in-detail">
<h2><tt class="docutils literal">call_indirect</tt> in detail</h2>
<p>Since our two functions that perform an indirect call are similar, let's just
focus on one of them:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"times3"</span><span class="p">)</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">))</span>
<span class="p">)</span>
</pre></div>
<p>This invocation uses WAT's <em>folded instruction</em> syntax, which allows s-exprs
instead of a linear instruction sequence. The only required static parameter to
<tt class="docutils literal">call_indirect</tt> is the type index, so we can rewrite the function's contents as
follows:</p>
<div class="highlight"><pre><span></span><span class="nb">local.get</span> <span class="mi">0</span>
<span class="nb">i32.const</span> <span class="mi">17</span>
<span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
</pre></div>
<p>When <tt class="docutils literal">call_indirect</tt> is called, it takes the function index in the table
from the top of the value stack. That's why <tt class="docutils literal">i32.const 17</tt> comes immediately
before the call.</p>
<p>The <tt class="docutils literal">(type $int2int)</tt> parameter is required; if we try to omit it, the
<tt class="docutils literal">wat2wasm</tt> compilation will not complain (I wonder why), but during
execution we'll run into an error:</p>
<div class="highlight"><pre><span></span>RuntimeError: null function or function signature mismatch
</pre></div>
<p>As mentioned above, our original usage of <tt class="docutils literal">call_indirect</tt> relies on a folded
instruction to provide the function index in-line in the call expression. Maybe
we can place the whole thing inline:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">))</span>
</pre></div>
<p>This produces a runtime error again:</p>
<div class="highlight"><pre><span></span>RuntimeError: null function or function signature mismatch
</pre></div>
<p>Interestingly, it will work fine if we flip the order of the last two arguments:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">))</span>
</pre></div>
<p>At first sight, this is paradoxical: don't we have to pass in the function index
first, and only then the parameter to the dynamically-called function?</p>
<p>To understand why there's no paradox here, we have to learn a bit about how
the WASM stack works. The first thing to learn is that binary instructions and
calls expect their arguments on the stack in reverse order (the first argument
deepest in the stack while the last is on top of the stack). To demonstrate
this, here's a function that performs subtraction of two <tt class="docutils literal">i32</tt> values:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"dosub1"</span><span class="p">)</span>
<span class="p">(</span><span class="k">param</span> <span class="kt">i32</span><span class="p">)</span> <span class="p">(</span><span class="k">param</span> <span class="kt">i32</span><span class="p">)</span>
<span class="p">(</span><span class="k">result</span> <span class="kt">i32</span><span class="p">)</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="nb">local.get</span> <span class="mi">1</span>
<span class="nb">i32.sub</span>
<span class="p">)</span>
</pre></div>
<p>We push the first argument on the stack first (<tt class="docutils literal">local.get 0</tt>), and then the
second argument. This means that when <tt class="docutils literal">i32.sub</tt> is called, the stack looks
like this:</p>
<div class="highlight"><pre><span></span>| param 1 | <<-- top of stack
|---------|
| param 0 |
-----------
</pre></div>
<p>The second thing to learn is how WASM's folded instructions are compiled <a class="footnote-reference" href="#footnote-3" id="footnote-reference-3">[3]</a>.
An equivalent subtraction function that uses folded instructions is:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"dosub2"</span><span class="p">)</span>
<span class="p">(</span><span class="k">param</span> <span class="kt">i32</span><span class="p">)</span> <span class="p">(</span><span class="k">param</span> <span class="kt">i32</span><span class="p">)</span>
<span class="p">(</span><span class="k">result</span> <span class="kt">i32</span><span class="p">)</span>
<span class="p">(</span><span class="nb">i32.sub</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">1</span><span class="p">))</span>
<span class="p">)</span>
</pre></div>
<p>This is exactly equivalent to the <tt class="docutils literal">dosub1</tt> function. The WAT compiler unfolds
the folded instruction into the same sequence:</p>
<div class="highlight"><pre><span></span><span class="nb">local.get</span> <span class="mi">0</span>
<span class="nb">local.get</span> <span class="mi">1</span>
<span class="nb">i32.sub</span>
</pre></div>
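<p>To make the unfolding rule concrete, here's a minimal JavaScript sketch of it. It's
a simplified model, not what <tt class="docutils literal">wat2wasm</tt> actually does: a folded instruction is
represented as a hypothetical <tt class="docutils literal">{op, args}</tt> object, and the made-up <tt class="docutils literal">unfold</tt>
function emits the nested arguments in order, followed by the instruction itself:</p>

```javascript
// Simplified model of a folded WAT instruction: { op, args }, where each
// element of args is itself a folded instruction. This is an illustration
// only, not a WAT parser.
function unfold(expr) {
  const out = [];
  // Nested operands are emitted first, in the order they are written...
  for (const arg of expr.args ?? []) {
    out.push(...unfold(arg));
  }
  // ...and the instruction heading the s-expr comes last.
  out.push(expr.op);
  return out;
}

// (i32.sub (local.get 0) (local.get 1))
const folded = {
  op: 'i32.sub',
  args: [{ op: 'local.get 0' }, { op: 'local.get 1' }],
};

console.log(unfold(folded));
// [ 'local.get 0', 'local.get 1', 'i32.sub' ]
```

<p>The important property is that the written order of the nested operands is preserved;
only the position of the heading instruction changes.</p>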
<p>So far so good; now let's get back to our first attempt at fully folding the
indirect call:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">))</span>
</pre></div>
<p>As discussed, this is equivalent to:</p>
<div class="highlight"><pre><span></span><span class="nb">i32.const</span> <span class="mi">17</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
</pre></div>
<p>But herein lies the catch. When the <tt class="docutils literal">call_indirect</tt> instruction executes, it
takes only a single stack argument for itself - the function index. The value on top of
the stack at that point is the one pushed by <tt class="docutils literal">local.get 0</tt>, which is not the function
index - so we get an error. If you recall, the fully unfolded form that does
work is:</p>
<div class="highlight"><pre><span></span><span class="nb">local.get</span> <span class="mi">0</span>
<span class="nb">i32.const</span> <span class="mi">17</span>
<span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
</pre></div>
<p>This is because <tt class="docutils literal">i32.const 17</tt> is the argument to <tt class="docutils literal">call_indirect</tt>; it pops
it from the top of the stack and uses it to look up a function in the table. Then,
once the function is found, <em>that function</em> takes its own arguments from the
stack in the usual order, and that's where it finds the value pushed by
<tt class="docutils literal">local.get 0</tt>.</p>
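<p>The two-step behavior can be sketched with a toy stack machine in JavaScript. This
is a simplified model under loud assumptions: the <tt class="docutils literal">table</tt>, <tt class="docutils literal">run</tt> and instruction
encoding are all hypothetical, and a real engine does much more (validation, typed
values and so on). It only shows which value <tt class="docutils literal">call_indirect</tt> pops for itself and
which one is left for the callee:</p>

```javascript
// Toy model of a WASM function table and value stack. All names here are
// made up for illustration; this is not how a real engine is implemented.
const INT2INT = 'int2int';
const table = [];
table[17] = { type: INT2INT, arity: 1, fn: (n) => 3 * n };

function run(instructions, locals) {
  const stack = [];
  for (const [op, operand] of instructions) {
    if (op === 'local.get') {
      stack.push(locals[operand]);
    } else if (op === 'i32.const') {
      stack.push(operand);
    } else if (op === 'call_indirect') {
      // Step 1: call_indirect pops the table index from the top of the stack.
      const index = stack.pop();
      const entry = table[index];
      if (!entry || entry.type !== operand) {
        throw new Error('null function or function signature mismatch');
      }
      // Step 2: the callee's own arguments are taken from what remains.
      const args = stack.splice(stack.length - entry.arity);
      stack.push(entry.fn(...args));
    }
  }
  return stack.pop();
}

const good = [['local.get', 0], ['i32.const', 17], ['call_indirect', INT2INT]];
const bad = [['i32.const', 17], ['local.get', 0], ['call_indirect', INT2INT]];

console.log(run(good, [14])); // 42
try {
  run(bad, [14]);
} catch (e) {
  console.log(e.message); // null function or function signature mismatch
}
```

<p>In the <tt class="docutils literal">bad</tt> sequence, the parameter value (14) ends up being used as the table
index, and in this model there's nothing at that index.</p>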
<p>This is why listing the arguments in reverse in the folded form works:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">local.get</span> <span class="mi">0</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">))</span>
</pre></div>
<p>The folded form is definitely useful when all of the arguments are actually
passed to the instruction/call that heads the s-expr. For more complex stack
interactions like <tt class="docutils literal">call_indirect</tt>, the folded form seems more confusing than
helpful. This is why I originally wrote this function as:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">func</span> <span class="p">(</span><span class="k">export</span> <span class="s2">"times3"</span><span class="p">)</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span>
<span class="nb">local.get</span> <span class="mi">0</span>
<span class="p">(</span><span class="nb">call_indirect</span> <span class="p">(</span><span class="k">type</span> <span class="nv">$int2int</span><span class="p">)</span> <span class="p">(</span><span class="nb">i32.const</span> <span class="mi">17</span><span class="p">))</span>
<span class="p">)</span>
</pre></div>
<p>It clearly distinguishes between the different kinds of arguments;
<tt class="docutils literal">i32.const 17</tt> belongs to the <tt class="docutils literal">call_indirect</tt>, so it's part of the s-expr.
<tt class="docutils literal">local.get 0</tt> has to be on the stack for the dynamically-called function,
so it's left out separately.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>The type used for function references is <tt class="docutils literal">funcref</tt>; in some older
code samples you'll see <tt class="docutils literal">anyfunc</tt> used instead, but this is outdated.
The WASM standard settled on <tt class="docutils literal">funcref</tt> and tools like <tt class="docutils literal">wat2wasm</tt> will
reject <tt class="docutils literal">anyfunc</tt>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>I've also written the same program using Go and the
<a class="reference external" href="https://wazero.io/">wazero runtime</a>; it's <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2023/wasm-call-indirect/go-env">on GitHub too</a>.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-3">[3]</a></td><td>In one sense, WAT is to WASM like assembly language is to machine code;
therefore, it may make sense to talk about <em>assembling</em> WAT code into
WASM. On the other hand, with syntactic sugar like folded instructions
WAT is much higher level than most assembly languages, so the term
<em>compiling</em> makes some sense too. In this post I'm using <em>compiling</em>
for consistency.</td></tr>
</tbody>
</table>
</div>
Sudoku, Go and WebAssembly2022-09-05T06:33:00-07:002023-04-18T02:58:45-07:00Eli Benderskytag:eli.thegreenplace.net,2022-09-05:/2022/sudoku-go-and-webassembly/<p>Over the summer my family has experienced a brief renaissance of interest in
Sudoku, particularly as I've tried to get my kids to practice solving some
non-trivial puzzles (pro tip: YouTube videos help).</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/2022/sudoku-puzzle.svg" type="image/svg+xml">Sudoku puzzle sample</object>
<p>Naturally, whenever a programmer encounters Sudoku it's hard to avoid thinking
about automated solvers. In fact, I've already written <a class="reference external" href="https://eli.thegreenplace.net/2007/04/08/sudoku-as-a-sat-problem">a Sudoku solver many
years ago</a>;
it converts the puzzle into a <a class="reference external" href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem">SAT problem</a> and solves
that using a standard SAT solver.</p>
<p>This time I wanted something more conventional, because I was also interested in
generating Sudoku puzzles of varying difficulty. In addition, I wanted to
experiment with running Go code in the browser via WebAssembly. The result is
the <a class="reference external" href="https://github.com/eliben/go-sudoku">go-sudoku repository</a>; this post
describes what it does and how to use it. The Go package in my repository is
best seen as a <em>toolkit</em> for solving, generating and evaluating the difficulty
of Sudoku puzzles.</p>
<div class="section" id="solving-puzzles">
<h2>Solving puzzles</h2>
<p>I started from Peter Norvig's fantastic <a class="reference external" href="https://norvig.com/sudoku.html">Solving Every Sudoku Puzzle</a>, where he describes a
constraint-satisfaction solver written in Python. The solver only employs basic
row/column/block elimination as a solution strategy and then runs a recursive
search when stuck. This approach is very fast for solving Sudoku puzzles that
have a single solution.</p>
<p>Norvig's solver in Python is already quick, but my Go code is <em>far</em> faster still
- around 100x faster in some informal measurements. One reason for this is that
Go - in general - is more efficient than Python. Another is a key optimization
to the core data structure; Norvig's code uses a string to represent the
possible digits in a Sudoku square; e.g. if some square can have any value
except 2, it's represented as the string "13456789". So there's a lot of string
allocation, deallocation and linear scanning. I've replaced this by a single
<tt class="docutils literal">uint16</tt> in Go with bitwise operations to add/remove/test digits. My solver
burns through Norvig's list of "hard" Sudoku puzzles taking less than a <em>quarter
of a millisecond</em> per puzzle on average.</p>
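<p>To illustrate the bitset idea, here's a minimal sketch in JavaScript (not the Go
code from the repository). The bit layout is an assumption made for this example -
bit <em>d</em> is set when digit <em>d</em> is still possible - and all the function names are
hypothetical:</p>

```javascript
// Candidate digits for one square, packed into the low bits of an integer.
// Assumed layout for this sketch: bit d (1..9) is set when digit d is
// still possible. Bit 0 is unused.
const ALL_DIGITS = 0b1111111110;

const addDigit = (set, d) => set | (1 << d);
const removeDigit = (set, d) => set & ~(1 << d);
const hasDigit = (set, d) => (set & (1 << d)) !== 0;

// Number of candidates left: clear the lowest set bit until none remain.
function countDigits(set) {
  let n = 0;
  while (set !== 0) {
    set &= set - 1;
    n++;
  }
  return n;
}

// String form, comparable to the representation used in Norvig's solver.
function digitsToString(set) {
  let s = '';
  for (let d = 1; d <= 9; d++) {
    if (hasDigit(set, d)) s += d;
  }
  return s;
}

const square = removeDigit(ALL_DIGITS, 2);
console.log(digitsToString(square)); // 13456789
console.log(countDigits(square)); // 8
```

<p>Every operation here is a handful of integer instructions with no allocation, which
is where the speedup over string manipulation comes from.</p>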
<p>I've also added a <tt class="docutils literal">SolveAll</tt> function that finds <em>all</em> solutions of a given
Sudoku puzzle; careful - do not run this on an empty board :-)</p>
<p>In the repository, you can try the solver by running the <tt class="docutils literal">cmd/solver</tt> command.</p>
</div>
<div class="section" id="more-powerful-sudoku-solving-strategies">
<h2>More powerful Sudoku solving strategies</h2>
<p>As mentioned above, my solver follows Norvig's in that it only applies the basic
"first-order" constraint propagation technique to Sudoku - elimination. Expert
human Sudoku solvers have many higher-order techniques at their disposal. For
my solver, I experimented with implementing one of them - <a class="reference external" href="https://www.sudokuoftheday.com/techniques/naked-pairs-triples">Naked Pairs</a>
(alternatively known as "Naked Twins").</p>
<p>While the implementation works (check out the <tt class="docutils literal">ApplyTwinsStrategy</tt> function),
I found that it's not very helpful for the automated solver. The backtracking
search is so fast that burdening it with additional strategies makes
it <em>slower</em>, not faster. YMMV.</p>
</div>
<div class="section" id="generating-puzzles">
<h2>Generating puzzles</h2>
<p>Solving puzzles was just the warm-up - I've done this before and just wanted
the infrastructure set up. What I was really after is <em>generating</em> interesting
Sudoku puzzles.</p>
<p>The approach Norvig uses is:</p>
<ol class="arabic simple">
<li>Start from an empty board</li>
<li>Assign random digits to random squares until a contradiction is reached
(the puzzle becomes unsolvable), or a minimum count of assigned squares
is reached.</li>
</ol>
<p>Unfortunately, most puzzles produced this way will have multiple solutions; if
you've done a bit of manual Sudoku-ing, you'll know that puzzles with multiple
solutions suck - no one likes them.</p>
<p>Instead, we can start with an empty board:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/2022/sudoku-empty.svg" type="image/svg+xml">Empty Sudoku board</object>
<p>Then, we solve the board using a randomized solver (a solver which randomizes
the order of guessed digits it tries to assign to empty squares); this is a very
quick process (tens of micro-seconds) that produces a random <em>valid</em> solution:</p>
<object class="align-center" data="https://eli.thegreenplace.net/images/2022/sudoku-solved.svg" type="image/svg+xml">Random solved sudoku</object>
<p>Now, we remove numbers from squares on the board one by one (in random order).
At each step, we make sure that the resulting board still has a single solution.
We stop when some pre-set threshold is reached - number of remaining hints,
some difficulty estimate, etc.</p>
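<p>The removal loop can be sketched as follows in JavaScript. This is an illustration
under stated assumptions rather than the code from the repository: the board is a flat
array of 81 numbers with 0 for empty squares, <tt class="docutils literal">countSolutions</tt> is a naive
backtracking counter (much slower than the real solver), the stopping threshold is
just a hint count, and the solved board comes from a fixed pattern instead of a
randomized solver. All the names are hypothetical:</p>

```javascript
// Can digit d be placed at (row, col) without clashing in its row, column
// or 3x3 block?
function canPlace(board, row, col, d) {
  const br = row - (row % 3), bc = col - (col % 3);
  for (let i = 0; i < 9; i++) {
    if (board[row * 9 + i] === d) return false;
    if (board[i * 9 + col] === d) return false;
    if (board[(br + Math.floor(i / 3)) * 9 + bc + (i % 3)] === d) return false;
  }
  return true;
}

// Naive backtracking counter; stops as soon as `limit` solutions are found.
// The board is modified during the search but restored before returning.
function countSolutions(board, limit = 2) {
  const idx = board.indexOf(0);
  if (idx === -1) return 1;
  const row = Math.floor(idx / 9), col = idx % 9;
  let count = 0;
  for (let d = 1; d <= 9 && count < limit; d++) {
    if (canPlace(board, row, col, d)) {
      board[idx] = d;
      count += countSolutions(board, limit - count);
      board[idx] = 0;
    }
  }
  return count;
}

// Remove hints in random order, keeping only removals that leave the
// puzzle with exactly one solution.
function removeHints(solvedBoard, targetHints, rand = Math.random) {
  const board = solvedBoard.slice();
  const order = [...Array(81).keys()];
  for (let i = 80; i > 0; i--) { // Fisher-Yates shuffle
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  let hints = 81;
  for (const idx of order) {
    if (hints <= targetHints) break;
    const saved = board[idx];
    board[idx] = 0;
    if (countSolutions(board) === 1) hints--;
    else board[idx] = saved; // removal broke uniqueness; put the hint back
  }
  return board;
}

// A fixed valid solved board, standing in for the randomized solver's output.
const solved = Array.from({ length: 81 }, (_, i) => {
  const r = Math.floor(i / 9), c = i % 9;
  return ((3 * (r % 3) + Math.floor(r / 3) + c) % 9) + 1;
});

const puzzle = removeHints(solved, 40);
console.log(puzzle.filter((d) => d !== 0).length, 'hints left');
```

<p>Since a hint is put back whenever its removal breaks uniqueness, the loop can run
out of squares before reaching the target, so the target is effectively a lower bound
on the number of hints.</p>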
<p>Compared to the method used by Norvig, this approach has a powerful advantage:
the produced puzzle is guaranteed to have a single solution. It also has a
limitation: it's challenging to generate extremely hard puzzles with very low
hint counts. That said, the puzzles it generates can certainly be hard enough
for non-experts, so it's not a huge problem in practice <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>.</p>
</div>
<div class="section" id="estimating-puzzle-difficulty">
<h2>Estimating puzzle difficulty</h2>
<p>Estimating the difficulty of Sudoku puzzles is important if you want to generate
puzzles for others to solve. The most enjoyable puzzles are just
at the right level of difficulty - not too easy and not too hard. The estimation
process itself is fairly complicated and heuristic, and there are academic
papers written on the subject.</p>
<p>In the <tt class="docutils literal"><span class="pre">go-sudoku</span></tt> package the evaluation (<tt class="docutils literal">EvaluateDifficulty</tt>) is inspired
by the paper "Sudoku Puzzles Generating: from Easy to Evil" by Xiang-Sun ZHANG's
research group, with some tweaks. The difficulty score is provided on a scale
from 1.0 (easiest) to 5.0 (hardest). Generally, I find that puzzles with
difficulty 3 or above are pretty hard!</p>
</div>
<div class="section" id="web-interface">
<h2>Web interface</h2>
<p>Since my ultimate goal was to generate printable Sudoku puzzles for my family, I
wanted a simple graphical interface one could use to generate puzzles and print
those that look good. Instead of mucking with GUIs or PDFs, I decided to embrace
the web! This is achieved in two steps:</p>
<ol class="arabic simple">
<li>The <tt class="docutils literal"><span class="pre">go-sudoku</span></tt> package can emit any Sudoku board as an SVG image.</li>
<li>Using Go's <tt class="docutils literal">wasm</tt> backend, the package is compiled to WebAssembly and
attached to a simple JS/HTML frontend.</li>
</ol>
<p>The result is quite pleasing - <a class="reference external" href="https://eliben.github.io/go-sudoku/">you can check it out online</a> <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>; here's a screenshot:</p>
<img alt="Screenshot from web interface" class="align-center" src="https://eli.thegreenplace.net/images/2022/sudoku-web-browser.png" />
<p>The "Hint count" box tells the generator how many hints (non-empty squares) to
leave on the board. For low counts (lower than 25 or so) it should be treated
as a lower bound; the generator will often generate puzzles with slightly more
hints. Also, the lower the hint count, the longer it might take to run.</p>
<p>Compiling my Go code to WebAssembly turned out to be surprisingly easy! If
you're interested in seeing how it works, take a look at the
<a class="reference external" href="https://github.com/eliben/go-sudoku/tree/main/cmd/wasm">cmd/wasm directory</a>
in the repository.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td><p class="first">Generating truly hard Sudoku puzzles with a single solution is a bit of
an art. Typically, a long time is spent in computational search to
generate a single very hard puzzle.</p>
<p class="last">Once we have a single puzzle with a single solution, we can transform it
in many ways, keeping it valid but with a completely different "look and
feel". For example, we can transpose rows and columns (within the same
block); we can rotate the puzzle by 90, 180 and 270 degrees; we can
permute its digits arbitrarily, and so on. In the end, a huge number of
variations can be produced - all of the same difficulty.</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td>Making this interface available through GitHub pages was pleasantly
simple thanks to deployment via GitHub actions. Take a look in the
<tt class="docutils literal">.github/workflows</tt> directory, if you're interested in the details.</td></tr>
</tbody>
</table>
</div>
An Intel 8080 assembler and online simulator2020-07-25T16:00:00-07:002022-10-04T14:08:24-07:00Eli Benderskytag:eli.thegreenplace.net,2020-07-25:/2020/an-intel-8080-assembler-and-online-simulator/<p>While going through Charles Petzold's "Code" book again, I was looking for an
easy-to-use online assembler and simulator for the classic <a class="reference external" href="https://en.wikipedia.org/wiki/Intel_8080">Intel 8080 CPU</a>, but couldn't find anything that
fit my needs exactly. There are some well-done tools out there, but they seem to
be more geared to running game ROMs and large programs on an emulator; my need
was different - I just wanted something to play with, to practice 8080 assembly
programming.</p>
<p>So I ended up rolling my own, and the <a class="reference external" href="https://github.com/eliben/js-8080-sim/">js-8080-sim</a> project was born. The project has
three main parts:</p>
<ul class="simple">
<li>An assembler for the 8080: translating assembly language code into 8080
machine code. I wrote a custom assembler for this.</li>
<li>A CPU simulator: simulating 8080 machine code. For this purpose I cloned
the <a class="reference external" href="https://github.com/maly/8080js">maly/8080js</a> project into my
repository <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a> and tweaked it a little bit.</li>
<li>A simple web UI for writing 8080 assembly code, running it and observing the
results (as changed values in memory and registers). I wrote a basic UI in
JS:</li>
</ul>
<img alt="js 8080 web UI screenshot" class="align-center" src="https://github.com/eliben/js-8080-sim/blob/master/doc/js-sim-screenshot.png?raw=true" style="width: 650px;" />
<p>If you want to play with the simulator, a live version is available online at
<a class="reference external" href="https://eliben.org/js8080">https://eliben.org/js8080</a></p>
<p>The UI is purely client-side; it makes no requests and just uses your browser
as a GUI. It does use the browser's local storage to save the last program you
ran.</p>
<p>Issues and PRs <a class="reference external" href="https://github.com/eliben/js-8080-sim/">on GitHub</a> welcome!</p>
<div class="section" id="on-javascript-and-frameworks">
<h2>On JavaScript and frameworks</h2>
<p>Using JS for a project like this is very natural, because ultimately what I'm
interested in is having a convenient web UI to play with the simulator. When
I do this, I almost always end up writing vanilla HTML+CSS+JS, avoiding
frameworks. I don't write JS often, so whenever I get to work on a new project,
the framework <em>du jour</em> has typically changed from the last time, and I just
don't have the time to keep track. Vanilla HTML+CSS+JS has much better
longevity, IMHO, although it does mean somewhat more manual work (e.g. to keep
the UI in sync with the application state).</p>
<p>The only framework I was tempted to use is Bootstrap for the CSS and layout,
but eventually decided against it in the interest of simplicity.</p>
<p>We're fortunate to have much more stable and usable JS and web APIs in 2020
compared to just a few years ago. For the simulator I've been using the ES6
version of JS, which is widely supported today and offers many niceties.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>I went with vendoring 8080js because it appears to be unmaintained,
and I also wanted to avoid a dependency, preferring the project to be
self-contained. This was easy with 8080js because it's a single JS file
and it has a permissive 2-clause BSD license. I've reproduced the license
in full in the cloned source file. FWIW, 8080js itself is also based on
an earlier BSD-licensed simulator; OSS at its best :-)</td></tr>
</tbody>
</table>
</div>
Concurrent Servers: Part 6 - Callbacks, Promises and async/await2018-05-08T05:50:00-07:002023-02-04T13:41:52-08:00Eli Benderskytag:eli.thegreenplace.net,2018-05-08:/2018/concurrent-servers-part-6-callbacks-promises-and-asyncawait/<p>This is part 6 in a series of posts on writing concurrent network servers. Parts
3, 4, and 5 in the series discussed the <em>event-driven</em> approach to building
concurrent servers, alternatively known as <em>asynchronous programming</em>. In this
part, we're going to look at some of the challenges inherent in this style of
programming and examine some of the modern solutions available.</p>
<p>This post covers many topics, and as such can't cover all of them in great
detail. It comes with sizable, fully-working code samples, so I hope it can
serve as a good starting point for learning if these topics interest you.</p>
<p>All posts in the series:</p>
<ul class="simple">
<li><a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-1-introduction/">Part 1 - Introduction</a></li>
<li><a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-2-threads/">Part 2 - Threads</a></li>
<li><a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-3-event-driven/">Part 3 - Event-driven</a></li>
<li><a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/">Part 4 - libuv</a></li>
<li><a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-5-redis-case-study/">Part 5 - Redis case study</a></li>
<li><a class="reference external" href="https://eli.thegreenplace.net/2018/concurrent-servers-part-6-callbacks-promises-and-asyncawait/">Part 6 - Callbacks, Promises and async/await</a></li>
</ul>
<div class="section" id="revisiting-the-primality-testing-server-with-node-js">
<h2>Revisiting the primality testing server with Node.js</h2>
<p>So far the series has focused on a simple state-machine protocol,
to demonstrate the challenges of keeping client-specific state on the server. In
this part, I want to focus on a different challenge - keeping track of waiting
for multiple things on the server side. To this end, I'm going to revisit the
primality testing server that appeared in <a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv/">part 4</a>, where it
was implemented in C using <tt class="docutils literal">libuv</tt>.</p>
<p>Here we're going to reimplement this in JavaScript, using the Node.js
server-side framework and execution engine. Node.js is a popular server-side
programming environment that brought the asynchronous style of programming
into the limelight when it appeared in 2009 <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>.</p>
<p>The C code for the original primality testing server <a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2017/async-socket-server/uv-isprime-server.c">is here</a>.
It listens on a socket for numbers to arrive, tests them for primality (using
the slow brute-force method) and sends back "prime" or "composite". It
optionally uses <tt class="docutils literal">libuv</tt>'s work queues to offload the computation itself to a
thread, to avoid blocking the main event loop.</p>
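<p>For reference, a brute-force primality test could look roughly like this in
JavaScript. This is a sketch under my own assumptions (trial division by odd numbers
up to the square root), with hypothetical names; the helper actually used by the
servers in this post may well differ:</p>

```javascript
// Trial division: try every odd divisor up to sqrt(n). Deliberately slow
// for large inputs, which makes it a good stand-in for CPU-heavy work.
function isPrime(n) {
  if (!Number.isInteger(n) || n < 2) return false;
  if (n % 2 === 0) return n === 2;
  for (let r = 3; r * r <= n; r += 2) {
    if (n % r === 0) return false;
  }
  return true;
}

// The reply the server sends back for a given number.
function primalityReply(n) {
  return isPrime(n) ? 'prime' : 'composite';
}

console.log(primalityReply(97)); // prime
console.log(primalityReply(91)); // composite
```

<p>For a large prime the loop runs for a long time without yielding, which is exactly
what blocks an event loop when the computation stays on the main thread.</p>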
<p>Let's reconstruct this server in steps in Node.js, starting with a basic server
that does all computations in the main thread (all the code for this post is
<a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/">available here</a>):</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">net</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'net'</span><span class="p">);</span><span class="w"></span>
<span class="kd">var</span><span class="w"> </span><span class="nx">utils</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./utils.js'</span><span class="p">);</span><span class="w"></span>
<span class="kd">var</span><span class="w"> </span><span class="nx">portnum</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">8070</span><span class="p">;</span><span class="w"></span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">portnum</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">];</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="kd">var</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">createServer</span><span class="p">();</span><span class="w"></span>
<span class="nx">server</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'connection'</span><span class="p">,</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">);</span><span class="w"></span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="nx">portnum</span><span class="p">,</span><span class="w"> </span><span class="kd">function</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Serving on port %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">portnum</span><span class="p">);</span><span class="w"></span>
<span class="p">});</span><span class="w"></span>
<span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">conn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">remoteAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">remoteAddress</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">remotePort</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'peer %s connected'</span><span class="p">,</span><span class="w"> </span><span class="nx">remoteAddress</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">once</span><span class="p">(</span><span class="s1">'close'</span><span class="p">,</span><span class="w"> </span><span class="nx">onConnClose</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'error'</span><span class="p">,</span><span class="w"> </span><span class="nx">onConnError</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">buf2num</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'num %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">answer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">isPrime</span><span class="p">(</span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s2">"prime"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"composite"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">answer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'... %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">answer</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnClose</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'connection from %s closed'</span><span class="p">,</span><span class="w"> </span><span class="nx">remoteAddress</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'connection %s error: %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">remoteAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">message</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>This is standard Node.js fare; the interesting work happens in the
<tt class="docutils literal">onConnData</tt> callback, which is called whenever new data arrives on the
socket. We're missing a couple of utility functions used by this code - they
are in <tt class="docutils literal">utils.js</tt>:</p>
<div class="highlight"><pre><span></span><span class="c1">// Check if n is prime, returning a boolean. The delay parameter is optional -</span><span class="w"></span>
<span class="c1">// if it's true the function will block for n milliseconds before computing the</span><span class="w"></span>
<span class="c1">// answer.</span><span class="w"></span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">isPrime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">delay</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">delay</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">sleep</span><span class="p">(</span><span class="nx">n</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Numbers below 2 are not prime.</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">2</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="nx">n</span><span class="p">;</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mf">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">%</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// Parse the given buffer into a number. buf is of class Buffer; it stores the</span><span class="w"></span>
<span class="c1">// ascii representation of the number followed by some non-digits (like a</span><span class="w"></span>
<span class="c1">// newline).</span><span class="w"></span>
<span class="nx">exports</span><span class="p">.</span><span class="nx">buf2num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">code0</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'0'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">code9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'9'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">buf</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">code0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="nx">code9</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">10</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">code0</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">num</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
<span class="c1">// Blocking sleep for the given number of milliseconds. Uses a spin-loop to</span><span class="w"></span>
<span class="c1">// block; note that this loads the CPU and is only useful for simulating load.</span><span class="w"></span>
<span class="kd">function</span><span class="w"> </span><span class="nx">sleep</span><span class="p">(</span><span class="nx">ms</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">awake_time</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Date</span><span class="p">().</span><span class="nx">getTime</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ms</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="nx">awake_time</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Date</span><span class="p">().</span><span class="nx">getTime</span><span class="p">())</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>For testing and demonstration purposes, <tt class="docutils literal">isPrime</tt> accepts an optional
<tt class="docutils literal">delay</tt> parameter; if <tt class="docutils literal">true</tt>, the function will sleep for the number of
milliseconds given by <tt class="docutils literal">n</tt> before computing whether <tt class="docutils literal">n</tt> is a prime <a class="footnote-reference" href="#footnote-2" id="footnote-reference-2">[2]</a>.</p>
</div>
<div class="section" id="offloading-cpu-intensive-computations">
<h2>Offloading CPU-intensive computations</h2>
<p>Naturally, the server shown above is poorly designed for concurrency; it has a
single thread that will stop listening for new clients while it's busy computing
the prime-ness of a large number for an existing client.</p>
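<p>To make the stall concrete, here is a small standalone sketch (not part of the
server code). It schedules a zero-delay timer and then spins with the same kind
of busy-wait loop used by <tt class="docutils literal">sleep</tt> in <tt class="docutils literal">utils.js</tt>; the timer callback can only
run once the spin loop returns, just like a new client has to wait for the
server. The names <tt class="docutils literal">spin</tt> and <tt class="docutils literal">measureTimerDelay</tt> are made up for this
illustration.</p>

```javascript
// Busy-wait for the given number of milliseconds, like sleep() in utils.js.
function spin(ms) {
  const awakeTime = Date.now() + ms;
  while (awakeTime > Date.now()) {
  }
}

// Hypothetical helper: schedules a 0 ms timer, blocks the thread for blockMs,
// and reports how long the timer actually had to wait before firing.
function measureTimerDelay(blockMs, callback) {
  const start = Date.now();
  setTimeout(() => callback(Date.now() - start), 0);
  spin(blockMs);
}

measureTimerDelay(50, elapsed => {
  console.log('timer fired after %d ms', elapsed);
});
```

<p>The reported delay should be at least as long as the blocking period, even
though the timer was requested with a delay of zero.</p>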
<p>The natural way to handle this is to offload the CPU-intensive computation to
a thread. Alas, JavaScript doesn't support threads and Node.js doesn't either.
Node.js <em>does</em> support sub-processes though, with its <tt class="docutils literal">child_process</tt> module.
Our <a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/primeserver-offload.js">next version of the server</a>
leverages this capability. Here is the relevant part in the new server - the
<tt class="docutils literal">onConnData</tt> callback:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">buf2num</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'num %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Fork off a worker to do this computation, and add a callback to handle</span><span class="w"></span>
<span class="w"> </span><span class="c1">// the result when it's ready. After the callback is set up, this function</span><span class="w"></span>
<span class="w"> </span><span class="c1">// returns so the server can resume the event loop.</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">worker</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">child_process</span><span class="p">.</span><span class="nx">fork</span><span class="p">(</span><span class="s1">'./primeworker.js'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'message'</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">answer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">message</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s2">"prime"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s2">"composite"</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">answer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'... %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">answer</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>When new data is received from a connected client, this server forks off a
sub-process to execute code in <tt class="docutils literal">primeworker.js</tt>, sends it the task using IPC
and attaches a callback for messages received from the worker. It then cedes
control to the event loop, so the server doesn't block while the worker computes.
<tt class="docutils literal">primeworker.js</tt> is very simple:</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">utils</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./utils.js'</span><span class="p">);</span><span class="w"></span>
<span class="nx">process</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'message'</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'[child %d] received message from server:'</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">pid</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Compute the result (with emulated delay) and send back a message.</span><span class="w"></span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">send</span><span class="p">({</span><span class="nx">task</span><span class="o">:</span><span class="w"> </span><span class="nx">message</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="o">:</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">isPrime</span><span class="p">(</span><span class="nx">message</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">)});</span><span class="w"></span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">disconnect</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'[child %d] exiting'</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">pid</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">();</span><span class="w"></span>
<span class="p">});</span><span class="w"></span>
</pre></div>
<p>It waits for a message on its IPC channel, computes the prime-ness of the number
received, sends the reply and exits. Let's ignore the fact that it's wasteful to
launch a subprocess for each number, since the focus of this article is the
callbacks in the server. A more realistic application would have a pool of
"worker" processes that persist throughout the server's lifetime; this wouldn't
change much on the server side, however.</p>
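<p>As a rough sketch of what such a pool could look like, here is a minimal
round-robin dispatcher. Everything in it (<tt class="docutils literal">WorkerPool</tt>, <tt class="docutils literal">createWorker</tt>,
<tt class="docutils literal">makeStandInWorker</tt>) is hypothetical and not part of the code accompanying
this post. It assumes that workers stay alive between tasks (unlike
<tt class="docutils literal">primeworker.js</tt> above, which exits after one reply) and that each worker
answers its tasks in the order it received them. To keep the sketch runnable on
its own, an in-process stand-in replaces the forked child.</p>

```javascript
const { EventEmitter } = require('events');

// Hypothetical round-robin pool. createWorker is any function returning an
// object with send(msg) and on('message', cb), such as
// () => child_process.fork('./worker.js') for a worker that stays alive.
// Assumes every worker replies to its tasks in the order it received them.
class WorkerPool {
  constructor(size, createWorker) {
    this.slots = [];
    this.next = 0;
    for (let i = 0; i < size; i++) {
      const slot = {worker: createWorker(), callbacks: []};
      // Match each reply to the oldest outstanding task of this worker.
      slot.worker.on('message', message => {
        const callback = slot.callbacks.shift();
        if (callback) {
          callback(message);
        }
      });
      this.slots.push(slot);
    }
  }

  // Hand the task to the next worker in turn; callback receives its reply.
  run(task, callback) {
    const slot = this.slots[this.next];
    this.next = (this.next + 1) % this.slots.length;
    slot.callbacks.push(callback);
    slot.worker.send(task);
  }
}

// Plain trial division, only here so the stand-in has something to compute.
function isPrimeSmall(n) {
  if (n < 2) return false;
  for (let r = 2; r * r <= n; r++) {
    if (n % r === 0) return false;
  }
  return true;
}

// In-process stand-in for a forked worker, so the sketch runs on its own.
// It replies asynchronously with the same message shape as primeworker.js.
function makeStandInWorker() {
  const worker = new EventEmitter();
  worker.send = num => {
    setImmediate(() => {
      worker.emit('message', {task: num, result: isPrimeSmall(num)});
    });
  };
  return worker;
}

const pool = new WorkerPool(2, makeStandInWorker);
[7, 8, 10].forEach(num => {
  pool.run(num, message => {
    const answer = message.result ? 'prime' : 'composite';
    console.log('%d is %s', message.task, answer);
  });
});
```

<p>With a pool like this, <tt class="docutils literal">onConnData</tt> would call something like
<tt class="docutils literal">pool.run(num, callback)</tt> instead of forking; the nesting of callbacks in the
server stays the same.</p>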
<p>The important part to notice here is that we have a nested callback within the
server's <tt class="docutils literal">onConnData</tt>. The server's architecture is still quite simple - let's
see how it handles added complexity.</p>
</div>
<div class="section" id="adding-caching">
<h2>Adding caching</h2>
<p>Let's grossly over-engineer our silly primality testing server by adding a
cache. Not just any cache, but stored in Redis! How about that for a true child
of the 2010s? The point of this is educational, of course, so please bear with
me for a bit.</p>
<p>We assume a Redis server is running on the local host, listening on the default
port. We'll use the <tt class="docutils literal">redis</tt> package to talk to it; the <a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/primeserver-offload-caching.js">full code is here</a>,
but the interesting part is this:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">buf2num</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'num %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cachekey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'primecache:'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">num</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">redis_client</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'redis client error'</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">worker</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">child_process</span><span class="p">.</span><span class="nx">fork</span><span class="p">(</span><span class="s1">'./primeworker.js'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">worker</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'message'</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">answer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">message</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'prime'</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">'composite'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">redis_client</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">,</span><span class="w"> </span><span class="nx">answer</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'redis client error'</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">answer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'... %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">answer</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// The strings 'prime' or 'composite' are stored in the Redis cache.</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'cached num %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>Let's see what's going on. When a new number is received from the client, we
first check to see if it's already in the cache. This involves contacting the
Redis server, so naturally it has to be done asynchronously with a callback
registered for when the answer is ready. If the number is in the cache, we're
pretty much done.</p>
<p>If it's not, we have to spawn a worker to compute it; then, once the answer is
ready we want to write it to the cache. If the write is successful, we return
the answer <a class="footnote-reference" href="#footnote-3" id="footnote-reference-3">[3]</a>.</p>
</div>
<div class="section" id="callback-hell">
<h2>Callback hell</h2>
<p>Taking another look at the last code snippet, we see callbacks nested 3 layers
deep. That's inside <tt class="docutils literal">onConnData</tt>, which is itself a callback - so make it 4
layers deep. This style of code is so common and notorious in event-driven
programming that it has an epithet - "callback hell".</p>
<p>The problem is often visualized as this deep, deep callback nest, but IMHO
that's not the real issue. Callback nesting is just a syntactic convenience JS
makes particularly easy, so folks use it. If you look at the C code
<a class="reference external" href="https://eli.thegreenplace.net/2017/concurrent-servers-part-4-libuv">in part 4</a>, it has a
similar level of <em>logical</em> nesting, but since each function is standalone and
not a closure embedded in a surrounding function, it's less visually jarring.</p>
<p>The "just use standalone named functions" solution has issues too; closures
have their benefits - for example they easily refer to values from external
scopes. In the last code snippet, note how <tt class="docutils literal">num</tt> is used in several nested
callbacks but only defined inside <tt class="docutils literal">onConnData</tt> itself. Without this lexical
convenience we'd have to pass it explicitly through all the callbacks, and the
same for all other common values. It's not the end of the world, but it helps
explain why folks gravitate naturally to the tower of nested closures - it's
less code to type.</p>
<p>The bigger issue with this way of programming is forcing programmers into
<em>continuation passing style</em>. It's worth spending some time to explain what I
mean.</p>
<p>Traditional, "straight-line" code looks like the following:</p>
<div class="highlight"><pre><span></span>a <- run_w()
b <- run_x(a)
c <- run_y()
d <- run_z(b, c)
</pre></div>
<p>Let's assume that each of <tt class="docutils literal">run_*</tt> can potentially block, but it doesn't
concern us because we have our own thread or something. The flow of data here
is very straightforward. Now let's see how this would look using asynchronous
callbacks:</p>
<div class="highlight"><pre><span></span>run_w(a =>
run_x(a, b =>
run_y(c =>
run_z(b, c, ...))))
</pre></div>
<p>Nothing surprising, but note how much less obvious the flow of data is. Instead
of saying "run W and get me an <tt class="docutils literal">a</tt>", we have to say "run W and when <tt class="docutils literal">a</tt> is
ready, do ...". This is similar to continuations in programming language theory;
<a class="reference external" href="https://eli.thegreenplace.net/2017/on-recursion-continuations-and-trampolines/">I've written about continuations in the past</a>,
and it should be easy to find tons of other information online.</p>
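<p>To make the comparison concrete, here's a minimal runnable sketch of the same
data flow in both styles. The <tt class="docutils literal">run_*</tt> functions are hypothetical stand-ins that
do trivial arithmetic, and the <tt class="docutils literal">_cps</tt> variants use <tt class="docutils literal">setImmediate</tt> to
simulate an operation that completes on a later turn of the event loop:</p>

```javascript
// Hypothetical stand-ins for the run_* operations; each does trivial work.
function run_w() { return 1; }
function run_x(a) { return a + 10; }
function run_y() { return 100; }
function run_z(b, c) { return b + c; }

// Straight-line style: each call returns its value to the caller.
function straightLine() {
  const a = run_w();
  const b = run_x(a);
  const c = run_y();
  return run_z(b, c);
}

// Continuation-passing versions: each takes a callback `k` (the continuation)
// and feeds its result forward instead of returning it. setImmediate simulates
// completion on a later turn of the event loop.
function run_w_cps(k) { setImmediate(() => k(run_w())); }
function run_x_cps(a, k) { setImmediate(() => k(run_x(a))); }
function run_y_cps(k) { setImmediate(() => k(run_y())); }
function run_z_cps(b, c, k) { setImmediate(() => k(run_z(b, c))); }

function cps(done) {
  run_w_cps(a =>
    run_x_cps(a, b =>
      run_y_cps(c =>
        run_z_cps(b, c, done))));
}

console.log('straight-line:', straightLine());
cps(d => console.log('cps:', d));
```

<p>Both variants compute the same value; the difference is only in who receives
it. <tt class="docutils literal">straightLine</tt> hands it back to its caller, while <tt class="docutils literal">cps</tt> can only pass it
on to whatever function it was given.</p>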
<p>Continuation passing style is not bad per se, but it makes it harder to keep
track of the data flow in the program. It's easier to think of functions as
taking values and returning values, as opposed to taking values and passing
their results forward to other functions <a class="footnote-reference" href="#footnote-4" id="footnote-reference-4">[4]</a>.</p>
<p>This problem is compounded when we consider error handling in realistic
programs. Back to the straight-line code sample - if <tt class="docutils literal">run_x</tt> encounters an
error, it returns it. The place where <tt class="docutils literal">run_x</tt> is called is precisely the right
place to handle this error, because this is the place that has the full context
for the call.</p>
<p>In the asynchronous variant, if <tt class="docutils literal">run_x</tt> encounters an error, there's no
natural place to "return" it to, because <tt class="docutils literal">run_x</tt> doesn't really return
anything. It feeds its result forward. Node.js has an idiom to support this
style of programming - <a class="reference external" href="http://fredkschott.com/post/2014/03/understanding-error-first-callbacks-in-node-js/">error-first callbacks</a>.</p>
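<p>As a rough illustration of the idiom, here's a small sketch. The function
<tt class="docutils literal">parsePositiveInt</tt> is a made-up example (not part of the prime server); the
only convention it relies on is that the callback receives an error as its first
argument, or <tt class="docutils literal">null</tt> on success, followed by the result:</p>

```javascript
// A hypothetical asynchronous operation following the Node.js error-first
// convention: the callback's first argument is an error (or null on success),
// and the result follows.
function parsePositiveInt(text, callback) {
  setImmediate(() => {
    const n = Number.parseInt(text, 10);
    if (Number.isNaN(n) || n <= 0) {
      callback(new Error('not a positive integer: ' + text));
    } else {
      callback(null, n);
    }
  });
}

// Every caller checks `err` first, before touching the result.
function report(err, n) {
  if (err) {
    console.log('error:', err.message);
    return;
  }
  console.log('parsed', n);
}

parsePositiveInt('42', report);
parsePositiveInt('abc', report);
```

<p>Note that the error check has to be repeated in every callback; nothing in the
language enforces it.</p>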
<p>You might think that JS's exceptions should be able to help here, but exceptions
mix with callbacks even more poorly. The callback is usually invoked in a
completely different stack frame from the place where it's passed into an
operation. Therefore, there's no natural place to position <tt class="docutils literal">try</tt> blocks.</p>
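<p>The following sketch shows the problem with a toy model. The <tt class="docutils literal">later</tt> and
<tt class="docutils literal">runLoop</tt> functions are hypothetical stand-ins for a real event loop (which in
Node.js lives in <tt class="docutils literal">libuv</tt>); they only serve to show that the callback runs
after the <tt class="docutils literal">try</tt> block that registered it has already exited:</p>

```javascript
// A toy "event loop": callbacks are queued now and invoked later, from a
// different call stack than the code that registered them.
const pending = [];
function later(callback) { pending.push(callback); }

const caughtBy = [];

function registerWork() {
  try {
    later(() => { throw new Error('boom'); });
  } catch (err) {
    // Never reached: the callback hasn't run yet when this try block exits.
    caughtBy.push('registerWork');
  }
}

function runLoop() {
  while (pending.length > 0) {
    const callback = pending.shift();
    try {
      callback();
    } catch (err) {
      // The exception surfaces here, far from the code that has the context.
      caughtBy.push('runLoop');
    }
  }
}

registerWork();
runLoop();
console.log(caughtBy);  // the error was seen by runLoop, not by registerWork
```

<p>In real Node.js code there's usually no user-level equivalent of the <tt class="docutils literal">catch</tt>
in <tt class="docutils literal">runLoop</tt>; an exception escaping a callback becomes an uncaught exception
for the whole process.</p>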
</div>
<div class="section" id="promises">
<h2>Promises</h2>
<p>Even though the callback-programming style has some issues, they are by no means
fatal. After all, many successful projects were developed with Node.js, even
before the fancy new features became available in ES6 and beyond.</p>
<p>People have been well aware of the issues, however, and have worked hard
to create solutions or at least mitigations for the most serious problems. The
first such solution came to standard JS with ES6: <em>promises</em> (also known as
<em>futures</em> in other languages). However, long before becoming a standard,
promises were available as libraries. A <tt class="docutils literal">Promise</tt> object is really just
a convenience layer over callbacks - it can be implemented as a library in pure
Javascript.</p>
<p>There are plenty of tutorials about promises online; I'll just focus on showing
how our over-engineered prime server looks when written with promises instead of
naked callbacks. Here's <tt class="docutils literal">onConnData</tt> in the <a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/primeserver-promises.js">promise-based version</a>:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">buf2num</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'num %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cachekey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'primecache:'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">num</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">redisGetAsync</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">).</span><span class="nx">then</span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">isPrimeAsync</span><span class="p">(</span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'cached num %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Promise</span><span class="p">.</span><span class="nx">resolve</span><span class="p">(</span><span class="nx">res</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}).</span><span class="nx">then</span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// Using Promise.all to pass 'res' from here to the next .then handler.</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Promise</span><span class="p">.</span><span class="nx">all</span><span class="p">([</span><span class="nx">redisSetAsync</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">),</span><span class="w"> </span><span class="nx">res</span><span class="p">]);</span><span class="w"></span>
<span class="w"> </span><span class="p">}).</span><span class="nx">then</span><span class="p">(([</span><span class="nx">set_result</span><span class="p">,</span><span class="w"> </span><span class="nx">computation_result</span><span class="p">])</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">computation_result</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}).</span><span class="k">catch</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'error:'</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>There are some missing pieces here. First, the promise-ready versions of the
Redis client are defined thus:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="nx">promisify</span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'util'</span><span class="p">);</span><span class="w"></span>
<span class="c1">// Create a Redis client. This connects to a Redis server running on the local</span><span class="w"></span>
<span class="c1">// machine at the default port.</span><span class="w"></span>
<span class="kd">var</span><span class="w"> </span><span class="nx">redis_client</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">redis</span><span class="p">.</span><span class="nx">createClient</span><span class="p">();</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">redisGetAsync</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">promisify</span><span class="p">(</span><span class="nx">redis_client</span><span class="p">.</span><span class="nx">get</span><span class="p">).</span><span class="nx">bind</span><span class="p">(</span><span class="nx">redis_client</span><span class="p">);</span><span class="w"></span>
<span class="kd">const</span><span class="w"> </span><span class="nx">redisSetAsync</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">promisify</span><span class="p">(</span><span class="nx">redis_client</span><span class="p">.</span><span class="nx">set</span><span class="p">).</span><span class="nx">bind</span><span class="p">(</span><span class="nx">redis_client</span><span class="p">);</span><span class="w"></span>
</pre></div>
<p><tt class="docutils literal">promisify</tt> is a Node utility function that takes a callback-based function
and returns a promise-returning version. <tt class="docutils literal">isPrimeAsync</tt> is:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">isPrimeAsync</span><span class="p">(</span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Promise</span><span class="p">((</span><span class="nx">resolve</span><span class="p">,</span><span class="w"> </span><span class="nx">reject</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">child</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">child_process</span><span class="p">.</span><span class="nx">fork</span><span class="p">(</span><span class="s1">'./primeworker.js'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">child</span><span class="p">.</span><span class="nx">send</span><span class="p">(</span><span class="nx">n</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">child</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'message'</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">message</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'prime'</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">'composite'</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="nx">resolve</span><span class="p">(</span><span class="nx">result</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="w"> </span><span class="nx">child</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'error'</span><span class="p">,</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span><span class="nx">reject</span><span class="p">(</span><span class="nx">message</span><span class="p">)});</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>Here the <tt class="docutils literal">Promise</tt> protocol is implemented manually. Instead of taking a
callback to be invoked when the result is ready (and another to be invoked for
errors), <tt class="docutils literal">isPrimeAsync</tt> returns a <tt class="docutils literal">Promise</tt> object wrapping a function. It
can then participate in a <tt class="docutils literal">then</tt> chain of <tt class="docutils literal">Promise</tt>s, as usual.</p>
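<p>The same manual wrapping explains roughly what <tt class="docutils literal">promisify</tt> does for
error-first callbacks. The sketch below is a simplified, hypothetical
<tt class="docutils literal">simplePromisify</tt> (Node's real <tt class="docutils literal">util.promisify</tt> handles more cases), applied
to a made-up <tt class="docutils literal">addLater</tt> function:</p>

```javascript
// A simplified, hypothetical version of promisify for error-first callbacks.
// Node's real util.promisify handles more cases; this only shows the idea.
function simplePromisify(fn) {
  return function (...args) {
    return new Promise((resolve, reject) => {
      // Append our own error-first callback as the last argument, and route
      // its two outcomes to reject/resolve.
      fn.call(this, ...args, (err, result) => {
        if (err) {
          reject(err);
        } else {
          resolve(result);
        }
      });
    });
  };
}

// A hypothetical callback-based API to wrap.
function addLater(x, y, callback) {
  setImmediate(() => {
    if (typeof x !== 'number' || typeof y !== 'number') {
      callback(new TypeError('expected numbers'));
    } else {
      callback(null, x + y);
    }
  });
}

const addAsync = simplePromisify(addLater);
addAsync(2, 3).then(sum => console.log('sum:', sum));
addAsync(2, 'x').catch(err => console.log('error:', err.message));
```

<p>The wrapper forwards <tt class="docutils literal">this</tt> to the wrapped function, which is why the Redis
wrappers shown earlier need <tt class="docutils literal">bind(redis_client)</tt>: without it, the client's
methods would be called with no receiver.</p>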
<p>Now looking back at the main flow of <tt class="docutils literal">onConnData</tt>, some things become
apparent:</p>
<ol class="arabic simple">
<li>The nesting is <em>flattened</em>, turning into a chain of <tt class="docutils literal">then</tt> calls.</li>
<li>Errors can be handled in a single <tt class="docutils literal">catch</tt> at the end of the promise chain.
Programming language aficionados will be delighted to discover that in this
sense promises behave just like <a class="reference external" href="https://hackage.haskell.org/package/mtl-2.2.2/docs/Control-Monad-Cont.html">continuation monads in Haskell</a>.</li>
</ol>
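<p>Here's a small sketch of the second point, using a hypothetical <tt class="docutils literal">step</tt>
function that returns a promise and can be told to fail. A rejection from any
link in the chain skips the remaining <tt class="docutils literal">then</tt> handlers and lands in the single
<tt class="docutils literal">catch</tt>:</p>

```javascript
// Hypothetical step: returns a promise that resolves with `name`, or rejects
// when `fail` is true.
function step(name, fail) {
  return new Promise((resolve, reject) => {
    setImmediate(() => {
      if (fail) {
        reject(new Error(name + ' failed'));
      } else {
        resolve(name);
      }
    });
  });
}

function runChain(failAt) {
  const log = [];
  return step('first', failAt === 'first')
    .then(r => { log.push(r); return step('second', failAt === 'second'); })
    .then(r => { log.push(r); return step('third', failAt === 'third'); })
    .then(r => { log.push(r); return log; })
    .catch(err => {
      // A rejection from any step skips the remaining .then handlers and
      // lands here.
      log.push('caught: ' + err.message);
      return log;
    });
}

runChain('second').then(log => console.log(log));
runChain(null).then(log => console.log(log));
```

<p>When the second step fails, the log records the first step and then the caught
error; the third step never runs.</p>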
<p>Choosing promises over the callback style is a matter of preference; what makes
promises really interesting, IMHO, is the next step - <tt class="docutils literal">await</tt>.</p>
</div>
<div class="section" id="async-and-await">
<h2>async and await</h2>
<p>With ES2017, Javascript added support for the <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt> keywords,
actually modifying the language for more convenient support of asynchronous
programming. Functions returning promises can now be marked as <tt class="docutils literal">async</tt>, and
invoking these functions can be done with <tt class="docutils literal">await</tt>. When a promise-returning
function is invoked with <tt class="docutils literal">await</tt>, what happens behind the scenes is exactly
the same as in the callback or promise versions - a callback is registered and
control is relinquished to the event loop. However, <tt class="docutils literal">await</tt> lets us express
this process in a very natural syntax that addresses some of the biggest issues
with callbacks and promises.</p>
<p><a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/primeserver-asyncawait.js">Here is our prime server again</a>,
now written with <tt class="docutils literal">await</tt>:</p>
<div class="highlight"><pre><span></span><span class="k">async</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">onConnData</span><span class="p">(</span><span class="nx">d</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">num</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">utils</span><span class="p">.</span><span class="nx">buf2num</span><span class="p">(</span><span class="nx">d</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'num %d'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cachekey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'primecache:'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">num</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cached</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">redisGetAsync</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">cached</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">computed</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">isPrimeAsync</span><span class="p">(</span><span class="nx">num</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">redisSetAsync</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">,</span><span class="w"> </span><span class="nx">computed</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">computed</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'cached num %d is %s'</span><span class="p">,</span><span class="w"> </span><span class="nx">num</span><span class="p">,</span><span class="w"> </span><span class="nx">cached</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="nx">conn</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">cached</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'\n'</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'error:'</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>This reads just like a blocking version <a class="footnote-reference" href="#footnote-5" id="footnote-reference-5">[5]</a>, but in fact there is no blocking
here; for example, with this line:</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">cached</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">await</span><span class="w"> </span><span class="nx">redisGetAsync</span><span class="p">(</span><span class="nx">cachekey</span><span class="p">);</span><span class="w"></span>
</pre></div>
<p>A "get" request will be issued with the Redis client, and a callback will be
registered for when data is ready. Until it's ready, the event loop will be
free to do other work (like handle concurrent requests). Once it's ready and
the callback fires, the result is assigned into <tt class="docutils literal">cached</tt>. We no longer have to
split up our code into a tower of callbacks or a chain of <tt class="docutils literal">then</tt> clauses - we
can write it in a natural sequential order. We still have to be mindful of
blocking operations and be very careful about what is invoked inside callbacks,
but it's a big improvement regardless.</p>
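<p>To see that <tt class="docutils literal">await</tt> really does give control back to the event loop, here's a
small self-contained sketch. The <tt class="docutils literal">delay</tt> and <tt class="docutils literal">handle</tt> functions are
hypothetical; <tt class="docutils literal">delay</tt> stands in for any promise-returning operation such as
<tt class="docutils literal">redisGetAsync</tt>:</p>

```javascript
// A hypothetical promise-returning operation that completes after `ms`
// milliseconds.
function delay(ms, value) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

const events = [];

async function handle(name, ms) {
  events.push(name + ' start');
  // Suspends this function only; the event loop keeps running other work.
  const result = await delay(ms, name + ' done');
  events.push(result);
}

async function main() {
  // Both handlers are started before either completes; the slow one doesn't
  // hold up the fast one.
  await Promise.all([handle('slow', 50), handle('fast', 10)]);
  console.log(events);
  return events;
}

const finished = main();
```

<p>Each handler reads sequentially, yet the fast handler finishes first even though
it was started second: while the slow handler is suspended at its <tt class="docutils literal">await</tt>,
the event loop is free to run the other one.</p>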
</div>
<div class="section" id="conclusion">
<h2>Conclusion</h2>
<p>This post has been a whirlwind tour of some idioms of asynchronous programming,
adding modern abstractions on top of the bare-bones <tt class="docutils literal">libuv</tt> based servers
of part 4. This information should be sufficient to understand most asynchronous
code being written today.</p>
<p>A separate question is - is it worth it? Asynchronous code obviously brings with
it some unique programming challenges. Is this the best way to handle high-load
concurrency? I'm keenly interested in the comparison of this model of
programming with the more "traditional" thread-based model, but this is a large
topic I'll have to defer to a future post.</p>
<hr class="docutils" />
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td><p class="first">The main value proposition of Node.js is using the same language on the
server and on the client. Client-side programmers are already familiar
with JS, by necessity, so not having to learn another language to program
server-side is a plus.</p>
<p class="last">Interestingly, this choice also affects the fundamental architecture and
"way" of Node.js; since JS is a single-threaded language, Node.js adopted
this model and had to turn to asynchronous APIs to support concurrency.
In fact, the <tt class="docutils literal">libuv</tt> framework we covered in part 4 was developed as
a portability layer to support Node.js.</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-2" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-2">[2]</a></td><td><p class="first">Since the idea is to emulate CPU-intensive work, this is just a hack to
avoid using huge primes as inputs. For anything but very large primes,
even this naive algorithm executes extremely quickly so it's hard to
see real delays.</p>
<p class="last">Since Node.js doesn't have a <tt class="docutils literal">sleep</tt> function (the idea of <tt class="docutils literal">sleep</tt> is
contrary to the philosophy of Node.js), we simulate it here with a busy
loop checking the time. The important bit is to keep the CPU occupied,
emulating an intensive computation.</p>
</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-3" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-3">[3]</a></td><td>Note that we don't strictly have to wait for the cache write to complete
before returning the answer, but this results in the cleanest protocol
since it gives us a natural place to return errors.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-4" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-4">[4]</a></td><td>There's much more to say on the relative merits of synchronous vs.
callback-based programming, but I'll leave it to another time.</td></tr>
</tbody>
</table>
<table class="docutils footnote" frame="void" id="footnote-5" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-5">[5]</a></td><td>I wrote a blocking version of this exact server in Python, using a thread
pool for concurrency; <a class="reference external" href="https://github.com/eliben/code-for-blog/blob/master/2018/async-socket-server/primeserver-py-blocking.py">the full code is here</a>.
Feel free to compare the <tt class="docutils literal">await</tt> based <tt class="docutils literal">onConnData</tt> with the
<tt class="docutils literal">handle_client_data</tt> function. I switched to Python for this task
because writing blocking code in Node.js is a bit like pissing against
the wind.</td></tr>
</tbody>
</table>
</div>
Go WebSocket server sample2016-05-03T05:23:00-07:002023-02-04T13:41:52-08:00Eli Benderskytag:eli.thegreenplace.net,2016-05-03:/2016/go-websocket-server-sample/<p>I posted a small sample of a <a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2016/go-websocket-sample">WebSocket server written in Go on GitHub</a>.</p>
<p>The sample uses JS to record some mouse events in a page and send them as JSON
data over a WebSocket connection to a server written in Go, which echoes them
back. The received events are used by the JS code to update the page. In
addition, the server periodically sends time updates to the client over another
WebSocket.</p>
<img alt="Gopher and WebSocket logo" class="align-center" src="https://eli.thegreenplace.net/images/2016/gopher-ws.png" />
<p>The sample demonstrates how to do several things I was curious about:</p>
<ul class="simple">
<li>Talking WebSockets in Go. I'm using the semi-standard <a class="reference external" href="https://godoc.org/golang.org/x/net/websocket">x/net/websocket</a> package for this purpose.
The sample has a WebSocket server as well as a Go client for testing it.</li>
<li>Serving both static pages and other HTTP traffic on the same connection.</li>
<li>Using JSON for marshalling and unmarshalling of data on the Go side.</li>
<li>Implementing both bi-directional WebSocket communication (client
initiates, server replies) and uni-directional push notifications (server
pushes to client without polling).</li>
<li>Using the <a class="reference external" href="https://godoc.org/golang.org/x/net/trace">trace</a> package
for recording server request analytics and reporting them through HTTP.</li>
<li>Writing a simple WebSocket client in JS.</li>
</ul>
<p>The client side is just a page of plain JS (no frameworks). I believe it should
work with all modern browsers (I tried fairly recent versions of Chrome and
Firefox).</p>
<p>One thing I was particularly interested in is how framing (the creation of
frames from a raw data stream) over WebSockets is done. I've written a bit about
framing before: <a class="reference external" href="https://eli.thegreenplace.net/2009/08/12/framing-in-serial-communications">in serial communications</a>
(also <a class="reference external" href="https://eli.thegreenplace.net/2009/08/20/frames-and-protocols-for-the-serial-port-in-python">here</a>),
and <a class="reference external" href="https://eli.thegreenplace.net/2011/08/02/length-prefix-framing-for-protocol-buffers">length-prefixing for protocol buffers</a>.</p>
<p>WebSockets run over TCP so we don't have to worry about lower-level headaches.
All bytes sent will arrive, in the right order. The <a class="reference external" href="https://tools.ietf.org/html/rfc6455">WebSocket RFC</a> defines a precise frame structure, which
is usually implemented in libraries; clients only have to worry about the
payloads.</p>
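<p>To make the frame structure a bit more concrete, here is a minimal sketch of the RFC's base framing for a single, unfragmented frame. It is only an illustration, not how <tt class="docutils literal">x/net/websocket</tt> or any browser implements it; <tt class="docutils literal">encodeFrame</tt> and <tt class="docutils literal">decodeFrame</tt> are names made up for this example.</p>

```javascript
// Sketch of RFC 6455 base framing for one unfragmented frame.
// encodeFrame/decodeFrame are illustrative names, not a library API.

const OPCODE_TEXT = 0x1;

// Build a frame with FIN=1. If maskKey (4 bytes) is given, the payload is
// masked, which the RFC requires for client-to-server frames.
function encodeFrame(payload, opcode = OPCODE_TEXT, maskKey = null) {
  const len = payload.length;
  // The length takes 7 bits, or 7 bits + 16 bits, or 7 bits + 64 bits.
  const extLen = len < 126 ? 0 : len <= 0xffff ? 2 : 8;
  const maskLen = maskKey ? 4 : 0;
  const frame = new Uint8Array(2 + extLen + maskLen + len);
  const view = new DataView(frame.buffer);

  frame[0] = 0x80 | (opcode & 0x0f); // FIN bit + opcode
  const maskBit = maskKey ? 0x80 : 0;
  if (extLen === 0) {
    frame[1] = maskBit | len;
  } else if (extLen === 2) {
    frame[1] = maskBit | 126;
    view.setUint16(2, len); // DataView writes big-endian by default
  } else {
    frame[1] = maskBit | 127;
    view.setBigUint64(2, BigInt(len));
  }

  let offset = 2 + extLen;
  if (maskKey) {
    frame.set(maskKey, offset);
    offset += 4;
  }
  for (let i = 0; i < len; i++) {
    frame[offset + i] = maskKey ? payload[i] ^ maskKey[i % 4] : payload[i];
  }
  return frame;
}

// Parse a complete frame back into its header fields and unmasked payload.
function decodeFrame(frame) {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const fin = (frame[0] & 0x80) !== 0;
  const opcode = frame[0] & 0x0f;
  const masked = (frame[1] & 0x80) !== 0;
  let len = frame[1] & 0x7f;
  let offset = 2;
  if (len === 126) {
    len = view.getUint16(2);
    offset = 4;
  } else if (len === 127) {
    len = Number(view.getBigUint64(2));
    offset = 10;
  }
  let maskKey = null;
  if (masked) {
    maskKey = frame.subarray(offset, offset + 4);
    offset += 4;
  }
  const payload = new Uint8Array(len);
  for (let i = 0; i < len; i++) {
    const b = frame[offset + i];
    payload[i] = masked ? b ^ maskKey[i % 4] : b;
  }
  return { fin, opcode, masked, payload };
}

const helloFrame = encodeFrame(new TextEncoder().encode('Hello'));
```

<p>For the payload "Hello", <tt class="docutils literal">helloFrame</tt> holds the bytes <tt class="docutils literal">0x81 0x05 0x48 0x65 0x6c 0x6c 0x6f</tt>, which matches the unmasked single-frame example given in the RFC.</p>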
<p>For example, on the Go side this is implemented in <a class="reference external" href="https://github.com/golang/net/blob/master/websocket/hybi.go">hybi.go</a> (look for the
<tt class="docutils literal">Write</tt> method on the <tt class="docutils literal">hybiFrameWriter</tt> type). What the user of the library
ends up getting is just a <tt class="docutils literal">[]byte</tt> interface to pass in and out of the WebSocket
layer. This is abstracted with a <tt class="docutils literal">Codec</tt> type:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Codec</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="nx">Marshal</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">v</span><span class="w"> </span><span class="kd">interface</span><span class="p">{})</span><span class="w"> </span><span class="p">(</span><span class="nx">data</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">payloadType</span><span class="w"> </span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="nx">Unmarshal</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">data</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">payloadType</span><span class="w"> </span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="kd">interface</span><span class="p">{})</span><span class="w"> </span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>The <tt class="docutils literal">x/net/websocket</tt> library provides some default <tt class="docutils literal">Codec</tt>s like
<tt class="docutils literal">Message</tt> (for <tt class="docutils literal">[]byte</tt> and <tt class="docutils literal">string</tt>) and <tt class="docutils literal">JSON</tt> (for JSON-encoded
data), but users can provide their own. For example, it's fairly easy to send
protocol-buffer encoded data over WebSockets if you're so inclined.</p>
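<p>The same idea can be sketched in JS as a pair of functions that convert between values and raw payload bytes. This only mirrors the shape of <tt class="docutils literal">Codec</tt>; the names <tt class="docutils literal">jsonCodec</tt>, <tt class="docutils literal">TEXT_FRAME</tt> and the sample mouse event are assumptions made for the illustration, not part of any library.</p>

```javascript
// A JavaScript analog of the Codec idea: marshal turns a value into payload
// bytes plus a payload type, unmarshal does the reverse.
// All names here are hypothetical.

const TEXT_FRAME = 1; // the RFC 6455 opcode for text frames

const jsonCodec = {
  // Encode any JSON-serializable value as UTF-8 bytes of its JSON text.
  marshal(value) {
    const data = new TextEncoder().encode(JSON.stringify(value));
    return { data, payloadType: TEXT_FRAME };
  },

  // Decode UTF-8 JSON bytes back into a value; reject non-text payloads.
  unmarshal(data, payloadType) {
    if (payloadType !== TEXT_FRAME) {
      throw new Error('jsonCodec expects a text payload');
    }
    return JSON.parse(new TextDecoder().decode(data));
  },
};

// Round-trip a made-up mouse event through the codec.
const sampleMouseEvent = { x: 10, y: 20 };
const marshalled = jsonCodec.marshal(sampleMouseEvent);
const roundTripped = jsonCodec.unmarshal(marshalled.data, marshalled.payloadType);
```

<p>A codec for another wire format would keep the same two-function shape and only swap out the encoding and decoding steps.</p>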
Rewriting the lexer benchmark in Go2014-03-27T05:58:29-07:002022-10-04T14:08:24-07:00Eli Benderskytag:eli.thegreenplace.net,2014-03-27:/2014/03/27/rewriting-the-lexer-benchmark-in-go
<p>Last year I was toying with a simple lexer (for the <a class="reference external" href="http://llvm.org/docs/TableGen/">TableGen language</a>, because why not), implementing it using <a class="reference external" href="https://eli.thegreenplace.net/2013/07/16/hand-written-lexer-in-javascript-compared-to-the-regex-based-ones/">multiple approaches</a> in both <a class="reference external" href="https://eli.thegreenplace.net/2013/06/25/regex-based-lexical-analysis-in-python-and-javascript/">Python and Javascript</a>. Redoing the same task using multiple approaches and using more than one language is a very interesting code kata and a great way to learn.</p>
<p>Since I've recently been looking at Go, I continued the exercise by reimplementing the lexer (the hand-written one, not a regex-based one) in Go.
The full code is
<a class="reference external" href="https://github.com/eliben/code-for-blog/tree/master/2014/tablegen-lexer-go">available here</a> (along with a large input file used for benchmarking).</p>
<p>Naturally, since my previous posts did performance comparisons between Python and Javascript, I wanted to add Go to the graph. I also had to rerun all the benchmarks because I've gotten a <a class="reference external" href="https://eli.thegreenplace.net/2013/11/23/a-new-ubuntu-machine-for-home/">new, much faster, machine</a> since writing those posts.</p>
<p>Anyway, here it is:</p>
<img class="align-center" src="https://eli.thegreenplace.net/images/2014/03/comparison-py-js-go.png" />
<p>Since Python is so slow here, it's hard to see the difference between the fastest versions, but the handwritten Go lexer is roughly on par with the Javascript one (33 vs. 31 msec). The benchmarks were run on my i7-4771 machine (amd64); go1.2.1, Node.js v0.10.26.</p>
<p>Now, this is quite literally the first non-trivial Go program I've written and I'm a neophyte by all measures, so any tips on the code would be very welcome. I tried to stay faithful to the Javascript implementation in terms of the algorithm, so the comparison would be fair.</p>
<p>That said, shortly after completing the code I started wondering if it could be made faster. There's something about Go that makes you think about performance on a low level, not unlike when programming in C. Maybe it's because so many things are explicit - pointers, slices, etc.</p>
<p>Anyhow, the code that uses a lexer to fill in a slice of tokens caught my eye:</p>
<div class="highlight" style="background: #ffffff"><pre style="line-height: 125%">toks := []Token{}
startTime := time.Now()
<span style="color: #00007f; font-weight: bold">for</span> {
nt := nl.NextToken()
toks = append(toks, nt)
<span style="color: #00007f; font-weight: bold">if</span> nt.Name == EOF {
<span style="color: #00007f; font-weight: bold">break</span>
}
}
</pre></div>
<p>That <tt class="docutils literal">toks = append(toks, nt)</tt> in particular. As the slice grows, <tt class="docutils literal">toks</tt> has to be reallocated and all its contents copied over. The input in my case had close to 200000 tokens; assuming each reallocation doubles the slice's capacity, on the order of 16 reallocations have to happen here, each one copying all the elements over. If that sounds like a lot of wasted work to you, that's because it is.</p>
<p>So I tried replacing the first line with:</p>
<div class="highlight" style="background: #ffffff"><pre style="line-height: 125%">toks := <span style="color: #00007f">make</span>([]Token, <span style="color: #007f7f">0</span>, <span style="color: #007f7f">200000</span>)
</pre></div>
<p>And wow, the runtime dropped from 33 to 20 ms, so it now takes about a third less time than the JS version (20 vs. 31 ms). To be fair to JS I tried to perform the same optimization there (instead of pushing onto an array, create a large one in advance), but this actually made things <em>slower</em>. <a class="reference external" href="https://thewayofcode.wordpress.com/tag/array-pre-allocation/">Some sources online</a> claim that V8 (which is what I'm running underneath, since my local code runs on Node.js) doesn't like preallocating large arrays.</p>
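<p>To put rough numbers on the reallocation argument, here is a small simulation of append-style growth under a pure doubling policy. This is a simplified model, not Go's actual growth algorithm (which is an implementation detail of the runtime); <tt class="docutils literal">simulateAppendGrowth</tt> is a name invented for this sketch.</p>

```javascript
// Count reallocations and element copies when appending n items to a
// dynamic array whose capacity doubles every time it fills up.
// Simplified model with a hypothetical name; real runtimes differ.
function simulateAppendGrowth(n, initialCapacity = 0) {
  let capacity = initialCapacity;
  let length = 0;
  let reallocations = 0;
  let elementsCopied = 0;

  for (let i = 0; i < n; i++) {
    if (length === capacity) {
      // Array is full: allocate a bigger one and copy everything over.
      elementsCopied += length;
      capacity = capacity === 0 ? 1 : capacity * 2;
      reallocations++;
    }
    length++;
  }
  return { reallocations, elementsCopied, capacity };
}

console.log(simulateAppendGrowth(200000));
// { reallocations: 19, elementsCopied: 262143, capacity: 262144 }

console.log(simulateAppendGrowth(200000, 200000));
// { reallocations: 0, elementsCopied: 0, capacity: 200000 }
```

<p>In this model, appending 200000 elements one at a time copies more elements than the final slice holds, while reserving the capacity up front copies none.</p>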
<p>So as is often the case with benchmarks, it's difficult to do an apples-to-apples comparison here. A hunch tells me that in a fully optimized (by a programmer skilled in the language) version of this benchmark, Go would still win, because its nature (typed, compiled to native code, and exposing a lot of low-level details) makes it easier to reason about in terms of performance. But performance was not really the point here - I just wanted to see how easy it is to reimplement the same lexer in Go.</p>
<p>Hopefully the code will be useful or interesting to someone; please let me know what I could've done better.</p>
<p>
<b>Update (2022-05-03):</b> A newer version of Go and some additional
optimizations make this lexer more than 3x faster.
See <a href="https://eli.thegreenplace.net/2022/a-faster-lexer-in-go/">details in this post</a>.
</p>
JavaScript (ES 5) hack for clean multi-line strings2013-11-09T15:40:10-08:002022-10-04T14:08:24-07:00Eli Benderskytag:eli.thegreenplace.net,2013-11-09:/2013/11/09/javascript-es-5-hack-for-clean-multi-line-strings
<p>JavaScript lacks convenient syntax for multiline strings (the equivalent of Python's triple-quotes or Perl's "here blocks"), unless you consider this convenient:</p>
<div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span style="color: #00007f; font-weight: bold">var</span> s = <span style="color: #7f007f">"\</span>
<span style="color: #7f007f">line one\n\</span>
<span style="color: #7f007f">line two\n\</span>
<span style="color: #7f007f">line 'three'\n"</span>;
</pre></div>
<p>This is something ECMAScript 6 is rumored to support (along with other pink fairies and unicorns), once it gets published and adopted. But in the meantime, intrepid JavaScript programmers are left out in the dark. Unless you <em>really</em> need this to preserve sanity and are willing to resort to unconventional methods.</p>
<img class="align-center" src="https://eli.thegreenplace.net/images/2013/11/meme-hack.jpg" />
<div class="highlight" style="background: #ffffff"><pre style="line-height: 125%"><span style="color: #00007f; font-weight: bold">var</span> MultiString = <span style="color: #00007f; font-weight: bold">function</span>(f) {
<span style="color: #00007f; font-weight: bold">return</span> f.toString().split(<span style="color: #7f007f">'\n'</span>).slice(<span style="color: #007f7f">1</span>, -<span style="color: #007f7f">1</span>).join(<span style="color: #7f007f">'\n'</span>);
}
<span style="color: #00007f; font-weight: bold">var</span> ms = MultiString(<span style="color: #00007f; font-weight: bold">function</span>() {<span style="color: #007f00">/**</span>
<span style="color: #007f00">line one</span>
<span style="color: #007f00">line two</span>
<span style="color: #007f00">line 'three'</span>
<span style="color: #007f00">**/</span>});
</pre></div>
<p>Yes, it's as horrible as it looks. And yes, it's sometimes convenient. Naturally for a couple of 3-line strings I probably wouldn't bother. But when you need to cleanly embed multiple long multi-line strings (templating, anyone?) in your source code, I find this pretty useful.</p>
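<p>For comparison, here is the hack next to a template literal, the multi-line string syntax that was eventually standardized in ES2015 (ES6). This is just a sketch to show the two produce the same text when run unmodified in an engine whose <tt class="docutils literal">Function.prototype.toString</tt> returns the function's source including comments; the variable names are made up for the example.</p>

```javascript
// The hack: stringify a function whose body is only a comment, then drop
// the first and last lines (the function header and the closing "**/}").
function MultiString(f) {
  return f.toString().split('\n').slice(1, -1).join('\n');
}

const viaHack = MultiString(function() {/**
line one
line two
line 'three'
**/});

// The ES2015 way: a template literal keeps newlines and quotes as written.
const viaTemplate = `line one
line two
line 'three'`;

console.log(viaHack === viaTemplate);
```

<p>One caveat that applies to the hack but not to template literals: tools that strip comments from the source, such as minifiers, will remove the string's contents along with them.</p>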