<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Eli Bendersky's website - Go</title><link href="https://eli.thegreenplace.net/" rel="alternate"></link><link href="https://eli.thegreenplace.net/feeds/go.atom.xml" rel="self"></link><id>https://eli.thegreenplace.net/</id><updated>2026-04-10T02:28:00-07:00</updated><entry><title>watgo - a WebAssembly Toolkit for Go</title><link href="https://eli.thegreenplace.net/2026/watgo-a-webassembly-toolkit-for-go/" rel="alternate"></link><published>2026-04-09T19:28:00-07:00</published><updated>2026-04-10T02:28:00-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2026-04-09:/2026/watgo-a-webassembly-toolkit-for-go/</id><summary type="html">&lt;p&gt;I'm happy to announce the general availability of &lt;a class="reference external" href="https://github.com/eliben/watgo"&gt;watgo&lt;/a&gt;
- the &lt;strong&gt;W&lt;/strong&gt;eb&lt;strong&gt;A&lt;/strong&gt;ssembly &lt;strong&gt;T&lt;/strong&gt;oolkit for &lt;strong&gt;G&lt;/strong&gt;o. This project is similar to
&lt;a class="reference external" href="https://github.com/webassembly/wabt"&gt;wabt&lt;/a&gt; (C++) or
&lt;a class="reference external" href="https://github.com/bytecodealliance/wasm-tools"&gt;wasm-tools&lt;/a&gt; (Rust), but in
pure, zero-dependency Go.&lt;/p&gt;
&lt;p&gt;watgo comes with a CLI and a Go API to parse WAT (WebAssembly Text), validate
it …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I'm happy to announce the general availability of &lt;a class="reference external" href="https://github.com/eliben/watgo"&gt;watgo&lt;/a&gt;
- the &lt;strong&gt;W&lt;/strong&gt;eb&lt;strong&gt;A&lt;/strong&gt;ssembly &lt;strong&gt;T&lt;/strong&gt;oolkit for &lt;strong&gt;G&lt;/strong&gt;o. This project is similar to
&lt;a class="reference external" href="https://github.com/webassembly/wabt"&gt;wabt&lt;/a&gt; (C++) or
&lt;a class="reference external" href="https://github.com/bytecodealliance/wasm-tools"&gt;wasm-tools&lt;/a&gt; (Rust), but in
pure, zero-dependency Go.&lt;/p&gt;
&lt;p&gt;watgo comes with a CLI and a Go API to parse WAT (WebAssembly Text), validate
it, and encode it into WASM binaries; it also supports decoding WASM from its
binary format.&lt;/p&gt;
&lt;p&gt;At the center of it all is &lt;a class="reference external" href="https://pkg.go.dev/github.com/eliben/watgo/wasmir"&gt;wasmir&lt;/a&gt; - a semantic
representation of a WebAssembly module that users can examine (and manipulate).
This diagram shows the functionalities provided by watgo:&lt;/p&gt;
&lt;img alt="Block diagram showing the different parts of watgo; described in the next paragraph" class="align-center" src="https://eli.thegreenplace.net/images/2026/watgo-diagram.png" /&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Parse: a parser from WAT to &lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;Validate: uses the official WebAssembly validation semantics to check that the
module is well formed and safe&lt;/li&gt;
&lt;li&gt;Encode: emits &lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt; into WASM binary representation&lt;/li&gt;
&lt;li&gt;Decode: reads WASM binary representation into &lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="section" id="cli-use-case"&gt;
&lt;h2&gt;CLI use case&lt;/h2&gt;
&lt;p&gt;watgo comes with a CLI, which you can install by issuing this command:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;go install github.com/eliben/watgo/cmd/watgo@latest
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The CLI aims to be compatible with wasm-tools &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;, and I've already switched my
&lt;a class="reference external" href="https://github.com/eliben/wasm-wat-samples"&gt;wasm-wat-samples&lt;/a&gt; projects to
use it; e.g. a command to parse a WAT file, validate it and encode it into
binary format:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;watgo parse stack.wat -o stack.wasm
&lt;/pre&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;div class="section" id="api-use-case"&gt;
&lt;h2&gt;API use case&lt;/h2&gt;
&lt;p&gt;&lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt; semantically represents a WASM module with an API that's easy to work
with. Here's an example of using watgo to parse a simple WAT
program and do some analysis:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;package&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;fmt&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;github.com/eliben/watgo&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;github.com/eliben/watgo/wasmir&amp;quot;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wasmText&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`&lt;/span&gt;
&lt;span class="s"&gt;(module&lt;/span&gt;
&lt;span class="s"&gt;  (func (export &amp;quot;add&amp;quot;) (param i32 i32) (result i32)&lt;/span&gt;
&lt;span class="s"&gt;    local.get 0&lt;/span&gt;
&lt;span class="s"&gt;    local.get 1&lt;/span&gt;
&lt;span class="s"&gt;    i32.add&lt;/span&gt;
&lt;span class="s"&gt;  )&lt;/span&gt;
&lt;span class="s"&gt;  (func (param f32 i32) (result i32)&lt;/span&gt;
&lt;span class="s"&gt;    local.get 1&lt;/span&gt;
&lt;span class="s"&gt;    i32.const 1&lt;/span&gt;
&lt;span class="s"&gt;    i32.add&lt;/span&gt;
&lt;span class="s"&gt;    drop&lt;/span&gt;
&lt;span class="s"&gt;    i32.const 0&lt;/span&gt;
&lt;span class="s"&gt;  )&lt;/span&gt;
&lt;span class="s"&gt;)`&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;watgo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ParseWAT&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="nb"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;wasmText&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;i32Params&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;localGets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;i32Adds&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Module-defined functions carry a type index into m.Types. The function&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// body itself is a flat sequence of wasmir.Instruction values.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Funcs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Types&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TypeIdx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;param&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Params&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;param&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Kind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wasmir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ValueKindI32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;i32Params&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;instr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Body&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;instr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Kind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wasmir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InstrLocalGet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;localGets&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wasmir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InstrI32Add&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;i32Adds&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;module-defined funcs: %d\n&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Funcs&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;i32 params: %d\n&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i32Params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;local.get instructions: %d\n&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;localGets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;i32.add instructions: %d\n&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i32Adds&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;One important note: the WAT format supports several syntactic niceties that
are flattened / canonicalized when lowered to &lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt;. For example, all folded
instructions are lowered to unfolded ones (linear form), function &amp;amp; type
names are resolved to numeric indices, etc. This matches the validation and
execution semantics of WASM and its binary representation.&lt;/p&gt;
&lt;p&gt;These syntactic details are present in watgo in the &lt;tt class="docutils literal"&gt;textformat&lt;/tt&gt; package
(which parses WAT into an AST) and are removed when this is lowered to &lt;tt class="docutils literal"&gt;wasmir&lt;/tt&gt;.
The &lt;tt class="docutils literal"&gt;textformat&lt;/tt&gt; package is kept internal at this time, but in the future I
may consider exposing it publicly - if there's interest.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="testing-strategy"&gt;
&lt;h2&gt;Testing strategy&lt;/h2&gt;
&lt;p&gt;Even though it's still early days for watgo, I'm reasonably confident in its
correctness due to a strategy of very heavy testing right from the start.&lt;/p&gt;
&lt;p&gt;WebAssembly comes with a &lt;a class="reference external" href="https://github.com/WebAssembly/spec/"&gt;large official test suite&lt;/a&gt;,
which is perfect for end-to-end testing of new implementations.
The core test suite includes almost 200K lines of WAT files that carry several
modules with expected execution semantics and a variety of error scenarios
exercised. These live in specially designed &lt;a class="reference external" href="https://github.com/WebAssembly/spec/tree/main/interpreter#scripts"&gt;.wast files&lt;/a&gt; and
leverage a custom spec interpreter.&lt;/p&gt;
&lt;p&gt;watgo hijacks this approach by using the official test suite for its own
testing. A custom harness parses .wast files and uses watgo to convert the WAT
in them to binary WASM, which is then executed by Node.js &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;; this harness is
a significant effort in itself, but it's very much worth it - the result is
excellent testing coverage. watgo passes the entire WASM spec core test suite.&lt;/p&gt;
&lt;p&gt;Similarly, we leverage &lt;a class="reference external" href="https://github.com/WebAssembly/wabt/tree/main/test/interp"&gt;wabt's interp test suite&lt;/a&gt; which also
includes end-to-end tests, using a simpler Node-based harness to test them
against watgo.&lt;/p&gt;
&lt;p&gt;Finally, I maintain a collection of realistic program samples written in
WAT in the &lt;a class="reference external" href="https://github.com/eliben/wasm-wat-samples"&gt;wasm-wat-samples repository&lt;/a&gt;;
these are also used by watgo to test itself.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Though not all of wasm-tools's functionality is supported yet.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;To stick to a pure-Go approach also for testing, I originally tried
using wazero for this, but had to give up because wazero doesn't support
some of the recent WASM proposals that have already made it into the
standard (most notably Garbage Collection).&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Go"></category><category term="WebAssembly"></category><category term="Compilation"></category></entry><entry><title>Consistent hashing</title><link href="https://eli.thegreenplace.net/2025/consistent-hashing/" rel="alternate"></link><published>2025-09-27T05:54:00-07:00</published><updated>2025-09-28T13:13:02-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-09-27:/2025/consistent-hashing/</id><summary type="html">&lt;p&gt;This post is an introduction to &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Consistent_hashing"&gt;consistent hashing&lt;/a&gt;,
an algorithm for designing a hash table such that only a small portion of
keys has to be recomputed when the table's size changes.&lt;/p&gt;
&lt;div class="section" id="motivating-use-case"&gt;
&lt;h2&gt;Motivating use case&lt;/h2&gt;
&lt;p&gt;Suppose we're designing a &lt;a class="reference external" href="https://eli.thegreenplace.net/2022/go-and-proxy-servers-part-1-http-proxies/"&gt;caching web proxy&lt;/a&gt;,
but the expected storage demands are higher than …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;This post is an introduction to &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Consistent_hashing"&gt;consistent hashing&lt;/a&gt;,
an algorithm for designing a hash table such that only a small portion of
keys has to be recomputed when the table's size changes.&lt;/p&gt;
&lt;div class="section" id="motivating-use-case"&gt;
&lt;h2&gt;Motivating use case&lt;/h2&gt;
&lt;p&gt;Suppose we're designing a &lt;a class="reference external" href="https://eli.thegreenplace.net/2022/go-and-proxy-servers-part-1-http-proxies/"&gt;caching web proxy&lt;/a&gt;,
but the expected storage demands are higher than what a single machine can
handle. So we distribute the cache across multiple machines. How do we do that?
Given a URL, how do we make sure that we can easily find out which server we
should approach for a potentially cached version &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;?&lt;/p&gt;
&lt;p&gt;An approach that immediately comes to mind is &lt;em&gt;hashing&lt;/em&gt;. Let's calculate a
numeric hash of the URL and distribute it evenly between N nodes (that's what
we'll call the servers in this post):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;hash := calculateHashFunction(url)
nodeId := hash % N
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This process works but turns out to have serious downsides in real-world
applications.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="the-problem-with-the-naive-hashing-approach"&gt;
&lt;h2&gt;The problem with the naive hashing approach&lt;/h2&gt;
&lt;p&gt;Consider our caching use case again; in a realistic application at &amp;quot;internet
scale&amp;quot;, one of the assumptions we made implicitly doesn't hold - the cache
nodes are not static. New nodes are added to the system if the load is
high (or if new machines come into service); existing nodes can crash or
be taken offline for maintenance. In other words, the number &lt;tt class="docutils literal"&gt;N&lt;/tt&gt; in our
application is not a constant.&lt;/p&gt;
&lt;p&gt;The problem may be apparent now; to demonstrate it directly, let's take an
actual implementation of &lt;tt class="docutils literal"&gt;hashItem&lt;/tt&gt; using Go's &lt;tt class="docutils literal"&gt;md5&lt;/tt&gt; package &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// hashItem computes the slot an item hashes to, given a total number of slots.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hashItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;nslots&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;digest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sum&lt;/span&gt;&lt;span class="p"&gt;([]&lt;/span&gt;&lt;span class="nb"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;digestHigh&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;binary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BigEndian&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;digestLow&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;binary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BigEndian&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;digestHigh&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;^&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;digestLow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;nslots&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The terminology is slightly adjusted:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Instead of &lt;tt class="docutils literal"&gt;url&lt;/tt&gt;, we'll just refer to a generic &lt;tt class="docutils literal"&gt;item&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&amp;quot;Slot&amp;quot; is a common concept in hash tables: our &lt;tt class="docutils literal"&gt;hashItem&lt;/tt&gt; computes a slot
number for an item, given the total number of available slots&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let's say we started with 32 slots, and we hashed the strings &lt;tt class="docutils literal"&gt;&amp;quot;hello&amp;quot;&lt;/tt&gt;,
&lt;tt class="docutils literal"&gt;&amp;quot;consistent&amp;quot;&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;&amp;quot;marmot&amp;quot;&lt;/tt&gt;. We get these slots:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;hello       (n=32): 4
consistent  (n=32): 14
marmot      (n=32): 5
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now suppose that another node is added, and the total &lt;tt class="docutils literal"&gt;nslots&lt;/tt&gt; grows to 33.
Hashing our items again:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;hello       (n=33): 23
consistent  (n=33): 18
marmot      (n=33): 31
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;All the slots changed!&lt;/p&gt;
&lt;p&gt;This is a significant problem with the naive hashing approach. Whenever
&lt;tt class="docutils literal"&gt;nslots&lt;/tt&gt; changes, we get completely different slots for pretty much any item.
In a realistic application it means that whenever a new node joins or leaves
our caching cluster, there will be a flood of cache misses on every query until
the new cluster settles. And node changes sometimes occur at the most
inconvenient times; imagine that the load is spiking (maybe a site was mentioned
in a high-profile news outlet, or there's a live event streaming) and new
nodes are added to handle it. This isn't a great time to temporarily lose all
caching!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="consistent-hashing-1"&gt;
&lt;h2&gt;Consistent hashing&lt;/h2&gt;
&lt;p&gt;The consistent hashing algorithm solves the problem
in an elegant way. The key idea is to map both nodes and items onto
an interval, so that each item belongs to the node closest to it. Concretely,
we take the unit circle &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;, and map nodes and items to angles on this circle.
Here's an example that explains how this method works in more detail:&lt;/p&gt;
&lt;img alt="Circle showing consistent hashing in action" class="align-center" src="https://eli.thegreenplace.net/images/2025/consistent-hashing-circle.png" /&gt;
&lt;p&gt;This shows five nodes: N1 through N5, and three items: Ix, Iy, Iz.
Initially, we add the nodes: using a hashing operation we map them onto the
circle (details later). Then, as items come in, we determine which node they
belong to, as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Use the same hashing operation to find the item's location on the circle&lt;/li&gt;
&lt;li&gt;The node this item belongs to is the closest one, in the clockwise direction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our diagram, Ix is mapped to N1, Iy to N2, and Iz to N3. So far so good, but
the benefit of this approach becomes apparent when the nodes change. In our
diagram, suppose N3 is removed. Then Iz will map to N5.
&lt;strong&gt;The mapping of the other items doesn't change!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Adding nodes has a similar outcome. If a new node N6 is added and it hashes to a
position between Iy and N2 on the circle, from that moment Iy will be mapped to
N6, but the other items keep their mapping.&lt;/p&gt;
&lt;p&gt;Suppose we have a total of &lt;em&gt;M&lt;/em&gt; items that we need to distribute across &lt;em&gt;N&lt;/em&gt;
nodes. Using the naive hashing approach, whenever we add or remove a node, nearly
all &lt;em&gt;M&lt;/em&gt; items change their mapping. On the other hand, with consistent hashing only
about &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/ab937f504862098986b70c147198d7de0636b19d.svg" style="height: 22px;" type="image/svg+xml"&gt;\frac{M}{N}&lt;/object&gt; need to change. This is a huge difference.&lt;/p&gt;
&lt;p&gt;The original consistent hashing paper (see &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-4"&gt;[1]&lt;/a&gt;) calls this the &lt;em&gt;monotonicity
property&lt;/em&gt; of the algorithm:&lt;/p&gt;
&lt;blockquote&gt;
If items are initially assigned to a set of buckets &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/5f90004c9690d47aa17405ea93d48dd64a2584fd.svg" style="height: 15px;" type="image/svg+xml"&gt;V_1&lt;/object&gt;, and then
some new buckets are added to form &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/1de7baa05bc432a1a68efd5d6a7e7606da511451.svg" style="height: 15px;" type="image/svg+xml"&gt;V_2&lt;/object&gt;, then an item may move from
an old bucket to a new bucket, but not from one old bucket to another.&lt;/blockquote&gt;
&lt;/div&gt;
&lt;div class="section" id="implementing-consistent-hashing"&gt;
&lt;h2&gt;Implementing consistent hashing&lt;/h2&gt;
&lt;p&gt;Implementing the consistent hashing algorithm as described above is fairly easy.
The most critical part of the implementation is finding which node an item maps
to - this involves some kind of search. The original consistent hashing paper
suggests using a balanced binary tree for the search; the implementation I'm
demonstrating here uses a slightly different but equivalent approach:
binary search in a linear array of node positions (slots) &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-5"&gt;[4]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;First, some practical considerations:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Theoretically, the unit circle can be seen as the continuous range
&lt;tt class="docutils literal"&gt;[0, 1)&lt;/tt&gt;. In programming we much prefer the discrete domain, however,
so we're going to &amp;quot;quantize&amp;quot; this range to &lt;tt class="docutils literal"&gt;[0, ringSize)&lt;/tt&gt;, where
&lt;tt class="docutils literal"&gt;ringSize&lt;/tt&gt; is a number large enough to make collisions between node positions unlikely.&lt;/li&gt;
&lt;li&gt;Looking at the circle diagram above, imagine that 0 degrees is the &amp;quot;north&amp;quot;
direction (12 o'clock), and angles increase clockwise. In our discrete
domain, 12 o'clock is 0, 3 o'clock is &lt;tt class="docutils literal"&gt;ringSize/4&lt;/tt&gt;, and so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When a node is added to the consistent hash, its location is found by applying
a hash function like &lt;tt class="docutils literal"&gt;hashItem&lt;/tt&gt; as described above, with
&lt;tt class="docutils literal"&gt;nslots=ringSize&lt;/tt&gt;. The nodes are stored using a pair of data structures,
as follows; this example uses the approximate locations of the nodes N1
through N5 in the circle diagram above (assume &lt;tt class="docutils literal"&gt;ringSize=1024&lt;/tt&gt; here):&lt;/p&gt;
&lt;img alt="Nodes and slots arrays for the hash ring shown above" class="align-center" src="https://eli.thegreenplace.net/images/2025/nodes-slots.png" /&gt;
&lt;p&gt;The positions of the nodes on the circle are stored in &lt;tt class="docutils literal"&gt;slots&lt;/tt&gt;, which is
sorted. &lt;tt class="docutils literal"&gt;nodes&lt;/tt&gt; holds the corresponding node names. For each &lt;tt class="docutils literal"&gt;i&lt;/tt&gt;,
&lt;tt class="docutils literal"&gt;nodes[i]&lt;/tt&gt; is at position &lt;tt class="docutils literal"&gt;slots[i]&lt;/tt&gt; on the circle.&lt;/p&gt;
&lt;p&gt;Here's the &lt;tt class="docutils literal"&gt;ConsistentHasher&lt;/tt&gt; data structure in Go:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ConsistentHasher&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// nodes is a list of nodes in the hash ring; it&amp;#39;s sorted in the same order&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// as slots: for each i, the node at index slots[i] is nodes[i].&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// slots is a sorted slice of node indices.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;slots&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;ringSize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// NewConsistentHasher creates a new consistent hasher with a given maximal&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// ring size.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;NewConsistentHasher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ringSize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;ConsistentHasher&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;ConsistentHasher&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;ringSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ringSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And this is how finding which node a given item maps to is implemented:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// FindNodeFor finds the node an item hashes to. It&amp;#39;s an error to call this&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// method if the hasher doesn&amp;#39;t have any nodes.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;ConsistentHasher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;FindNodeFor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;FindNodeFor called when ConsistentHasher has no nodes&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;ih&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hashItem&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ringSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// Since ch.slots is a sorted list of all the node indices for our nodes, a&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// binary search is what we need here. ih is mapped to the node that has the&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// same or the next larger node index. slices.BinarySearch does exactly this,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// by returning the index where the value would be inserted.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;slotIndex&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;slices&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BinarySearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ih&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// When the returned index is len(slots), it means the search wrapped&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// around.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;slotIndex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;slots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;slotIndex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;slotIndex&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The key here is the binary search invocation. Adding and removing nodes is done
similarly using binary search - see &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/consistent-hashing"&gt;the full code&lt;/a&gt;.&lt;/p&gt;
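&lt;p&gt;As a rough illustration of how insertion and removal can reuse the same binary search, here is a minimal self-contained sketch. It is not the code from the repository: the md5 folding in &lt;tt class="docutils literal"&gt;hashItem&lt;/tt&gt;, the error handling, the ring size and the &lt;tt class="docutils literal"&gt;strayMoves&lt;/tt&gt; helper are all assumptions made for this example. The helper removes one node and counts the items that moved even though they were not mapped to the removed node.&lt;/p&gt;

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"slices"
)

// hashItem maps a string to a slot in [0, nslots). It stands in for the
// post's hashItem; the exact md5 folding used here is an assumption.
func hashItem(item string, nslots uint64) uint64 {
	digest := md5.Sum([]byte(item))
	return binary.BigEndian.Uint64(digest[:8]) % nslots
}

type ConsistentHasher struct {
	nodes    []string
	slots    []uint64
	ringSize uint64
}

// AddNode inserts node at its hashed slot, keeping slots sorted and nodes
// aligned with it. A slot collision is reported as an error.
func (ch *ConsistentHasher) AddNode(node string) error {
	slot := hashItem(node, ch.ringSize)
	i, found := slices.BinarySearch(ch.slots, slot)
	if found {
		return fmt.Errorf("slot %d already holds node %q", slot, ch.nodes[i])
	}
	ch.slots = slices.Insert(ch.slots, i, slot)
	ch.nodes = slices.Insert(ch.nodes, i, node)
	return nil
}

// RemoveNode deletes node and its slot; unknown nodes are an error.
func (ch *ConsistentHasher) RemoveNode(node string) error {
	slot := hashItem(node, ch.ringSize)
	i, found := slices.BinarySearch(ch.slots, slot)
	if !found || ch.nodes[i] != node {
		return fmt.Errorf("node %q is not in the ring", node)
	}
	ch.slots = slices.Delete(ch.slots, i, i+1)
	ch.nodes = slices.Delete(ch.nodes, i, i+1)
	return nil
}

// FindNodeFor mirrors the lookup shown above, without the empty-ring check.
func (ch *ConsistentHasher) FindNodeFor(item string) string {
	i, _ := slices.BinarySearch(ch.slots, hashItem(item, ch.ringSize))
	if i == len(ch.slots) {
		i = 0
	}
	return ch.nodes[i]
}

// strayMoves removes victim from a five-node ring and counts the items that
// changed nodes even though they were not mapped to victim beforehand.
func strayMoves(victim string, nitems int) int {
	ch := ConsistentHasher{ringSize: 1024 * 1024 * 1024}
	for _, n := range []string{"N1", "N2", "N3", "N4", "N5"} {
		if err := ch.AddNode(n); err != nil {
			panic(err)
		}
	}
	before := make([]string, nitems)
	for i := range before {
		before[i] = ch.FindNodeFor(fmt.Sprintf("item-%d", i))
	}
	if err := ch.RemoveNode(victim); err != nil {
		panic(err)
	}
	stray := 0
	for i, old := range before {
		if old != victim {
			if ch.FindNodeFor(fmt.Sprintf("item-%d", i)) != old {
				stray++
			}
		}
	}
	return stray
}

func main() {
	fmt.Println("stray moves:", strayMoves("N3", 1000))
}
```

&lt;p&gt;The count should come out as zero, which is the monotonicity property in action: only items that were on the removed node get a new home. Both mutations cost a linear-time shift of the slices, which matches the trade-off discussed in the footnote on the array-based approach.&lt;/p&gt;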
&lt;/div&gt;
&lt;div class="section" id="better-item-distribution-with-virtual-nodes"&gt;
&lt;h2&gt;Better item distribution with virtual nodes&lt;/h2&gt;
&lt;p&gt;A common issue that comes up in the implementation of consistent hashing is
unbalanced distribution of items across the different nodes. With &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/c63ae6dd4fc9f9dda66970e827d13f7c73fe841c.svg" style="height: 12px;" type="image/svg+xml"&gt;M&lt;/object&gt;
items and a total of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/b51a60734da64be0e618bacbea2865a8a7dcd669.svg" style="height: 12px;" type="image/svg+xml"&gt;N&lt;/object&gt; nodes, the &lt;em&gt;average&lt;/em&gt; distribution will be
about &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/ab937f504862098986b70c147198d7de0636b19d.svg" style="height: 22px;" type="image/svg+xml"&gt;\frac{M}{N}&lt;/object&gt; per node, but in practice it won't be very balanced -
some nodes will have many more items assigned to them than others
(see the Appendix for more details).&lt;/p&gt;
&lt;p&gt;In a real application, this may mean that some cache servers will be much busier
than others, which is bad for capacity planning and for efficient use of
hardware. Luckily, there's an elegant tweak to the consistent hashing algorithm
that significantly mitigates the problem: virtual nodes.&lt;/p&gt;
&lt;p&gt;Instead of mapping each node to a single location on the circle, we'll map
it to &lt;em&gt;V&lt;/em&gt; locations instead. There are several ways to do this - the
simplest is just to tweak the node name in some way. For example, when
&lt;tt class="docutils literal"&gt;AddNode&lt;/tt&gt; is called to add &lt;tt class="docutils literal"&gt;node&lt;/tt&gt;, it will run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;V&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;vnodeName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;%v@%v&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// ... now add vnodeName to the nodes/slots slices&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then, when looking up an item we'll run into one of the virtual nodes, decode
the node's name from it (in our example just strip the &lt;tt class="docutils literal"&gt;&amp;#64;&amp;lt;number&amp;gt;&lt;/tt&gt; suffix)
and return that. Implementing node removal is similarly simple.&lt;/p&gt;
&lt;p&gt;The idea is that given a node named &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt;, the virtual node names
&lt;tt class="docutils literal"&gt;foo&amp;#64;0&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;foo&amp;#64;1&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;foo&amp;#64;2&lt;/tt&gt; etc. will be spread all around the circle
and not cluster in a single place. See the Appendix for a calculation of how
this affects the final distribution.&lt;/p&gt;
&lt;p&gt;The source code for this post includes a &lt;tt class="docutils literal"&gt;ConsistentHasherV&lt;/tt&gt; type that
is very similar to &lt;tt class="docutils literal"&gt;ConsistentHasher&lt;/tt&gt;, except that it implements the virtual
node strategy. The user interface remains exactly the same - it's only the
internal implementation that changes slightly.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="code"&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;The full source code for this post is &lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/consistent-hashing"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="appendix"&gt;
&lt;h2&gt;Appendix&lt;/h2&gt;
&lt;p&gt;The quality of the hash function is very important for good shuffling of nodes
on the circle, but even if we take a perfect hash function that produces
uniformly distributed values, the outcome is likely to be sub-optimal for our
needs.&lt;/p&gt;
&lt;p&gt;Let's say we select &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/b51a60734da64be0e618bacbea2865a8a7dcd669.svg" style="height: 12px;" type="image/svg+xml"&gt;N&lt;/object&gt; points on the unit circle, uniformly in the range
&lt;tt class="docutils literal"&gt;[0, 1)&lt;/tt&gt;. If we sort the points by angle, the gaps between neighboring angles
are the spacings between consecutive &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Order_statistic"&gt;order statistics&lt;/a&gt;.
These follow the &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Beta_distribution"&gt;Beta distribution&lt;/a&gt;
with parameters &lt;tt class="docutils literal"&gt;(1, &lt;span class="pre"&gt;N-1)&lt;/span&gt;&lt;/tt&gt;, which has a mean of &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/9b449288a63a4dfad75b920fdd5eefa2478ada7a.svg" style="height: 22px;" type="image/svg+xml"&gt;\frac{1}{N}&lt;/object&gt; and a
variance of &lt;object class="valign-m11" data="https://eli.thegreenplace.net/images/math/d8e11ee692ae7269b8c666e5dd6597129db2dbe2.svg" style="height: 27px;" type="image/svg+xml"&gt;\frac{N-1}{N^2(N+1)}&lt;/object&gt;.&lt;/p&gt;
&lt;p&gt;This is quite significant. Consider a circle with 20 nodes. The standard
deviation of the distribution is the square root of the variance; substituting
&lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/5d6b87cee7577e70335f849b561ca9fa2d7c58ed.svg" style="height: 12px;" type="image/svg+xml"&gt;N=20&lt;/object&gt;, we get:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/dedd9be6ea6fe84cbd36e553e54f707913caf981.svg" style="height: 43px;" type="image/svg+xml"&gt;\[\sigma=\sqrt{\frac{19}{20^2\cdot 21}}=0.048\]&lt;/object&gt;
&lt;p&gt;With 20 nodes uniformly distributed around a circle, we can expect an average of
18 degrees distance between two nodes. A standard deviation of 0.048 means 17
degrees, which is comparable to the average!&lt;/p&gt;
&lt;p&gt;We can also do a realistic example to demonstrate this. Let's generate 20 random
angles on a circle, and show how the node distribution looks:&lt;/p&gt;
&lt;img alt="Circle with randomly distributed points" class="align-center" src="https://eli.thegreenplace.net/images/2025/circle-random-distribution.png" /&gt;
&lt;p&gt;In this particular sample, the average angle between two adjacent nodes is
18 degrees (as expected). The smallest angle is just 1.04 degrees, while the
largest one is 42 degrees. This means that some nodes will get 40x as many
items assigned to them as others!&lt;/p&gt;
&lt;p&gt;It's easy to see how virtual nodes help; imagine that each server maps to
some number of randomly distributed nodes on the circle; some of these will
be farther than others from their closest neighbor, but the average will have
much less variability. Mathematically, given a set of &lt;em&gt;n&lt;/em&gt; independent
random variables, each with variance &lt;em&gt;v&lt;/em&gt;, the variance of their average is
&lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/8bc0f62c31ffd5dcad4e282dea1285d2fcf82262.svg" style="height: 19px;" type="image/svg+xml"&gt;\frac{v}{n}&lt;/object&gt;.&lt;/p&gt;
&lt;p&gt;As a concrete experiment, I ran a similar simulation to the one above, but
with 10 virtual nodes per node. We'll consider the total portion of the circle
mapped to a node when it maps to any of its virtual nodes. While the
average remains 18 degrees, the variance is reduced drastically - the smallest
portion is 11 degrees and the largest 26.&lt;/p&gt;
&lt;p&gt;You can find the code for
these experiments in the &lt;tt class="docutils literal"&gt;demo.go&lt;/tt&gt; file of the source code repository.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;This is close to the original motivation for the development of
consistent hashing by researchers at MIT. Their work, described in
the paper
&lt;em&gt;&amp;quot;Consistent Hashing and Random Trees:
Distributed Caching Protocols for Relieving Hot Spots on the World
Wide Web&amp;quot;&lt;/em&gt; became the foundation of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Akamai_Technologies"&gt;Akamai Technologies&lt;/a&gt;.&lt;/p&gt;
&lt;p class="last"&gt;There are other use cases of consistent hashing in distributed systems.
AWS popularized one common application in their paper
&lt;em&gt;&amp;quot;Dynamo: Amazon’s Highly Available Key-value Store&amp;quot;&lt;/em&gt;, where the
algorithm is used to distribute storage keys across servers.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;In a real application, we'd probably want to find a different hash
function. &lt;tt class="docutils literal"&gt;md5&lt;/tt&gt; is relatively slow, and we don't need any of its
cryptographic guarantees in this use case. That said, the chosen hash
function should be suitable and produce a good mixing of bits even for
items that are only slightly different (e.g. &amp;quot;node-1&amp;quot; vs. &amp;quot;node-2&amp;quot;);
I found that Go's built-in &lt;tt class="docutils literal"&gt;hash/fnv&lt;/tt&gt; package isn't great for this
purpose, for example.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;A circle is used instead of a linear range to conveniently handle
boundary conditions. It's similar to using modular arithmetic for the
naive hashing approach.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-5"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;I didn't bother to compare performance, but I suspect the array-based
approach will be faster for lookup because of being
relatively cache-friendly. For insertion/removal, the array approach
is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/ebc75cd71fe8ecc45d16e8fbe4ca608d05d1efe0.svg" style="height: 19px;" type="image/svg+xml"&gt;O(n)&lt;/object&gt; whereas the tree is &lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/d7fff6fa55b0a8556bd4b8f0b67820a60ce92451.svg" style="height: 19px;" type="image/svg+xml"&gt;O(\log\ n)&lt;/object&gt;, but it's safe to
assume that lookup performance is significantly more important. Nodes
don't change very often, and lookups happen orders of magnitude more
frequently.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Go"></category><category term="Programming"></category></entry><entry><title>Implementing Forth in Go and C</title><link href="https://eli.thegreenplace.net/2025/implementing-forth-in-go-and-c/" rel="alternate"></link><published>2025-08-26T20:38:00-07:00</published><updated>2025-08-27T03:39:03-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-08-26:/2025/implementing-forth-in-go-and-c/</id><summary type="html">&lt;p&gt;I first ran into Forth about 20 years ago when reading a book about
&lt;a class="reference external" href="https://www.oreilly.com/library/view/designing-embedded-hardware/0596007558/"&gt;designing embedded hardware&lt;/a&gt;.
The reason I got the book back then was to actually learn more about the HW
aspects, so having skimmed the Forth chapter I just registered an &amp;quot;oh, this is neat&amp;quot;
mental note …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I first ran into Forth about 20 years ago when reading a book about
&lt;a class="reference external" href="https://www.oreilly.com/library/view/designing-embedded-hardware/0596007558/"&gt;designing embedded hardware&lt;/a&gt;.
The reason I got the book back then was to actually learn more about the HW
aspects, so having skimmed the Forth chapter I just registered an &amp;quot;oh, this is neat&amp;quot;
mental note and moved on with my life. Over the last two decades I
heard about Forth a few more times here and there, such as that time when
&lt;a class="reference external" href="https://factorcode.org/"&gt;Factor&lt;/a&gt; was talked about for a brief period, maybe
10-12 years ago or so.&lt;/p&gt;
&lt;p&gt;It always occupied a slot in the &amp;quot;weird language&amp;quot; category inside my brain, and
I never paid it much attention. Until June this year, when a couple of factors
combined fortuitously:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;After spending much of the &lt;a class="reference external" href="https://eli.thegreenplace.net/archives/2025"&gt;earlier part of 2025&lt;/a&gt;
exploring the inner workings
of LLMs and digging in random mathy and algorithmic topics, I had an itch
to just write some code.&lt;/li&gt;
&lt;li&gt;I somehow found &lt;a class="reference external" href="https://ratfactor.com/forth/the_programming_language_that_writes_itself.html"&gt;Dave Gauer's page about Forth&lt;/a&gt;
and also the one on &lt;a class="reference external" href="https://ratfactor.com/forth/implementing"&gt;Implementing a Forth&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And something clicked. I'm going to implement a Forth, because... why not?&lt;/p&gt;
&lt;p&gt;So I spent much of my free hacking time over the past two months learning
about Forth and implementing &lt;em&gt;two&lt;/em&gt; of them.&lt;/p&gt;
&lt;div class="section" id="forth-the-user-level-and-the-hacker-level"&gt;
&lt;h2&gt;Forth: the user level and the hacker level&lt;/h2&gt;
&lt;p&gt;It's useful to think of Forth (at least &lt;a class="reference external" href="https://forth-standard.org/"&gt;standard Forth&lt;/a&gt;,
not offshoots like Factor) as having two different &amp;quot;levels&amp;quot;:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;&lt;strong&gt;User&lt;/strong&gt; level: you just want to use the language to write programs. Maybe
you're indeed bringing up new hardware, and find Forth a useful
calculator + REPL + script language. You don't care about Forth's
implementation or its soul, you just want to complete your task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hacker&lt;/strong&gt; level: you're interested in the deeper soul of Forth. Isn't it
amazing that even control flow constructs like &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;IF...THEN&lt;/span&gt;&lt;/tt&gt; or loops like
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;BEGIN...UNTIL&lt;/span&gt;&lt;/tt&gt; are just Forth words, and if you wanted, you could implement
your own control flow constructs and have them be first-class citizens, as
seamless and efficient as the standard ones?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Another way to look at it (useful if you belong to a certain crowd) is that
user-level Forth is like Lisp without macros, and hacker-level Forth has macros
enabled. Lisp can still be great and useful without macros, but macros take
it to an entirely new level and also unlock the deeper soul of the language.&lt;/p&gt;
&lt;p&gt;This distinction will be important when discussing my Forth implementations
below.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="goforth-and-ctil"&gt;
&lt;h2&gt;goforth and ctil&lt;/h2&gt;
&lt;img alt="Logo of goforth" class="align-center" src="https://eli.thegreenplace.net/images/pages/goforth-logo-sm.png" /&gt;
&lt;p&gt;There's a certain way Forth is supposed to be implemented; this is how it was
originally designed, and if you get closer to the hacker level, it
becomes apparent that you're pretty much required to implement it this way -
otherwise supporting all of the language's standard words will be very
difficult. I'm talking about the classical approach of a linked dictionary,
where a word is represented as a &amp;quot;threaded&amp;quot; list &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;, and this dictionary is
available for user code to augment and modify. Thus, much of the Forth
implementation can be written in Forth itself.&lt;/p&gt;
&lt;p&gt;The first implementation I tried is stubbornly different. Can we just make a
pure interpreter? This is what &lt;a class="reference external" href="https://github.com/eliben/goforth"&gt;goforth&lt;/a&gt;
is trying to explore (the Go implementation located in the root directory of
that repository). Many built-in words are supported - definitely enough to
write useful programs - and compilation
(the definition of new Forth words using &lt;tt class="docutils literal"&gt;: word ... ;&lt;/tt&gt;) is implemented by
storing the actual string following the word name in the dictionary, so it can
be interpreted when the word is invoked.&lt;/p&gt;
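&lt;p&gt;The string-storing idea can be sketched in a few dozen lines of Go. This toy is not goforth's code: the &lt;tt class="docutils literal"&gt;miniForth&lt;/tt&gt; type, its three built-in words and the whitespace tokenizer are assumptions chosen to keep the example short. A definition saves the body text in the dictionary, and invoking the word simply interprets that text again.&lt;/p&gt;

```go
package main

import (
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// miniForth is a hypothetical toy interpreter, far smaller than goforth.
type miniForth struct {
	stack []int
	dict  map[string]string // word name -> source text of its body
}

func (f *miniForth) push(v int) { f.stack = append(f.stack, v) }

// pop removes and returns the top of the stack; it panics on underflow.
func (f *miniForth) pop() int {
	v := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return v
}

// run interprets the whitespace-separated tokens in src.
func (f *miniForth) run(src string) error {
	toks := strings.Fields(src)
	for len(toks) > 0 {
		tok := toks[0]
		toks = toks[1:]
		switch tok {
		case ":":
			// Store the text between the word name and ";" as a string.
			end := slices.Index(toks, ";")
			if end == -1 || end == 0 {
				return fmt.Errorf("malformed definition")
			}
			f.dict[toks[0]] = strings.Join(toks[1:end], " ")
			toks = toks[end+1:]
		case "+":
			a, b := f.pop(), f.pop()
			f.push(a + b)
		case "*":
			a, b := f.pop(), f.pop()
			f.push(a * b)
		case "dup":
			v := f.pop()
			f.push(v)
			f.push(v)
		default:
			if body, ok := f.dict[tok]; ok {
				// Invoking a user word re-interprets its stored text.
				if err := f.run(body); err != nil {
					return err
				}
			} else if n, err := strconv.Atoi(tok); err == nil {
				f.push(n)
			} else {
				return fmt.Errorf("unknown word %q", tok)
			}
		}
	}
	return nil
}

func main() {
	f := miniForth{dict: map[string]string{}}
	if err := f.run(": square dup * ; 7 square 1 +"); err != nil {
		panic(err)
	}
	fmt.Println(f.stack) // prints [50]
}
```

&lt;p&gt;Every invocation of a user-defined word tokenizes and interprets its body from scratch, which hints at why an interpreter built this way tends to be slow.&lt;/p&gt;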
&lt;p&gt;This was an interesting approach and in some sense, it &amp;quot;works&amp;quot;. For the user
level of Forth, this is perfectly usable (albeit slow). However, it's
insufficient for the hacker level, because the host language interpreter (the
one in Go) has all the control, so it's impossible to implement &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;IF...THEN&lt;/span&gt;&lt;/tt&gt; in
Forth, for example (it has to be implemented in the host language).&lt;/p&gt;
&lt;p&gt;That was a fun way to get a deeper sense of what Forth is about, but I did want
to implement the hacker level as well, so the second implementation -
&lt;a class="reference external" href="https://github.com/eliben/goforth/tree/main/ctil"&gt;ctil&lt;/a&gt; - does just that.
It's inspired by the &lt;a class="reference external" href="http://git.annexia.org/?p=jonesforth.git"&gt;jonesforth&lt;/a&gt;
assembly implementation, but done in C instead &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;ctil actually lets us implement major parts of Forth in Forth itself. For
example, &lt;tt class="docutils literal"&gt;variable&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;variable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;create&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;cells&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;allot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conditionals:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;\ IF, ELSE, THEN work together to compile to lower-level branches.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\ IF ... THEN compiles to:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\   0BRANCH OFFSET true-part rest&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\ where OFFSET is the offset of rest&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\ IF ... ELSE ... THEN compiles to:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\   0BRANCH OFFSET true-part BRANCH OFFSET2 false-part rest&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;\ where OFFSET is the offset of false-part and OFFSET2 is the offset of rest&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;immediate&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nf"&gt;&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="nf"&gt;branch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;here&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;then&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;immediate&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;dup&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;here&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;!&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;immediate&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nf"&gt;&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;branch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;here&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;dup&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;here&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;!&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;These are actual examples of ctil's &amp;quot;prelude&amp;quot; - a Forth file loaded before any
user code. If you understand Forth, this code is actually rather mind-blowing.
We compile &lt;tt class="docutils literal"&gt;IF&lt;/tt&gt; and the other words by directly laying out their low-level
representation in memory, and different words communicate with each other
using the data stack &lt;em&gt;during compilation&lt;/em&gt;.&lt;/p&gt;
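&lt;p&gt;For readers less fluent in Forth, the same backpatching technique can be sketched in Go. This is only a model, not ctil's code: the opcodes, the &lt;tt class="docutils literal"&gt;Compiler&lt;/tt&gt; type and the tiny VM in &lt;tt class="docutils literal"&gt;Run&lt;/tt&gt; are made up for illustration. The &lt;tt class="docutils literal"&gt;cstack&lt;/tt&gt; field plays the role of the data stack during compilation, and offsets are measured from the placeholder cell, mirroring the &lt;tt class="docutils literal"&gt;here swap -&lt;/tt&gt; sequence in the prelude.&lt;/p&gt;

```go
package main

import "fmt"

// Opcodes of a made-up mini VM; the names are illustrative only.
const (
	opLit     = iota // push the next cell as a literal
	opBranch         // unconditional relative jump
	opZBranch        // pop a value; jump if it is zero
	opHalt           // stop and report the top of the stack
)

// Compiler lays out code cells and keeps a compile-time stack of
// placeholder addresses that still need a branch offset.
type Compiler struct {
	code   []int
	cstack []int
}

func (c *Compiler) here() int { return len(c.code) }

func (c *Compiler) emit(cells ...int) { c.code = append(c.code, cells...) }

func (c *Compiler) popAddr() int {
	a := c.cstack[len(c.cstack)-1]
	c.cstack = c.cstack[:len(c.cstack)-1]
	return a
}

// patch stores the distance from the placeholder cell to here.
func (c *Compiler) patch(addr int) { c.code[addr] = c.here() - addr }

// If emits a conditional branch with a dummy offset and remembers
// where that offset lives.
func (c *Compiler) If() {
	c.emit(opZBranch)
	c.cstack = append(c.cstack, c.here())
	c.emit(0)
}

// Else emits a jump over the false part, then patches the offset
// left by If so that it lands on the false part.
func (c *Compiler) Else() {
	c.emit(opBranch)
	elseAddr := c.here()
	c.emit(0)
	c.patch(c.popAddr())
	c.cstack = append(c.cstack, elseAddr)
}

// Then patches whichever placeholder is still pending.
func (c *Compiler) Then() { c.patch(c.popAddr()) }

// Run executes code with cond as the initial stack content.
func Run(code []int, cond int) int {
	stack := []int{cond}
	ip := 0
	for {
		switch code[ip] {
		case opLit:
			stack = append(stack, code[ip+1])
			ip += 2
		case opBranch:
			ip += 1 + code[ip+1]
		case opZBranch:
			top := stack[len(stack)-1]
			stack = stack[:len(stack)-1]
			if top == 0 {
				ip += 1 + code[ip+1]
			} else {
				ip += 2
			}
		case opHalt:
			return stack[len(stack)-1]
		default:
			panic("unknown opcode")
		}
	}
}

// Build compiles the equivalent of: IF 10 ELSE 20 THEN
func Build() []int {
	c := new(Compiler)
	c.If()
	c.emit(opLit, 10)
	c.Else()
	c.emit(opLit, 20)
	c.Then()
	c.emit(opHalt)
	return c.code
}

func main() {
	code := Build()
	fmt.Println(Run(code, 1)) // true branch: prints 10
	fmt.Println(Run(code, 0)) // false branch: prints 20
}
```

&lt;p&gt;Note that &lt;tt class="docutils literal"&gt;If&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;Else&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;Then&lt;/tt&gt; never call each other; they cooperate only through the addresses left on the compile-time stack.&lt;/p&gt;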
&lt;/div&gt;
&lt;div class="section" id="thoughts-on-forth-itself"&gt;
&lt;h2&gt;Thoughts on Forth itself&lt;/h2&gt;
&lt;p&gt;Forth made perfect sense in the historical context in which it was created in
the early 1970s. Imagine having some HW connected to your computer (a telescope
in the case of Forth's creator), and you have to interact with it. In terms
of languages at your disposal - you don't have much; even BASIC wasn't widely
available yet. Perhaps your machine didn't yet have a C compiler ported to it; C
compilers aren't simple, and C isn't great for exploratory scripting
anyway. So you mostly just have your assembly language and whatever you build
on top.&lt;/p&gt;
&lt;p&gt;Forth is easy to implement in assembly and it gives you a much higher-level
language; you can use it as a calculator, as a REPL, and as a DSL for pretty
much anything due to its composable nature.&lt;/p&gt;
&lt;p&gt;Forth certainly has interesting aspects; it's a &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Concatenative_programming_language"&gt;concatenative language&lt;/a&gt;,
and thus inherently &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Tacit_programming"&gt;point-free&lt;/a&gt;.
A classical example is that instead of writing the following in a more
traditional syntax:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;eat(bake(prove(mix(ingredients))))
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You just write this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;ingredients mix prove bake eat
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There is no need to explicitly pass parameters, or to explicitly return results.
Everything happens implicitly on the stack.&lt;/p&gt;
&lt;p&gt;This is useful for REPL-style programming where you use your language not
necessarily for writing large programs, but more for interactive instructions to
various HW devices. This dearth of syntax is also what makes Forth simple
to implement.&lt;/p&gt;
&lt;p&gt;All that said, in my mind Forth is firmly in the &amp;quot;weird language&amp;quot; category;
it's instructive to learn and to implement, but I wouldn't actually use it
for anything real these days. The stack-based programming model is cool for
very terse point-free programs, but it's not particularly readable and is hard
to reason about without extensive comments, in my experience.&lt;/p&gt;
&lt;p&gt;Consider the implementation of a pretty standard Forth word: &lt;tt class="docutils literal"&gt;+!&lt;/tt&gt;. It expects
an address at the top of the stack, and an addend below it. It adds the addend to
the value stored at that address. Here's a Forth implementation from
ctil's prelude:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;+!&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;( addend addr -- )&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;tuck&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;( addr addend addr )&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;@&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;( addr addend value-at-addr )&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;+&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;( addr updated-value )&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;swap&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;( updated-value addr )&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;!&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Look at that stack wrangling! It's really hard to follow what goes where without
the detailed comments showing the stack layout on the right of each instruction
(a common practice for Forth programs). Sure, we can create additional words
that would make this simpler, but that just increases the lexicon of words to
know.&lt;/p&gt;
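&lt;p&gt;One way to convince yourself that the stack comments are right is to replay the word on a toy machine. The Go sketch below is an assumption-laden model (a hypothetical &lt;tt class="docutils literal"&gt;Machine&lt;/tt&gt; type with an integer stack and a small cell-addressed memory), not part of ctil; each method follows the usual stack effect of the Forth word it is named after.&lt;/p&gt;

```go
package main

import "fmt"

// Machine is a toy model: a data stack plus a small memory of cells.
type Machine struct {
	stack []int
	mem   []int
}

func (m *Machine) push(v int) { m.stack = append(m.stack, v) }

func (m *Machine) pop() int {
	v := m.stack[len(m.stack)-1]
	m.stack = m.stack[:len(m.stack)-1]
	return v
}

// Tuck: ( a b -- b a b )
func (m *Machine) Tuck() {
	b, a := m.pop(), m.pop()
	m.push(b)
	m.push(a)
	m.push(b)
}

// Fetch is the Forth word @: ( addr -- value )
func (m *Machine) Fetch() { m.push(m.mem[m.pop()]) }

// Add is the Forth word +: ( a b -- sum )
func (m *Machine) Add() {
	b, a := m.pop(), m.pop()
	m.push(a + b)
}

// Swap: ( a b -- b a )
func (m *Machine) Swap() {
	b, a := m.pop(), m.pop()
	m.push(b)
	m.push(a)
}

// Store is the Forth word !: ( value addr -- )
func (m *Machine) Store() {
	addr, v := m.pop(), m.pop()
	m.mem[addr] = v
}

// PlusStore replays the body of +! step by step, printing the stack
// after each word so it can be compared with the comments.
func (m *Machine) PlusStore() {
	steps := []struct {
		name string
		run  func()
	}{
		{"tuck", m.Tuck},
		{"@", m.Fetch},
		{"+", m.Add},
		{"swap", m.Swap},
		{"!", m.Store},
	}
	for _, s := range steps {
		s.run()
		fmt.Printf("%-5s %v\n", s.name, m.stack)
	}
}

func main() {
	m := new(Machine)
	m.mem = make([]int, 8)
	m.mem[3] = 40
	m.push(2) // addend
	m.push(3) // addr
	m.PlusStore()
	fmt.Println("mem[3] =", m.mem[3]) // 40 + 2
}
```

&lt;p&gt;With an addend of 2 and address 3 holding 40, the stack goes through &lt;tt class="docutils literal"&gt;[3 2 3]&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;[3 2 40]&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;[3 42]&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;[42 3]&lt;/tt&gt; before ending up empty, matching the comments in the Forth code.&lt;/p&gt;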
&lt;p&gt;My point is, there's a fundamental difficulty here. When you see this C code:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Even without any documentation, you can immediately know several important
things:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; has one parameter and one return value&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; has two parameters and one return value&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;func&lt;/tt&gt; also has two parameters and one return value&lt;/li&gt;
&lt;li&gt;It's immediately obvious how the various values flow from one function call
to the next.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Written in Forth &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kn"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;foo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;How can you know the arity of the functions without adding explicit comments?
Sure, if you have a handful of words like &lt;tt class="docutils literal"&gt;bar&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;foo&lt;/tt&gt; that you know like the
back of your hand, this is easy. But imagine reading a large, unfamiliar code
base full of code like this and trying to comprehend it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary-and-links"&gt;
&lt;h2&gt;Summary and links&lt;/h2&gt;
&lt;p&gt;The source code of my &lt;a class="reference external" href="https://github.com/eliben/goforth"&gt;goforth project is on GitHub&lt;/a&gt;; both
implementations are there, with a comprehensive test harness that tests both.&lt;/p&gt;
&lt;p&gt;To learn Forth itself, I found these resources very useful:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://ratfactor.com/forth/the_programming_language_that_writes_itself.html"&gt;Dave Gauer's Forth page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://www.forth.com/starting-forth/"&gt;Starting Forth&lt;/a&gt; - a free online
book / tutorial on Forth for beginners&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To learn how to implement Forth:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://ratfactor.com/forth/implementing"&gt;Dave Gauer's page on Implementing a Forth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://git.annexia.org/?p=jonesforth.git"&gt;jonesforth&lt;/a&gt; implementation&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://archive.org/details/R.G.LoeligerThreadedInterpretiveLanguagesTheirDesignAndImplementationByteBooks1981"&gt;Threaded Interpretive Languages&lt;/a&gt; - an
old but nice book that explains how Forth implementations typically work&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Implementing Forth is a great self-improvement project for a coder; there's a
pleasantly challenging hump of understanding to overcome, and you gain valuable
insights into stack machines, interpretation vs. compilation and mixing these
levels of abstraction in cool ways.&lt;/p&gt;
&lt;p&gt;Also, implementing programming languages
from scratch is fun! It's hard to beat the feeling of getting to interact with
your implementation for the first time, and then iterating on improving it
and making it more featureful. &lt;a class="reference external" href="https://www.urbandictionary.com/define.php?term=One+More+Turn+Syndrome"&gt;Just one more word&lt;/a&gt;!&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This has nothing to do with threads in the sense of concurrency.
Rather, it's thread as in sewing, where the elements of the list
are all connected to each other as if with a thread. See
&lt;a class="reference external" href="https://wiki.c2.com/?ThreadedInterpretiveLanguage"&gt;this page&lt;/a&gt; for
more details.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;&lt;p class="first"&gt;Which is another deviation from the norm. Forth is really supposed to
be implemented in assembly - this is what it was designed for, and it's
very clear from its structure that it must be so in order to achieve
peak performance.&lt;/p&gt;
&lt;p&gt;But where's the fun in doing things the way they were supposed to be
done? Besides, jonesforth is already a perfectly fine Forth implementation
in assembly, so I wouldn't have learned much by just copying it.&lt;/p&gt;
&lt;p class="last"&gt;I had a lot of fun coding in C for this one; it's been a while since I
last wrote non-trivial amounts of C, and I found it very enjoyable.&lt;/p&gt;
&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;Assuming the convention that multi-parameter functions have their
parameters pushed to the stack from left to right.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="C &amp; C++"></category><category term="Compilation"></category><category term="Go"></category></entry><entry><title>Bloom filters</title><link href="https://eli.thegreenplace.net/2025/bloom-filters/" rel="alternate"></link><published>2025-05-01T18:35:00-07:00</published><updated>2025-07-01T01:40:24-07:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2025-05-01:/2025/bloom-filters/</id><summary type="html">&lt;p&gt;The original motivation for the creation of Bloom filters is efficient set
membership, using a probabilistic approach to significantly reduce the time and
space required to reject items that are not members in a certain set.&lt;/p&gt;
&lt;p&gt;The data structure was proposed by Burton Bloom in &lt;a class="reference external" href="https://dl.acm.org/doi/pdf/10.1145/362686.362692"&gt;a 1970 paper&lt;/a&gt; titled &amp;quot;Space …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The original motivation for the creation of Bloom filters is efficient set
membership, using a probabilistic approach to significantly reduce the time and
space required to reject items that are not members in a certain set.&lt;/p&gt;
&lt;p&gt;The data structure was proposed by Burton Bloom in &lt;a class="reference external" href="https://dl.acm.org/doi/pdf/10.1145/362686.362692"&gt;a 1970 paper&lt;/a&gt; titled &amp;quot;Space/Time
Trade-offs in Hash Coding with Allowable Errors&amp;quot;. It's a good paper that's
worth reading.&lt;/p&gt;
&lt;div class="section" id="why-bloom-filters"&gt;
&lt;h2&gt;Why Bloom filters?&lt;/h2&gt;
&lt;p&gt;Suppose that we store some information on disk and want to check if a certain
file contains a certain entry. Reading from disk is time consuming, so we want
to minimize it as much as possible. A Bloom filter is a data structure that
implements a cache with probabilistic properties:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;If the cache says the key is not present in a specific file, then it's
100% certain we should not be reading the file.&lt;/li&gt;
&lt;li&gt;If the cache says the key &lt;em&gt;is&lt;/em&gt; present in the file, there's a small chance
this is a false positive (and in fact the key isn't there). In this case
we just read the file as usual.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In a scenario where the majority of queries &amp;quot;is this key in that file?&amp;quot; have a
negative answer, a Bloom filter can significantly speed up the system &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;.
Moreover, the probabilistic nature (the existence of false positives) allows
Bloom filters to be extremely fast and occupy very little space. Here's a quote
from the Bloom paper:&lt;/p&gt;
&lt;blockquote&gt;
The new hash-coding methods to be introduced are
suggested for applications in which the great majority of
messages to be tested will not belong to the given set. For
these applications, it is appropriate to consider as a unit of
time (called &lt;em&gt;reject time&lt;/em&gt;) the average time required to
classify a test message as a nonmember of the given set.&lt;/blockquote&gt;
&lt;/div&gt;
&lt;div class="section" id="how-a-bloom-filter-works"&gt;
&lt;h2&gt;How a Bloom filter works&lt;/h2&gt;
&lt;p&gt;A Bloom filter is a special kind of &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Open_addressing"&gt;hash table with open addressing&lt;/a&gt;.
It's an array of bits (the length is typically denoted &lt;em&gt;m&lt;/em&gt;), and some fixed
number (&lt;em&gt;k&lt;/em&gt;) of hash functions. We'll assume each hash function can take an
arbitrary sequence of bytes and hash it into an integer in the inclusive range
&lt;tt class="docutils literal"&gt;[0, &lt;span class="pre"&gt;m-1]&lt;/span&gt;&lt;/tt&gt;. A Bloom filter supports two operations:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Insert an item&lt;/strong&gt;: the item is hashed using each of the &lt;em&gt;k&lt;/em&gt; hash functions, and the
appropriate bits in the underlying array are set to 1.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test if an item is a member&lt;/strong&gt;: the item is hashed using each of the &lt;em&gt;k&lt;/em&gt; hash
functions. If any of the bits indicated by their results is 0, we return &amp;quot;false&amp;quot;
with certainty. If all the bits are 1, we return &amp;quot;true&amp;quot; - and there's a small
chance of false positives.&lt;/p&gt;
&lt;p&gt;Here's how the Bloom paper describes it:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The hash area is considered as N individual addressable bits, with addresses 0
through N - 1. It is assumed that all bits in the hash area are first set to 0.&lt;/p&gt;
&lt;p&gt;Next, each message in the set to be stored is hash coded into a number of
distinct bit addresses, say a1, a2, . . . , ad. Finally, all d bits addressed by
a1 through ad are set to 1.&lt;/p&gt;
&lt;p&gt;To test a new message a sequence of d bit addresses,
say a'1, a'2, ... a'd, is generated in the same manner as for storing a message.
If all d bits are 1, the new message is accepted. If any of these bits is zero,
the message is rejected.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Hopefully it's clear why this data structure is probabilistic in nature: it's
possible that different items hash to the same number, and therefore when
we test some X, all its hashes point to bits turned on by the hashing of other
data. Read the Math appendix for the math behind Bloom filters and how to
calculate the false positive rate (and design for a specific one).&lt;/p&gt;
&lt;p&gt;Here's an example:&lt;/p&gt;
&lt;img alt="Insertion into Bloom filter" class="align-center" src="https://eli.thegreenplace.net/images/2025/bloom-filter-insert.png" /&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;We start with an empty Bloom filter with &lt;tt class="docutils literal"&gt;m=16&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;k=3&lt;/tt&gt;. All bits
are initialized to 0.&lt;/li&gt;
&lt;li&gt;Insertion of &amp;quot;x&amp;quot;. The three hashes return indices 1, 6, 15, so these bits
in the array are set to 1.&lt;/li&gt;
&lt;li&gt;Insertion of &amp;quot;y&amp;quot;. Hashing returns indices 6, 9 and 13, so these
bits in the array are set to 1. Note that bit 6 is set for both &amp;quot;x&amp;quot; and &amp;quot;y&amp;quot;,
and that's fine.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next, let's look at some membership tests:&lt;/p&gt;
&lt;img alt="Membership test in a Bloom filter" class="align-center" src="https://eli.thegreenplace.net/images/2025/bloom-filter-test.png" /&gt;
&lt;ol class="arabic simple" start="4"&gt;
&lt;li&gt;Test &amp;quot;x&amp;quot;. Hashing returns 1, 6, 15; all these bits are 1 in the
array, so the answer is &amp;quot;true&amp;quot;. This is a true positive.&lt;/li&gt;
&lt;li&gt;Test &amp;quot;w&amp;quot;. Hashing returns 3, 9, 13. Since the bit at position 3 is 0, the
answer is &amp;quot;false&amp;quot;.&lt;/li&gt;
&lt;li&gt;Test &amp;quot;v&amp;quot;. Hashing returns 9, 13, 15; all these bits are 1 in the array,
so the answer is &amp;quot;true&amp;quot;. This is a false positive.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Note that it's trivial to prove (by the law of contraposition) that all &amp;quot;false&amp;quot;
answers from a Bloom filter's test operation are true negatives.&lt;/p&gt;
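&lt;p&gt;The walkthrough above can be replayed with a few lines of Go. This is a toy sketch rather than a real Bloom filter: the hypothetical &lt;tt class="docutils literal"&gt;Filter&lt;/tt&gt; type takes bit indices directly instead of computing hashes, so that the indices from the example can be plugged in as-is.&lt;/p&gt;

```go
package main

import "fmt"

// Filter is a toy 16-bit Bloom filter. Instead of hashing, callers pass
// the bit indices directly, copied from the walkthrough above.
type Filter struct {
	bits [16]bool
}

// Insert sets every bit named by indices.
func (f *Filter) Insert(indices ...int) {
	for _, i := range indices {
		f.bits[i] = true
	}
}

// Test reports whether all bits named by indices are set. A false
// result is certain; a true result may be a false positive.
func (f *Filter) Test(indices ...int) bool {
	for _, i := range indices {
		if !f.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	f := new(Filter)
	f.Insert(1, 6, 15) // "x"
	f.Insert(6, 9, 13) // "y"

	fmt.Println(f.Test(1, 6, 15))  // "x": true (true positive)
	fmt.Println(f.Test(3, 9, 13))  // "w": false (bit 3 is unset)
	fmt.Println(f.Test(9, 13, 15)) // "v": true (false positive)
}
```

&lt;p&gt;The false positive for &amp;quot;v&amp;quot; arises because bits 9 and 13 were set by &amp;quot;y&amp;quot; and bit 15 by &amp;quot;x&amp;quot;, even though &amp;quot;v&amp;quot; itself was never inserted.&lt;/p&gt;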
&lt;/div&gt;
&lt;div class="section" id="implementation"&gt;
&lt;h2&gt;Implementation&lt;/h2&gt;
&lt;p&gt;Here's a simple implementation of a Bloom filter in Go:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// New creates a new BloomFilter with capacity m, using k hash functions.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// You can calculate m and k from the number of elements you expect the&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// filter to hold and the desired error rate using CalculateParams.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;bitset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;newBitset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;seed1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MakeSeed&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;seed2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MakeSeed&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;bitset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// seeds for the double hashing scheme.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;seed1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;seed2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Seed&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// Insert a data item into the bloom filter.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seed1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seed2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;bitsetSet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bitset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// Test if the given data item is in the bloom filter. If Test returns false,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// it&amp;#39;s guaranteed that data was never added to the filter. If it returns true,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// there&amp;#39;s an eps probability of this being a false positive. eps depends on&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// the parameters the filter was created with (see CalculateParams).&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;BloomFilter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;byte&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seed1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maphash&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seed2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;h2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;bitsetTest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bitset&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;tt class="docutils literal"&gt;bitsetSet&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;bitsetTest&lt;/tt&gt; functions can be seen in the
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/bloom"&gt;full code repository&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This implementation uses &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Double_hashing"&gt;double hashing&lt;/a&gt; to
generate &lt;em&gt;k&lt;/em&gt; different hash functions from just two hashes.&lt;/p&gt;
&lt;p&gt;The code also mentions the &lt;tt class="docutils literal"&gt;CalculateParams&lt;/tt&gt; function:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// CalculateParams calculates optimal parameters for a Bloom filter that&amp;#39;s&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// intended to contain n elements with error (false positive) rate eps.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CalculateParams&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;eps&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// The formulae we derived are:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// (m/n) = -ln(eps)/(ln(2)*ln(2))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;// k = (m/n)ln(2)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;ln2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;mdivn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ln2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ln2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;float64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;mdivn&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;uint64&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mdivn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ln2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You'll have to read the Math appendix to understand how it works.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="practicalities"&gt;
&lt;h2&gt;Practicalities&lt;/h2&gt;
&lt;p&gt;Let's look at a practical example of a realistic Bloom filter and how it
performs. Suppose we want to store about 1 billion items, and have a false
positive rate of 1% (meaning that when we test an item that was never added,
there's only a 1% chance that the filter returns &amp;quot;true&amp;quot; anyway).
Using these requirements, we can invoke &lt;tt class="docutils literal"&gt;CalculateParams&lt;/tt&gt; to get the Bloom
filter parameters:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;CalculateParams(1000000000, 0.01) ===&amp;gt; (9585058378 7)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This means &lt;em&gt;m&lt;/em&gt; is about 9.6 billion (bits) and &lt;em&gt;k&lt;/em&gt; is 7. In other words, our
Bloom filter requires about 1.2 GB of space to cache the membership test of
a billion items (that could be of arbitrary size). Moreover, the lookup is
very fast - it's just 7 bit lookups, at locations derived from two hash computations. The constant lookup
cost is particularly attractive, as it doesn't depend on the number of items
actually inserted, or on any particular pattern in the data (i.e. there are
no worst case scenarios with asymptotically higher cost).&lt;/p&gt;
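&lt;p&gt;These numbers are easy to reproduce. The following minimal sketch repeats the formulas from &lt;tt class="docutils literal"&gt;CalculateParams&lt;/tt&gt; and converts &lt;em&gt;m&lt;/em&gt; to bytes. The &lt;tt class="docutils literal"&gt;filterBytes&lt;/tt&gt; helper is a hypothetical name introduced for this example, and GB here is taken to mean a billion bytes:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// calculateParams repeats the formulas from CalculateParams shown earlier:
// m/n = -ln(eps)/ln(2)^2 and k = (m/n)*ln(2).
func calculateParams(n uint64, eps float64) (m uint64, k uint64) {
	ln2 := math.Log(2)
	mdivn := -math.Log(eps) / (ln2 * ln2)
	m = uint64(math.Ceil(float64(n) * mdivn))
	k = uint64(math.Ceil(mdivn * ln2))
	return
}

// filterBytes is a hypothetical helper: the number of bytes needed to hold
// m bits, rounded up to a whole byte.
func filterBytes(m uint64) uint64 {
	return (m + 7) / 8
}

func main() {
	m, k := calculateParams(1000000000, 0.01)
	fmt.Printf("m=%d bits, k=%d, size=%.2f GB\n", m, k, float64(filterBytes(m))/1e9)
}
```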
&lt;p&gt;On my machine, benchmarking with the Go implementation shown above I get ~80
&lt;em&gt;nanoseconds&lt;/em&gt; per lookup. Mind you, this is the simplest Go implementation I
could think of - nothing here is optimized; I'm sure this can be improved at
least 2x by using a more speed-optimized hash implementation, for example.&lt;/p&gt;
&lt;p&gt;Now imagine how long it would take to ascertain if data is present in a file
with 1 billion entries, even if the file contains proper indexing for fast
lookups. Just asking the OS to read the file's first few KiBs to get at the
index would take orders of magnitude longer than 80 ns.&lt;/p&gt;
&lt;p&gt;Recall that Bloom filters are best suited for cases &amp;quot;in which the great majority
of messages to be tested will not belong to the given set&amp;quot;. Moreover, when
the data doesn't exist in the file, false positives only happen 1% of the time.
Therefore, the number of times we'll have to go to the disk just to find the
data is not there is a very small fraction of total accesses.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="code"&gt;
&lt;h2&gt;Code&lt;/h2&gt;
&lt;p&gt;The full code for this post, with tests, is available
&lt;a class="reference external" href="https://github.com/eliben/code-for-blog/tree/main/2025/bloom"&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="appendix-the-math-behind-bloom-filters"&gt;
&lt;h2&gt;Appendix: the Math behind Bloom filters&lt;/h2&gt;
&lt;p&gt;A reminder on notation:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;m&lt;/em&gt;: size (in bits) of the set&lt;/li&gt;
&lt;li&gt;&lt;em&gt;n&lt;/em&gt;: how many keys were inserted into the filter&lt;/li&gt;
&lt;li&gt;&lt;em&gt;k&lt;/em&gt;: number of hash functions used to insert/test each key&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a specific bit in the set, assuming our hash functions distribute
the keys randomly, the probability of it not being set by a specific
hash function is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/614cc8a93f58bb9de00a14eb422e050c8cecdcf3.svg" style="height: 36px;" type="image/svg+xml"&gt;\[p_0=1-\frac{1}{m}\]&lt;/object&gt;
&lt;p&gt;And the probability it’s not set by any of our &lt;em&gt;k&lt;/em&gt; hash functions is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/9e23b0332bb5103983efcc3116881c5a74628f32.svg" style="height: 50px;" type="image/svg+xml"&gt;\[p_0=\left ( 1-\frac{1}{m}\right )^k=\left( \left ( 1-\frac{1}{m}\right )^m\right)^\frac{k}{m}\]&lt;/object&gt;
&lt;p&gt;The last formula is constructed to use an approximation of &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/1624dce91de495347430ec2518baf6c6a5328d2e.svg" style="height: 12px;" type="image/svg+xml"&gt;e^x&lt;/object&gt;
for a large enough &lt;em&gt;m&lt;/em&gt; (see the &lt;a class="reference external" href="https://eli.thegreenplace.net/2022/derivative-of-the-exponential-function/"&gt;appendix in this post&lt;/a&gt;) to write:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/02cfac5acdc4308cb27360a538e37a4aba914b70.svg" style="height: 24px;" type="image/svg+xml"&gt;\[p_0\approx e^{-\frac{k}{m}}\]&lt;/object&gt;
&lt;p&gt;After inserting &lt;em&gt;n&lt;/em&gt; elements, the probability that it’s 0 is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/519b6d982e021474d9b8dbf9239d7ccff3b4ae7e.svg" style="height: 24px;" type="image/svg+xml"&gt;\[p_0\approx e^{-\frac{kn}{m}}\]&lt;/object&gt;
&lt;p&gt;Meaning that the probability of it being 1 is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/074385a3121ab09c5569a330529b0d9d527e9bd2.svg" style="height: 24px;" type="image/svg+xml"&gt;\[p_1\approx 1-e^{-\frac{kn}{m}}\]&lt;/object&gt;
&lt;p&gt;Recap: this is the probability of any given bit being 1 after &lt;em&gt;n&lt;/em&gt; keys
were inserted into a set of size &lt;em&gt;m&lt;/em&gt; with &lt;em&gt;k&lt;/em&gt; different hash functions.&lt;/p&gt;
&lt;p&gt;Assuming independence between our hash functions (this is not super
rigorous, but a reasonable assumption in practice), let’s calculate the
false positive rate. Suppose we have a new key that’s not in the set,
and we’re trying to check its membership by hashing it with our &lt;em&gt;k&lt;/em&gt; hash
functions. The false positive rate is the probability that all hashes
land on a bit that’s already set to 1:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/f4b4da65f9808c861c7baa786d658f5840afdb3f.svg" style="height: 36px;" type="image/svg+xml"&gt;\[p_{fp}\approx\left ( 1-e^{-\frac{kn}{m}} \right )^k\approx \varepsilon\]&lt;/object&gt;
&lt;p&gt;This is also called the &lt;em&gt;error rate&lt;/em&gt; of our filter, or
&lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/abfcc7fffa771b562756e65a07ae43b713c3f026.svg" style="height: 8px;" type="image/svg+xml"&gt;\varepsilon&lt;/object&gt;. To get an optimal (minimal) false positive rate,
let’s minimize &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/abfcc7fffa771b562756e65a07ae43b713c3f026.svg" style="height: 8px;" type="image/svg+xml"&gt;\varepsilon&lt;/object&gt;. Since the logarithm function is
monotonically increasing, it will be more convenient to minimize
&lt;object class="valign-m5" data="https://eli.thegreenplace.net/images/math/65d846cbd474cd37f8d3f1e586058d772118f3a4.svg" style="height: 19px;" type="image/svg+xml"&gt;\ln(\varepsilon)&lt;/object&gt;:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/aed89bc0710984b32dedb04112b657820864909f.svg" style="height: 36px;" type="image/svg+xml"&gt;\[\ln(\varepsilon)=\ln\left ( 1-e^{-\frac{kn}{m}} \right )^k=k\cdot \ln\left ( 1-e^{-\frac{kn}{m}} \right )\]&lt;/object&gt;
&lt;p&gt;We’ll calculate the derivative w.r.t. &lt;em&gt;k&lt;/em&gt; and set it to 0:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/e0b8cce2713100be2a0b59e03951e08c46f7369f.svg" style="height: 91px;" type="image/svg+xml"&gt;\[\begin{aligned}
    \frac{d}{dk}\ln(\varepsilon)&amp;amp;=\frac{d}{dk}k\cdot \ln\left ( 1-e^{-\frac{kn}{m}} \right )\\
    &amp;amp;=\ln\left ( 1-e^{-\frac{kn}{m}} \right ) + k\frac{e^{-\frac{kn}{m}}\cdot \frac{n}{m}}{1-e^{-\frac{kn}{m}}}
\end{aligned}\]&lt;/object&gt;
&lt;p&gt;Substituting a variable &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/523c4d2567fa8a6b1cbfe9c1a4655841cfeab816.svg" style="height: 19px;" type="image/svg+xml"&gt;t=e^{-\frac{kn}{m}}&lt;/object&gt; and using some more
calculus and algebra, we can find that:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/862f5ea8585950a1bcb6d7edf7a84a6838cd8b57.svg" style="height: 32px;" type="image/svg+xml"&gt;\[k= \frac{m}{n}\cdot \ln(2)\]&lt;/object&gt;
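&lt;p&gt;For readers who want the skipped algebra, here is one way to fill it in (a sketch, not necessarily the route intended above). With the substitution for &lt;em&gt;t&lt;/em&gt; we have kn/m = -ln t. Setting the derivative to zero and multiplying through by 1-t gives an equation that is symmetric in t and 1-t, whose only solution strictly between 0 and 1 is t = 1/2:&lt;/p&gt;

```latex
\[\ln(1-t) - \frac{t\ln t}{1-t} = 0
  \quad\Longrightarrow\quad (1-t)\ln(1-t) = t\ln t\]
\[t = \frac{1}{2}
  \quad\Longrightarrow\quad e^{-\frac{kn}{m}} = \frac{1}{2}
  \quad\Longrightarrow\quad k = \frac{m}{n}\cdot\ln(2)\]
```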
&lt;p&gt;A numerical example: if we have a set with 1 million bits, and we expect
to insert about 100,000 elements, the optimal number of hash functions
is:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/451fd85f0b000022e9759670738e26642cb29308.svg" style="height: 19px;" type="image/svg+xml"&gt;\[k= 10\cdot \ln(2)= 6.93 \approx 7\]&lt;/object&gt;
&lt;p&gt;However, it’s more useful to aim for a certain error rate, and set the
filter parameters accordingly. Let’s assume we’ll be using this optimal
value of &lt;em&gt;k&lt;/em&gt;. Substituting &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/471ee855c43cda2877ae952615934784f468fc6d.svg" style="height: 20px;" type="image/svg+xml"&gt;k= \frac{m}{n}\cdot \ln(2)&lt;/object&gt; into the
equation for &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/abfcc7fffa771b562756e65a07ae43b713c3f026.svg" style="height: 8px;" type="image/svg+xml"&gt;\varepsilon&lt;/object&gt; from above:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/81cb37a17ef1d823f8bf69f8d7971f5cfdcbe092.svg" style="height: 133px;" type="image/svg+xml"&gt;\[\begin{aligned}
    \varepsilon&amp;amp;\approx \left ( 1-e^{-\frac{n}{m}\cdot{\frac{m}{n}\cdot \ln 2}} \right )^{\frac{m}{n}\cdot \ln 2}\\
    &amp;amp;\approx \left ( 1-e^{-\ln 2} \right )^{\frac{m}{n}\cdot \ln 2}\\
    &amp;amp;\approx \left ( \frac{1}{2} \right )^{\frac{m}{n}\cdot \ln 2}\\
\end{aligned}\]&lt;/object&gt;
&lt;p&gt;If we use the numerical example from before with &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/37ca8b393a55677095eb0f3db3e13ed44288e19d.svg" style="height: 19px;" type="image/svg+xml"&gt;\frac{m}{n}=10&lt;/object&gt;,
the error rate with an optimal &lt;em&gt;k&lt;/em&gt; will be
&lt;object class="valign-m1" data="https://eli.thegreenplace.net/images/math/3fd04192acc419ccfb470abf0e9a0c34def7e731.svg" style="height: 16px;" type="image/svg+xml"&gt;0.5^{6.93}\approx 0.8\%&lt;/object&gt;.&lt;/p&gt;
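&lt;p&gt;A quick numerical check of this example, as a minimal sketch. The helper names &lt;tt class="docutils literal"&gt;falsePositiveRate&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;optimalRate&lt;/tt&gt; are hypothetical; the first evaluates the general error rate formula for a whole number of hash functions, the second the closed form for the optimal &lt;em&gt;k&lt;/em&gt;:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate evaluates (1 - e^(-k*n/m))^k, the approximation derived
// above, for a given bits-per-element ratio m/n and k hash functions.
func falsePositiveRate(bitsPerElem, k float64) float64 {
	return math.Pow(1-math.Exp(-k/bitsPerElem), k)
}

// optimalRate evaluates (1/2)^((m/n)*ln 2), the rate when k is optimal.
func optimalRate(bitsPerElem float64) float64 {
	return math.Pow(0.5, bitsPerElem*math.Log(2))
}

func main() {
	fmt.Printf("optimal k (6.93): %.4f\n", optimalRate(10))
	fmt.Printf("rounded k (7):    %.4f\n", falsePositiveRate(10, 7))
}
```

&lt;p&gt;Both values come out at roughly 0.8%, so rounding &lt;em&gt;k&lt;/em&gt; up to 7 costs very little.&lt;/p&gt;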
&lt;p&gt;What often happens is that we have an error rate in mind and we want to
calculate how many bits per element we want to dedicate in our set.
Let’s take the previous equation and try to isolate &lt;object class="valign-m6" data="https://eli.thegreenplace.net/images/math/86609e60f441c339d6763e565a1f2bbf762d109d.svg" style="height: 19px;" type="image/svg+xml"&gt;\frac{m}{n}&lt;/object&gt;
from it using a logarithm:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/f6335b6f24e7a3810a317f92077033764ebea897.svg" style="height: 32px;" type="image/svg+xml"&gt;\[\ln \varepsilon\approx \frac{m}{n}\cdot \ln 2 \cdot \ln 2^{-1}=-\frac{m}{n}\ln^2(2)\]&lt;/object&gt;
&lt;p&gt;Then:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/b7a380c80172b7f16be9e4d8db3d72d9b110228e.svg" style="height: 42px;" type="image/svg+xml"&gt;\[\frac{m}{n}\approx -\frac{\ln \varepsilon}{\ln^2(2)}\]&lt;/object&gt;
&lt;p&gt;Final numerical example: suppose we want an error (false positive) rate
of 1%. This means our set should have:&lt;/p&gt;
&lt;object class="align-center" data="https://eli.thegreenplace.net/images/math/dd154db77f7ce00754caaedf5a47180eee7e67da.svg" style="height: 42px;" type="image/svg+xml"&gt;\[\frac{m}{n}\approx -\frac{\ln 0.01}{\ln^2 (2)}=9.58\]&lt;/object&gt;
&lt;p&gt;... bits per element. So if we expect about 100,000 elements, the bit
set used for our filter should have at least 958,000 bits. And, as
calculated earlier, we should be using &lt;object class="valign-0" data="https://eli.thegreenplace.net/images/math/0253292e2f5a5fa793785fbffcff4bb7cef94af7.svg" style="height: 13px;" type="image/svg+xml"&gt;k=7&lt;/object&gt; hash functions to
achieve this optimal error rate.&lt;/p&gt;
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For this reason, Bloom filters are very common in data storage systems.
Here's &lt;a class="reference external" href="https://stackoverflow.com/a/39331778/8206"&gt;a discussion about Cassandra&lt;/a&gt;,
but there are many others.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Math"></category><category term="Go"></category></entry><entry><title>Implementing Raft: Part 5 - Exactly-once delivery</title><link href="https://eli.thegreenplace.net/2024/implementing-raft-part-5-exactly-once-delivery/" rel="alternate"></link><published>2024-12-18T06:01:00-08:00</published><updated>2024-12-18T23:51:34-08:00</updated><author><name>Eli Bendersky</name></author><id>tag:eli.thegreenplace.net,2024-12-18:/2024/implementing-raft-part-5-exactly-once-delivery/</id><summary type="html">&lt;p&gt;This is Part 5 in a series of posts describing the Raft distributed consensus
algorithm and its complete implementation in Go. Here is a list of posts in the
series:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-0-introduction/"&gt;Part 0: Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-1-elections/"&gt;Part 1: Elections&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-2-commands-and-log-replication/"&gt;Part 2: Commands and log replication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-3-persistence-and-optimizations/"&gt;Part 3: Persistence and optimizations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2024/implementing-raft-part-4-keyvalue-database/"&gt;Part 4: Key …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;This is Part 5 in a series of posts describing the Raft distributed consensus
algorithm and its complete implementation in Go. Here is a list of posts in the
series:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-0-introduction/"&gt;Part 0: Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-1-elections/"&gt;Part 1: Elections&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-2-commands-and-log-replication/"&gt;Part 2: Commands and log replication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2020/implementing-raft-part-3-persistence-and-optimizations/"&gt;Part 3: Persistence and optimizations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://eli.thegreenplace.net/2024/implementing-raft-part-4-keyvalue-database/"&gt;Part 4: Key/Value database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Part 5: Exactly-once delivery (this post)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In this part, we're completing the implementation of a replicated key / value
database based on Raft consensus. At the end of Part 4 we discussed a
consistency issue that may arise due to client retry logic; now is the time
to address it.&lt;/p&gt;
&lt;p&gt;All the code for this part is located
in &lt;a class="reference external" href="https://github.com/eliben/raft/tree/main/part5kv"&gt;this directory&lt;/a&gt;.&lt;/p&gt;
&lt;div class="section" id="adding-an-append-operation-to-our-database"&gt;
&lt;h2&gt;Adding an &lt;tt class="docutils literal"&gt;APPEND&lt;/tt&gt; operation to our database&lt;/h2&gt;
&lt;p&gt;As a quick reminder, these are the basic operations our KV DB from part 4
supports:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;PUT(k,v)&lt;/tt&gt;: assign value &lt;tt class="docutils literal"&gt;v&lt;/tt&gt; to key &lt;tt class="docutils literal"&gt;k&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;GET(k)&lt;/tt&gt;: retrieve the value associated with key &lt;tt class="docutils literal"&gt;k&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;CAS(k, cmp, v)&lt;/tt&gt;: atomic compare-and-swap. First, it reads &lt;tt class="docutils literal"&gt;curV&lt;/tt&gt; - the
current value associated with key &lt;tt class="docutils literal"&gt;k&lt;/tt&gt;. If &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;curV==cmp&lt;/span&gt;&lt;/tt&gt;, assigns value
&lt;tt class="docutils literal"&gt;v&lt;/tt&gt; to &lt;tt class="docutils literal"&gt;k&lt;/tt&gt; instead; otherwise, it's a no-op. In any case, &lt;tt class="docutils literal"&gt;curV&lt;/tt&gt; is
returned.&lt;/li&gt;
&lt;/ul&gt;
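&lt;p&gt;To make the CAS semantics concrete, here is a minimal single-process sketch over a plain Go map. The &lt;tt class="docutils literal"&gt;kvStore&lt;/tt&gt; type and its &lt;tt class="docutils literal"&gt;CAS&lt;/tt&gt; method are hypothetical illustrations rather than the service code from Part 4, and the sketch ignores concurrency and replication entirely:&lt;/p&gt;

```go
package main

import "fmt"

// kvStore is a hypothetical in-memory stand-in for the database state.
type kvStore map[string]string

// CAS assigns v to k only if the current value of k equals cmp. It returns
// the value k had before the call, whether or not the swap happened.
func (s kvStore) CAS(k, cmp, v string) string {
	curV := s[k] // a missing key reads as the empty string in this sketch
	if curV == cmp {
		s[k] = v
	}
	return curV
}

func main() {
	db := kvStore{"x": "foo"}
	fmt.Println(db.CAS("x", "foo", "bar"), db["x"]) // swap happens
	fmt.Println(db.CAS("x", "foo", "baz"), db["x"]) // no-op: x is now "bar"
}
```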
&lt;p&gt;Let's add another operation to this set; this is &lt;tt class="docutils literal"&gt;APPEND(k,v)&lt;/tt&gt;, which appends
&lt;tt class="docutils literal"&gt;v&lt;/tt&gt; to the value of key &lt;tt class="docutils literal"&gt;k&lt;/tt&gt; (in our implementation, keys and values are
both arbitrary Go strings); if there was no &lt;tt class="docutils literal"&gt;k&lt;/tt&gt; in the DB before this
operation, it behaves like &lt;tt class="docutils literal"&gt;PUT(k,v)&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;For example, consider this sequence of commands (in order from left to right):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;PUT(&amp;quot;x&amp;quot;,&amp;quot;foo&amp;quot;)  APPEND(&amp;quot;x&amp;quot;, &amp;quot;bar&amp;quot;)  APPEND(&amp;quot;y&amp;quot;,&amp;quot;hello&amp;quot;)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Applied to an empty DB, these commands will result in these keys / values:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;x=foobar
y=hello
&lt;/pre&gt;&lt;/div&gt;
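&lt;p&gt;The same sequence can be replayed in a minimal sketch over a plain Go map. The &lt;tt class="docutils literal"&gt;kvStore&lt;/tt&gt; type here is a hypothetical stand-in for the database state, not the actual service code:&lt;/p&gt;

```go
package main

import "fmt"

// kvStore is a hypothetical in-memory stand-in for the database state.
type kvStore map[string]string

// Put assigns value v to key k.
func (s kvStore) Put(k, v string) { s[k] = v }

// Append appends v to the value of k. A missing key reads as the empty
// string, so appending to it behaves like Put.
func (s kvStore) Append(k, v string) { s[k] += v }

func main() {
	db := kvStore{}
	db.Put("x", "foo")
	db.Append("x", "bar")
	db.Append("y", "hello")
	fmt.Println(db["x"], db["y"]) // foobar hello
}
```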
&lt;/div&gt;
&lt;div class="section" id="the-problem-with-client-retries-demonstrated-with-append"&gt;
&lt;h2&gt;The problem with client retries - demonstrated with &lt;tt class="docutils literal"&gt;APPEND&lt;/tt&gt;&lt;/h2&gt;
&lt;p&gt;The way our KV client works is described in detail in
&lt;a class="reference external" href="https://eli.thegreenplace.net/2024/implementing-raft-part-4-keyvalue-database/"&gt;Part 4&lt;/a&gt;. As a reminder,
the client tries the KV services one by one, submitting a command to them until
it gets a success response from a leader. The client also remembers which
service was the leader the last time it tried, to avoid wasting time on the
search next time.&lt;/p&gt;
&lt;p&gt;Suppose we've already submitted &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;PUT(&amp;quot;x&amp;quot;,&amp;quot;foo&amp;quot;)&lt;/span&gt;&lt;/tt&gt; successfully to the database,
and now we want to send the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;APPEND(&amp;quot;x&amp;quot;,&amp;quot;bar&amp;quot;)&lt;/span&gt;&lt;/tt&gt; command. Suppose also that
our client remembers that service B was the leader (in a cluster of three
services: A, B and C). It sends the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;APPEND(&amp;quot;x&amp;quot;,&amp;quot;bar&amp;quot;)&lt;/span&gt;&lt;/tt&gt; request to service
B. What happens if the client doesn't get a response from B? It assumes
something happened to B (maybe it has crashed or was partitioned from the
network), and retries the same request - sending it to C.&lt;/p&gt;
&lt;p&gt;But now suppose that B actually received the APPEND command and committed it
to the Raft cluster, but crashed before sending the HTTP response back to the
client (or maybe the HTTP response got delayed beyond the client's timeout,
due to a network glitch). Due to the same error, B then loses cluster leadership.
The client will keep retrying this request, until
it finds a leader that answers with success; therefore, the APPEND may
be applied twice (or even more times, if the failure mode recurs) and the value
of &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; in the DB will end up being &lt;tt class="docutils literal"&gt;&amp;quot;foobarbar&amp;quot;&lt;/tt&gt;. This is bad!&lt;/p&gt;
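&lt;p&gt;The failure can be reproduced in miniature. The sketch below is not our Raft code; it models the committed log as a slice of APPEND values for a single key, and &lt;tt class="docutils literal"&gt;applyAll&lt;/tt&gt; is a hypothetical stand-in for a state machine that applies every committed entry without any de-duplication.&lt;/p&gt;

```go
package main

import "fmt"

// applyAll replays a committed log of APPEND values for a single key, the way
// a state machine without de-duplication would: every entry is applied.
func applyAll(initial string, log []string) string {
	value := initial
	for _, v := range log {
		value += v
	}
	return value
}

func main() {
	// B commits APPEND("x","bar"), but its response never reaches the client.
	log := []string{"bar"}
	// The client retries with the new leader, which commits the same command
	// again as a separate log entry.
	log = append(log, "bar")

	fmt.Println(applyAll("foo", log)) // prints foobarbar
}
```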
&lt;p&gt;You may be tempted to blame the client's retry behavior here; but let's think
this through. Suppose we didn't have the client layer doing retries; we send
an APPEND command to a service, and don't hear anything back. What do we do
next? Is there any way to know that the request was actually committed? Well,
we can send a GET request to check, but this quickly gets complicated in a real
distributed system, because our operation is no longer atomic (some other client
may have changed the key's value since then, so what is our GET supposed to
check?).&lt;/p&gt;
&lt;p&gt;The problem isn't the retry itself; it's retrying with insufficient safety
guarantees in the core algorithm.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="solving-the-retry-problem-with-command-de-duplication"&gt;
&lt;h2&gt;Solving the retry problem with command de-duplication&lt;/h2&gt;
&lt;p&gt;The problem described above isn't just an issue in our implementation. It's
explicitly called out in the original Raft paper, in section 8:&lt;/p&gt;
&lt;blockquote&gt;
However, as described so far Raft can execute a command multiple times: for
example, if the leader crashes after committing the log entry but before
responding to the client, the client will retry the command with a new
leader, causing it to be executed a second time. The solution is for clients
to assign unique serial numbers to every command. Then, the state machine
tracks the latest serial number processed for each client, along with the
associated response. If it receives a command whose serial number has already
been executed, it responds immediately without re-executing the request.&lt;/blockquote&gt;
&lt;p&gt;The paper also suggests a solution to the problem, and this is what we're
going to implement. If we can uniquely identify commands committed to the
Raft log, the KV service can avoid applying the same commands twice.&lt;/p&gt;
&lt;p&gt;The idea is to identify commands uniquely as follows:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Each client has a globally unique ID&lt;/li&gt;
&lt;li&gt;Each command sent by a client has a unique ID, distinct from all other IDs
sent by the client. Moreover, to make our algorithm efficient this ID is
monotonically increasing; assuming each client has its own monotonic clock,
a command sent at time &lt;object class="valign-m4" data="https://eli.thegreenplace.net/images/math/2885fa41d340ab94bb0451308cf01996f1916011.svg" style="height: 16px;" type="image/svg+xml"&gt;T_1&lt;/object&gt; will have an ID larger than a command
sent at &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/9aa7b71de925cf0af7c914ee94140a522d6b84ac.svg" style="height: 15px;" type="image/svg+xml"&gt;T_0&lt;/object&gt; iff &lt;object class="valign-m3" data="https://eli.thegreenplace.net/images/math/8967f572b209645f22cd7dbd1eef806527ee6c90.svg" style="height: 15px;" type="image/svg+xml"&gt;T_1&amp;gt;T_0&lt;/object&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In our APPEND example, let's say the client's ID is 42, and let's say the ID
of the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;APPEND(&amp;quot;x&amp;quot;,&amp;quot;bar&amp;quot;)&lt;/span&gt;&lt;/tt&gt; command it sends is 1. The command is sent to B,
which commits it successfully - with the ID tuple &lt;tt class="docutils literal"&gt;(42,1)&lt;/tt&gt; - to the Raft log;
B crashes before responding to the client, so the client retries the command
with C. Since it's the same command, it has the same ID tuple &lt;tt class="docutils literal"&gt;(42,1)&lt;/tt&gt;. The KV
service in C will notice that such an ID was already applied, and will not apply
it again &lt;a class="footnote-reference" href="#footnote-1" id="footnote-reference-1"&gt;[1]&lt;/a&gt;!&lt;/p&gt;
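&lt;p&gt;Before looking at the real implementation, here's a toy model of the idea. The &lt;tt class="docutils literal"&gt;entry&lt;/tt&gt; type and &lt;tt class="docutils literal"&gt;applyDedup&lt;/tt&gt; function are assumptions made for this sketch; it only shows that replaying a log with a repeated &lt;tt class="docutils literal"&gt;(42,1)&lt;/tt&gt; tuple applies the APPEND once.&lt;/p&gt;

```go
package main

import "fmt"

// entry is a hypothetical, simplified log entry: an APPEND value tagged with
// the (client ID, request ID) tuple described above.
type entry struct {
	clientID, requestID int64
	value               string
}

// applyDedup replays the log, skipping any entry whose request ID is not
// newer than the last one applied for the same client.
func applyDedup(initial string, log []entry) string {
	value := initial
	lastApplied := make(map[int64]int64)
	for _, e := range log {
		if last, seen := lastApplied[e.clientID]; seen {
			if last >= e.requestID {
				continue // duplicate: this request was already applied
			}
		}
		lastApplied[e.clientID] = e.requestID
		value += e.value
	}
	return value
}

func main() {
	// The same command is committed twice: once via B, once via the retry to C.
	log := []entry{
		{clientID: 42, requestID: 1, value: "bar"},
		{clientID: 42, requestID: 1, value: "bar"},
	}
	fmt.Println(applyDedup("foo", log)) // prints foobar
}
```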
&lt;/div&gt;
&lt;div class="section" id="implementing-de-duplication"&gt;
&lt;h2&gt;Implementing de-duplication&lt;/h2&gt;
&lt;p&gt;We'll start with the client. Two fields are added to the &lt;tt class="docutils literal"&gt;KVClient&lt;/tt&gt; struct:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// clientID is a unique identifier for a client; it&amp;#39;s managed internally&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// in this file by incrementing the clientCount global.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nx"&gt;clientID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// requestID is a unique identifier for a request a specific client makes;&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// each client manages its own requestID, and increments it monotonically and&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// atomically each time the user asks to send a new request.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nx"&gt;requestID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;atomic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;tt class="docutils literal"&gt;clientID&lt;/tt&gt; field is assigned when a client is created, using a global
atomic that auto-increments:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;serviceAddrs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;KVClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;KVClient&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="c1"&gt;// ... other fields&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;clientID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;clientCount&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// clientCount is used to assign unique identifiers to distinct clients.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;clientCount&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;atomic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This provides uniqueness only within a single process, which is enough for our
tests; in realistic applications, where clients run in separate processes or on
separate machines, you'll want something stronger. A simple and pragmatic
approach could be using a &lt;a class="reference external" href="https://en.wikipedia.org/wiki/Universally_unique_identifier"&gt;UUID&lt;/a&gt; here - I'll
leave this as an exercise for motivated readers.&lt;/p&gt;
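&lt;p&gt;As a starting point for that exercise, here's one possible sketch that draws a random 128-bit identifier from &lt;tt class="docutils literal"&gt;crypto/rand&lt;/tt&gt;. The &lt;tt class="docutils literal"&gt;newClientID&lt;/tt&gt; helper is hypothetical and not part of our code; it also doesn't produce a standards-formatted UUID, just a random ID of comparable size. Note that adopting it would mean changing the type of &lt;tt class="docutils literal"&gt;clientID&lt;/tt&gt; from &lt;tt class="docutils literal"&gt;int64&lt;/tt&gt; to &lt;tt class="docutils literal"&gt;string&lt;/tt&gt; everywhere it's used.&lt;/p&gt;

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newClientID returns a random 128-bit identifier encoded as 32 hex digits.
// This is a hypothetical helper for illustration; it does not follow the
// UUID text format.
func newClientID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	return hex.EncodeToString(b[:]), nil
}

func main() {
	id, err := newClientID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```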
&lt;p&gt;The &lt;tt class="docutils literal"&gt;requestID&lt;/tt&gt; field tracks the ID of the last request this client has
sent. Each time the client sends a new request, this is incremented &lt;a class="footnote-reference" href="#footnote-2" id="footnote-reference-2"&gt;[2]&lt;/a&gt;.
For example, here's the new &lt;tt class="docutils literal"&gt;Append&lt;/tt&gt; method:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// Append the value to the key in the store. Returns an error, or&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// (prevValue, keyFound, false), where keyFound specifies whether the key was&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// found in the store prior to this command, and prevValue is its previous&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// value if it was found.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;KVClient&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;appendReq&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AppendRequest&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="hll"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;span class="hll"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestID&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;appendResp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AppendResponse&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;append&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;appendReq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;appendResp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;appendResp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PrevValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;appendResp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;KeyFound&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Both IDs are part of the HTTP request sent to the service. Here's the request
struct for appends:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;AppendRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The other commands are modified similarly; all the retry logic (in the &lt;tt class="docutils literal"&gt;send&lt;/tt&gt;
method) remains the same - it just keeps retrying with the same client+request
IDs.&lt;/p&gt;
&lt;p&gt;The changes in the service are slightly deeper, but not too difficult overall.
First, we add a field to the &lt;tt class="docutils literal"&gt;KVService&lt;/tt&gt; struct:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// lastRequestIDPerClient helps de-duplicate client requests. It stores the&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// last request ID that was applied by the updater per client; the assumption&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// is that client IDs are unique (keys in this map), and for each client the&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// requests IDs (values in this map) are unique and monotonically increasing.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nx"&gt;lastRequestIDPerClient&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;map&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We also add some fields to the &lt;tt class="docutils literal"&gt;Command&lt;/tt&gt; struct:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="c1"&gt;// ClientID and RequestID uniquely identify the request+client.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;int64&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="c1"&gt;// IsDuplicate is used to mark the command as a duplicate by the updater. When&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// the updater notices a command that has a client+request ID that has already&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// been executed, the command is not applied to the datastore; instead,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="c1"&gt;// IsDuplicate is set to true.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="nx"&gt;IsDuplicate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As a reminder, &lt;tt class="docutils literal"&gt;Command&lt;/tt&gt; is the &amp;quot;payload&amp;quot; we submit to the Raft log &lt;a class="footnote-reference" href="#footnote-3" id="footnote-reference-3"&gt;[3]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The bulk of the logic is in the goroutine running &lt;tt class="docutils literal"&gt;runUpdater&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;KVService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;runUpdater&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;go&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;range&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;commitChan&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;.(&lt;/span&gt;&lt;span class="nx"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Duplicate command detection.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Only accept this request if its ID is higher than the last request from&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// this client.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;lastReqID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastRequestIDPerClient&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;lastReqID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kvlog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;duplicate request id=%v, from client id=%v&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="c1"&gt;// Duplicate: this request ID was already applied in the past!&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Kind&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;IsDuplicate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lastRequestIDPerClient&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ClientID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RequestID&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;switch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Kind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CommandGet&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultFound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CommandPut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultFound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CommandAppend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultFound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;CommandCAS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultFound&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CAS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CompareValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nb"&gt;panic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;unexpected command %v&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// Forward this command to the subscriber interested in its index, and&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;// close the subscription - it&amp;#39;s single-use.&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;popCommitSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nb"&gt;close&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}()&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This relies on the IDs from each client being monotonically increasing, and thus
we only have to maintain O(1) of state per client &lt;a class="footnote-reference" href="#footnote-4" id="footnote-reference-4"&gt;[4]&lt;/a&gt;. When we apply a client
request to the state machine, we remember the ID of this request. If the code
is ever asked to apply the same ID (or a lower ID), it refuses, marking the
command as &lt;tt class="docutils literal"&gt;IsDuplicate=true&lt;/tt&gt; instead. Then, the HTTP handler that tried to
submit the command has to deal with this situation; for example,
in &lt;tt class="docutils literal"&gt;handleAppend&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;createCommitSubscription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logIndex&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;commitCmd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="nx"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;commitCmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ServiceID&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;commitCmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IsDuplicate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sendHTTPResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AppendResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;RespStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusDuplicateRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sendHTTPResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AppendResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;RespStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;KeyFound&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nx"&gt;commitCmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultFound&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;PrevValue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;commitCmd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ResultValue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;kvs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sendHTTPResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AppendResponse&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;RespStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusFailedCommit&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;lt;-&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;Done&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For duplicates the service returns a special API status:
&lt;tt class="docutils literal"&gt;api.StatusDuplicateRequest&lt;/tt&gt;. Our client treats this as an error and surfaces
it to the user. As an exercise, try changing it so the return value from
duplicates is normal (success). The challenge here is to record, for each
request, the result that should be returned to the client (e.g. the previous
value of a key in case of PUT or APPEND).&lt;/p&gt;
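&lt;p&gt;As a possible starting point for this exercise, here is a minimal sketch of
the bookkeeping involved. It is not the code from the repository: the
&lt;tt class="docutils literal"&gt;dedupStore&lt;/tt&gt; type, the &lt;tt class="docutils literal"&gt;outcome&lt;/tt&gt; values and
&lt;tt class="docutils literal"&gt;applyAppend&lt;/tt&gt; are hypothetical names, and it assumes that each client has
at most one request in flight and that request IDs start at 1. Alongside the
last applied ID, it saves the result of that request, so an exact retry can be
answered with the original result:&lt;/p&gt;

```go
package main

import "fmt"

// result is what the state machine reports back for one request. The field
// names follow the AppendResponse fields shown above; everything else in
// this file is a hypothetical sketch.
type result struct {
	PrevValue string
	KeyFound  bool
}

type outcome int

const (
	applied  outcome = iota // the command changed the store
	replayed                // exact retry: the saved result is returned
	stale                   // older than the client's latest request
)

// lastApplied is the O(1) state kept per client.
type lastApplied struct {
	requestID int64
	res       result
}

// dedupStore is a toy, single-goroutine stand-in for the state machine.
type dedupStore struct {
	data    map[string]string
	clients map[int64]lastApplied
}

func newDedupStore() *dedupStore {
	s := new(dedupStore)
	s.data = map[string]string{}
	s.clients = map[int64]lastApplied{}
	return s
}

// applyAppend applies APPEND(key, value) unless (clientID, requestID) was
// already seen. Assumes request IDs start at 1, so the zero value of
// lastApplied never matches a real request.
func (s *dedupStore) applyAppend(clientID, requestID int64, key, value string) (result, outcome) {
	last := s.clients[clientID]
	if last.requestID == requestID {
		return last.res, replayed
	}
	if last.requestID > requestID {
		// The client has already moved on; no result is kept for this ID.
		return result{}, stale
	}
	prev, found := s.data[key]
	s.data[key] = prev + value
	res := result{PrevValue: prev, KeyFound: found}
	s.clients[clientID] = lastApplied{requestID: requestID, res: res}
	return res, applied
}

func main() {
	s := newDedupStore()
	first, _ := s.applyAppend(1, 1, "k", "a")
	retry, how := s.applyAppend(1, 1, "k", "a")
	fmt.Println(first == retry, how == replayed, s.data["k"])
}
```

&lt;p&gt;In the real service this table would have to be updated only when applying
committed log entries, so that every replica derives the same table from the
same log.&lt;/p&gt;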
&lt;/div&gt;
&lt;div class="section" id="revisiting-our-consistency-guarantees"&gt;
&lt;h2&gt;Revisiting our consistency guarantees&lt;/h2&gt;
&lt;p&gt;In &lt;a class="reference external" href="https://eli.thegreenplace.net/2024/implementing-raft-part-4-keyvalue-database/"&gt;Part 4&lt;/a&gt;,
we discussed the consistency guarantees of our KV service in detail, and
concluded that it's strict serializable, which is the strongest consistency
guarantee for distributed systems.&lt;/p&gt;
&lt;p&gt;However, when adding the client module we also noted that - due to the client
retry problem - the whole system is no longer &lt;a class="reference external" href="https://eli.thegreenplace.net/2024/linearizability-in-distributed-systems/"&gt;linearizable&lt;/a&gt;
(and hence no longer strict serializable). Linearizability extended to the
client is known to be tricky; this isn't surprising - after all, the client
is yet another network-connected component in the system, with inherently
unreliable communication to the service.&lt;/p&gt;
&lt;p&gt;With de-duplication, our entire system is strict-serializable again. Even if
a client re-sends a command that was already committed to the Raft log, this
retried command won't be committed a second time due to the de-duplication
logic. Users will not observe non-linearizable behavior.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="exactly-once-delivery"&gt;
&lt;h2&gt;Exactly-once delivery&lt;/h2&gt;
&lt;p&gt;The discussion around delivery semantics can get quite heated - people have
strong feelings about a topic in which academic rigor is often insufficiently
applied. Still, I'll take the chance of discussing these semantics in the
context of the Raft-based KV DB we've built through this series of posts.&lt;/p&gt;
&lt;p&gt;Consider a case in which we have no client retries. A client submits a user's
request just once to the underlying Raft service. Raft guarantees that if the
system is sufficiently connected (e.g. there are enough live and connected
peers to achieve consensus), the command will be applied to the Raft log once.
The Raft protocol is both theoretically and practically proven at this point,
so let's take this as an axiom. So we can say that without client retries,
a command is delivered &lt;em&gt;at most once&lt;/em&gt;. It can fail to be delivered (if there's
a persistent network partition preventing the Raft cluster from having enough
live peers for consensus), but it will not be delivered more than once.&lt;/p&gt;
&lt;p&gt;For some applications, at-most-once delivery is a sufficient guarantee. Think
about some sort of distributed logging or telemetry, for example. This isn't
the case for the kind of KV DB we're trying to build, though, because it's
intended to serve as a rock-solid basis for other distributed applications.&lt;/p&gt;
&lt;p&gt;Next we've added client retries; this is essential in light of the imperfect
physical world in which our code operates. With client retries, assuming the
HW and network are working &amp;quot;in the long term&amp;quot; (e.g. network partitions get
fixed within some reasonable time, and crashing servers get restarted or
replaced), we get &lt;em&gt;at least once&lt;/em&gt; semantics. The client will just keep retrying
until it gets notified that the command was applied. However, as we've seen at
the beginning of this post, this also means duplicate delivery when some failure
scenarios occur.&lt;/p&gt;
&lt;p&gt;The goal of adding de-duplication is to move our system to &lt;em&gt;exactly-once&lt;/em&gt;
delivery. Exactly-once delivery is a highly debated topic, but with some
reasonable real-world assumptions, it can be achieved. Consider our implementation,
for example (with de-duplication of retries). As long as the network and HW are
not permanently broken, a command will either be applied &lt;em&gt;exactly once&lt;/em&gt; to the
DB, or the client will be notified of an error.&lt;/p&gt;
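&lt;p&gt;To make the difference concrete, here is a toy simulation. It is a sketch
under simplifying assumptions, not the code from this series: the
&lt;tt class="docutils literal"&gt;server&lt;/tt&gt; type is a hypothetical single-process stand-in for the
replicated service, the &amp;quot;network&amp;quot; loses a fixed number of replies, request IDs
start at 1, and a duplicate is simply acknowledged instead of being reported
with a special status. The client keeps re-sending the same request ID until it
gets a reply:&lt;/p&gt;

```go
package main

import "fmt"

// server is a hypothetical stand-in for the replicated KV service.
type server struct {
	value       string          // the single register we APPEND to
	lastID      map[int64]int64 // highest request ID applied, per client
	dedup       bool            // whether de-duplication is enabled
	dropReplies int             // how many replies the "network" will lose
}

// handleAppend applies the command unless it is a duplicate, and reports
// whether the reply made it back to the client.
func (s *server) handleAppend(clientID, requestID int64, suffix string) bool {
	isDup := false
	if s.dedup {
		isDup = s.lastID[clientID] >= requestID
	}
	if !isDup {
		s.value += suffix
		s.lastID[clientID] = requestID
	}
	if s.dropReplies > 0 {
		s.dropReplies--
		return false // the command was handled, but the client never hears back
	}
	return true
}

// appendWithRetries re-sends the same request until a reply arrives, which
// is at-least-once delivery. It returns the number of attempts.
func appendWithRetries(s *server, clientID, requestID int64, suffix string) int {
	attempts := 1
	for !s.handleAppend(clientID, requestID, suffix) {
		attempts++
	}
	return attempts
}

// run sends one APPEND of "x" through a network that loses two replies.
func run(dedup bool) string {
	s := new(server)
	s.lastID = map[int64]int64{}
	s.dedup = dedup
	s.dropReplies = 2
	appendWithRetries(s, 1, 1, "x")
	return s.value
}

func main() {
	fmt.Println("without de-duplication:", run(false))
	fmt.Println("with de-duplication:   ", run(true))
}
```

&lt;p&gt;Without de-duplication the register ends up as &lt;tt class="docutils literal"&gt;xxx&lt;/tt&gt; (one copy per
attempt); with it, the register holds a single &lt;tt class="docutils literal"&gt;x&lt;/tt&gt; even though the
request was delivered three times.&lt;/p&gt;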
&lt;p&gt;&lt;a class="reference external" href="http://blog.rongarret.info/2024/09/yes-you-can-have-exactly-once-delivery.html"&gt;Here's a post by Ron Garret&lt;/a&gt; where
he provides a useful framework to think about this. The post is long but worth
reading; here's a quote I liked:&lt;/p&gt;
&lt;blockquote&gt;
If you can get at-least-once delivery, you can build exactly-once on top of that&lt;/blockquote&gt;
&lt;p&gt;I believe this Raft series has demonstrated how this is done.
Another interesting read is &lt;a class="reference external" href="https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/"&gt;this post by the Kafka developers&lt;/a&gt;
where they discuss how exactly-once semantics were added to Kafka a few years
ago. If you read it you'll see it's pretty much the same technique - client
retries with de-duplication of commands in the distributed log.&lt;/p&gt;
&lt;p&gt;Finally, in &lt;em&gt;Designing Data-Intensive Applications&lt;/em&gt;, Martin Kleppmann touches
upon this topic in several places; for example, in chapter 12 he discusses
exactly-once execution of operations and the engineering required to make it
possible.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="conclusion"&gt;
&lt;h2&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This concludes our series of posts on implementing a full Raft consensus module
in Go, and building a strictly serializable KV DB on top of it.&lt;/p&gt;
&lt;p&gt;For any questions or comments about these posts or the code, please send me an
email or open an &lt;a class="reference external" href="https://github.com/eliben/raft/issues"&gt;issue on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="appendix-1-retries-with-put-and-linearizability"&gt;
&lt;h2&gt;Appendix 1: Retries with PUT and linearizability&lt;/h2&gt;
&lt;p&gt;In this post we've added an APPEND operation to demonstrate the issues that
arise with retries and duplication. But APPEND differs significantly
from the original operations we've discussed in part 4: it's not idempotent.
Applying APPEND more than once creates an invalid value, which isn't true
for PUT - if we had retried PUT multiple times, the only outcome would be
writing the same value more than once.&lt;/p&gt;
&lt;p&gt;While this seems logical on the surface, it turns out that even without operations
like APPEND, the system isn't linearizable if retries without de-duplication are
allowed. Here's a diagram (following the style of
&lt;a class="reference external" href="https://eli.thegreenplace.net/2024/linearizability-in-distributed-systems/"&gt;my post on linarizability&lt;/a&gt;)
that demonstrates the issue:&lt;/p&gt;
&lt;img alt="Linearazibility diagram of PUTs with retries" class="align-center" src="https://eli.thegreenplace.net/images/2024/retry-put-not-linearizable.png" /&gt;
&lt;p&gt;This diagram describes a single &amp;quot;register&amp;quot;, let's assume the key &amp;quot;foo&amp;quot;. Let's
also assume that before any writes, the default value of all keys is 0. Here's
what happens:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Client A issues a &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;PUT('foo',1)&lt;/span&gt;&lt;/tt&gt; request. The request gets committed by
the leader, which crashes before a reply is made to the client. The client
will continue periodically retrying.&lt;/li&gt;
&lt;li&gt;In the meantime, a new leader is elected. Client C reads the value of the
register and gets 1 (since the PUT of 1 was committed).&lt;/li&gt;
&lt;li&gt;Client B uses the new leader to commit a new PUT, with value 2.&lt;/li&gt;
&lt;li&gt;Client C reads the value of 2 from the register.&lt;/li&gt;
&lt;li&gt;Client A's retry reaches the new leader, which commits another instance
of &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;PUT('foo',1)&lt;/span&gt;&lt;/tt&gt;, overwriting
the value 2. Now when client C reads the register again, it gets 1 again.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This sequence isn't linearizable; as far as the clients are concerned, since
operation (2) retrieved the value 1, it means (1) finished before (2); the
result of (4) also means that (3) happened after (1). This
makes (5) impossible in a linearizable system.&lt;/p&gt;
&lt;p&gt;This is a known failure scenario in distributed systems; it's called &lt;em&gt;lost
update&lt;/em&gt;. In our example, the &lt;tt class="docutils literal"&gt;PUT(2)&lt;/tt&gt; operation is essentially lost outside a
brief window just following it. The retry of &lt;tt class="docutils literal"&gt;PUT(1)&lt;/tt&gt; overwrites it.&lt;/p&gt;
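&lt;p&gt;The five steps above can be replayed in a few lines of Go. This is only an
illustration: the register is a local variable, and the request ID strings
(&lt;tt class="docutils literal"&gt;A-1&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;B-1&lt;/tt&gt;) and the &lt;tt class="docutils literal"&gt;history&lt;/tt&gt; function are
hypothetical names invented for this sketch.&lt;/p&gt;

```go
package main

import "fmt"

// history replays the five steps against a single register and returns
// the values client C observes at its three reads.
func history(dedup bool) []int {
	register := 0                // default value of "foo"
	applied := map[string]bool{} // request IDs already applied
	var reads []int

	put := func(requestID string, v int) {
		if dedup {
			if applied[requestID] {
				return // a retry of a committed request is ignored
			}
		}
		applied[requestID] = true
		register = v
	}

	put("A-1", 1)                   // (1) A's PUT commits; the reply is lost
	reads = append(reads, register) // (2) C reads
	put("B-1", 2)                   // (3) B's PUT
	reads = append(reads, register) // (4) C reads
	put("A-1", 1)                   // (5) A's retry reaches the new leader
	reads = append(reads, register) //     C reads again
	return reads
}

func main() {
	fmt.Println("no de-duplication:", history(false))
	fmt.Println("de-duplication:   ", history(true))
}
```

&lt;p&gt;Without de-duplication client C observes 1, 2, 1; with it, the final read
returns 2 and B's update survives.&lt;/p&gt;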
&lt;/div&gt;
&lt;div class="section" id="appendix-2-lost-updates-in-etcd-with-client-retries"&gt;
&lt;h2&gt;Appendix 2: Lost updates in etcd with client retries&lt;/h2&gt;
&lt;p&gt;To demonstrate that the issue discussed in this post isn't purely academic,
here's a real-world example.&lt;/p&gt;
&lt;p&gt;&lt;a class="reference external" href="https://etcd.io/"&gt;etcd&lt;/a&gt; is an industrial-strength key-value DB based on
Raft. It's used extensively inside k8s and other projects. While the etcd
service itself is strict serializable, some of its client libraries turned
out to have the exact retry problem described in this post.&lt;/p&gt;
&lt;p&gt;Here's a &lt;a class="reference external" href="https://jepsen.io/analyses/jetcd-0.8.2"&gt;Jepsen analysis of jetcd&lt;/a&gt;,
a Java-based client that automatically retries on failures. The analysis
concludes that this mechanism results in loss of linearizability, and recommends
disabling it. There's also &lt;a class="reference external" href="https://github.com/etcd-io/etcd/issues/14890"&gt;a lengthy discussion in a GitHub issue&lt;/a&gt; with the
etcd developers about this. And &lt;a class="reference external" href="https://github.com/etcd-io/etcd/issues/18424"&gt;another issue&lt;/a&gt; has interesting
information as well.&lt;/p&gt;
&lt;p&gt;The Jepsen analysis has a great quote which I want to repost here, because it's
so relevant to our discussion (and Appendix 1 in particular):&lt;/p&gt;
&lt;blockquote&gt;
It is easy to assume that set(x, 5) is idempotent because applying it twice in
a row still produces the state x = 5. However, this operation is not longer
idempotent if its executions are interleaved with other writes—then, it leads to
lost update.&lt;/blockquote&gt;
&lt;p&gt;Interestingly, etcd doesn't support operations like APPEND at all. It can be
emulated with transactions, however, since etcd's &lt;a class="reference external" href="https://etcd.io/docs/v3.5/learning/api/#revisions"&gt;data store is versioned&lt;/a&gt;.
These features also allow one to be more careful around failures when
non-linearizable behavior isn't acceptable. For example, we can perform writes
like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;1. Read revision --&amp;gt; $rev
2. TXN
      if mod_revision(k) == $rev
      PUT(k, v)
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will only assign &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;store(k)=v&lt;/span&gt;&lt;/tt&gt; if there were no changes to the DB in the
meantime. The failure scenario shown in Appendix 1 cannot happen, because each
successful write increments the revision of the store.
If unsuccessful, this sequence of operations can be retried safely
(its only problem is with liveness, not safety).&lt;/p&gt;
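&lt;p&gt;Here is a small in-memory model of this pattern. It only imitates the idea of
a per-key modification revision; it does not use the etcd client API, and
&lt;tt class="docutils literal"&gt;versionedStore&lt;/tt&gt;, &lt;tt class="docutils literal"&gt;modRevision&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;putIfUnchanged&lt;/tt&gt;
are hypothetical names. It replays the scenario from Appendix 1, with client A
blindly re-sending its original transaction:&lt;/p&gt;

```go
package main

import "fmt"

// versionedStore is a toy model of a revisioned KV store.
type versionedStore struct {
	revision int64             // store-wide revision, bumped on every write
	values   map[string]string // current value of each key
	modRev   map[string]int64  // revision at which each key was last written
}

func newVersionedStore() *versionedStore {
	s := new(versionedStore)
	s.values = map[string]string{}
	s.modRev = map[string]int64{}
	return s
}

// modRevision returns the revision of the last write to key, or 0 if the
// key was never written.
func (s *versionedStore) modRevision(key string) int64 {
	return s.modRev[key]
}

// putIfUnchanged plays the role of the TXN: it writes only if the key's
// modification revision still equals rev, and reports whether it wrote.
func (s *versionedStore) putIfUnchanged(key, value string, rev int64) bool {
	if s.modRev[key] != rev {
		return false
	}
	s.revision++
	s.values[key] = value
	s.modRev[key] = s.revision
	return true
}

func main() {
	s := newVersionedStore()

	revA := s.modRevision("foo")                    // A reads the revision
	fmt.Println(s.putIfUnchanged("foo", "1", revA)) // A's write commits; reply is lost

	revB := s.modRevision("foo")                    // B reads the revision
	fmt.Println(s.putIfUnchanged("foo", "2", revB)) // B's write commits

	fmt.Println(s.putIfUnchanged("foo", "1", revA)) // A's retry carries a stale revision
	fmt.Println(s.values["foo"])
}
```

&lt;p&gt;The retry is rejected because the key's revision has moved on, so the value
written by B is still in place at the end.&lt;/p&gt;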
&lt;hr class="docutils" /&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-1" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-1"&gt;[1]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;For this reason it's important to store the ID of the command
&lt;em&gt;in the Raft log&lt;/em&gt; along with the command itself - we need this
de-duplication to work across peers.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-2" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-2"&gt;[2]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;If you're concerned that a client may send more than 9.2 quintillion
requests in its lifetime, this is easy to change to an arbitrarily
large number using &lt;tt class="docutils literal"&gt;math/big.Int&lt;/tt&gt;.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-3" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-3"&gt;[3]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;And we also reuse it to communicate results back from the updater
goroutine to request handlers - this is why &lt;tt class="docutils literal"&gt;IsDuplicate&lt;/tt&gt; is there.
A simple refactoring exercise could be to use a separate data structure
for this purpose.&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table class="docutils footnote" frame="void" id="footnote-4" rules="none"&gt;
&lt;colgroup&gt;&lt;col class="label" /&gt;&lt;col /&gt;&lt;/colgroup&gt;
&lt;tbody valign="top"&gt;
&lt;tr&gt;&lt;td class="label"&gt;&lt;a class="fn-backref" href="#footnote-reference-4"&gt;[4]&lt;/a&gt;&lt;/td&gt;&lt;td&gt;This assumes the total number
of distinct clients this service is dealing with is not overly large; in
realistic systems, this is a reasonable assumption. In extreme cases,
if we foresee having to deal with an unbounded number of clients,
some sort of &amp;quot;garbage collection&amp;quot; scheme should be maintained (we can
&amp;quot;forget&amp;quot; a client after some timeout of not hearing from it).&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Concurrency"></category><category term="Go"></category><category term="Network Programming"></category></entry></feed>