Eli Bendersky's website

Calculating the norm of a complex number

2024-10-17T19:45:00-07:00

In this quick post I'll dispel a common confusion in the basic math of complex numbers. It's often useful to calculate the norm-square (also known as absolute square) of a complex number z. This norm-square is denoted |z|^2. One could naively expect that:

\[|z|^2=zz\]

However, that's false! The way to calculate norm-square is:

\[|z|^2=zz^*\]

Where z^* is the complex conjugate of z. But why? To understand the mechanics of this calculation, let's start by defining what a norm is.

The norm of a complex number

Informally, a norm is a generalization of the concept of "length" or "size". For a real number r, the norm is its absolute value |r|. No matter if the number is positive or negative, the norm is its "length" - the distance from the origin.

A norm is defined similarly for complex numbers. Here's a standard geometrical interpretation of a complex number z, showing both cartesian and polar coordinates:

The "norm" of z is the length of the blue line, or the distance of its endpoint from the origin (in the polar representationz=re^{i\theta}, it's exactly r). The norm of a complex number uses the same notation as the absolute value: |z|, because it means the same thing. Another common name for the norm of complex numbers is modulus.

Norm: a formal definition

The formal definition of a norm (from Wikipedia) talks about generalized vector spaces. Since complex numbers are also a vector space (of dimension 1), we can simplify the definition just for \mathbb{C} as follows:

A norm on \mathbb{C} is a real-valued function p:\mathbb{C}\rightarrow\mathbb{R} with the following properties:

Triangle inequality: p(z+w)\leq p(z)+p(w) for all z,w\in \mathbb{C}
Absolute homogeneity: p(sz)=|s|p(z) for all z\in \mathbb{C} and all s\in \mathbb{R}
Positive definiteness: for all z\in \mathbb{C} if p(z)=0 then z=0.
Non-negativity: p(z)\geq 0 for all z\in \mathbb{C}

For a complex number z, setting p(z)=|z| makes all these properties work out intuitively:

Graphically, complex numbers add like vectors; therefore, given z and w, the length |z+w| is always at most the combined lengths |z|+|w|.
When we scale z by a scalar s, its length is scaled similarly. Think about the polar representation of a complex number: z=re^{i\theta}. Multiplying that by s, we get sz=sre^{i\theta}, which has length sr.
In the polar representation: if |z|=0, then r=0, meaning that z=0.
By definition, a length is non-negative; it's the distance from the origin.

Why z squared is not a norm-square

Now it's time to go back to the question we started the post with. Why isn't zz (or z^2) the norm-square?

For a general z=x+iy, we can calculate zz as follows:

\[\begin{align*} zz&=(x+iy)(x+iy)\\ &=x^2+iyx+ixy-y^2\\ &=x^2-y^2+2ixy \end{align*}\]

This is clearly a complex number, with a real and an imaginary component. It doesn't seem to fit the requirement of being a "length" or distance from the origin, which we'd expect to be a real quantity.

Multiplying z by itself in polar coordinates can be insightful:

\[\begin{align*} zz&=re^{i\theta}re^{i\theta}\\ &=rre^{i\theta+i\theta}\\ &=r^2 e^{i\cdot 2\theta} \end{align*}\]

It seems like we almost get what we want, because the magnitude of zz is r^2, which seems like the right magnitude for the square of the distance from origin; but there's still an angle... Here's how it looks geometrically:

Recall that when multiplying two complex numbers - their magnitudes multiply, but their angles add. In this case, we got the r^2 we needed, but with an angle of 2\theta, which is not what we need. If only we could get rid of the angle somehow... keep this thought in your head for the next section.

Looking at the formal definition of the norm, it's clear right away that zz won't do. The norm is defined as a real-valued function, whereas zz is not real-valued.

zz* is a norm-square

The conjugate z^* of z is defined as:

\[z^*=x-iy\]

Or in polar form:

\[z^*=re^{-i\theta}\]

Recall how we had that pesky angle remaining when calculating zz? Let's find a way to get rid of it; since angles add when we multiply complex numbers, to get rid of we should multiply z by something with an angle of -\theta. Do you see where this is going?

\[zz^*=re^{i\theta}re^{-i\theta}=r^2e^{i(\theta-\theta)}=r^2\]

Voila! Multiplying z by its complex conjugate gives us a real number. Moreover, it gives us exactly the number we want - r^2. This is the norm-square, or |z|^2. The norm of z is |z| and is precisely r, the "length" of the complex number.

Let's verify this works in cartesian coordinates:

\[\begin{align*} zz^*&=(x+iy)(x-iy)\\ &=x^2+iyx-ixy+y^2\\ &=x^2+y^2 \end{align*}\]

Indeed, this makes intuitive sense because:

\[|z|=\sqrt{zz*}=\sqrt{x^2+y^2}\]

And this is exactly what we expect when calculating the length of z.

Conclusion

The norm square of a complex number z is denoted |z|^2. In this post we've seen why

\[|z|^2\neq z^2\]

(even though the mathematical notation makes it seem like this should be true). Instead, this is how the norm-square is actually calculated:

\[|z|^2=zz^*\]

With the norm itself being

\[|z|=\sqrt{zz^*}\]

Appendix: zz* and the formal definition of norm

Let's get back to the formal definition of norm and show that zz* satisfies it.

First of all, we've already seen that zz* is a real number, no matter what z is. Therefore, it can indeed serve as a real-valued function p:\mathbb{C}\rightarrow\mathbb{R}.

The proof of property 1 is a bit tedious, but straightforward using the Cauchy-Schwartz inequality.

For property 2, since s is a real number, the square root of its square is just its absolute value:

\[|sz|=\sqrt{(sx)^2+(sy)^2}=\sqrt{s^2(x^2+y^2)}=|s|\sqrt{x^2+y^2}=|s||z|\]

For property 3, consider a z such that:

\[|z|=\sqrt{x^2+y^2}=0\]

Since neither addend inside the square root can be negative, clearly both x and y must be zero, meaning that z=0.

Property 4 is similarly straightforward: given real components x and y, |z|=\sqrt{x^2+y^2} is non-negative.

Implementing Raft: Part 4 - Key/Value Database

2024-10-10T19:50:00-07:00

This is Part 4 in a series of posts describing the Raft distributed consensus algorithm and its complete implementation in Go. Here is a list of posts in the series:

In this part, we're going to use our Raft module to implement a simple but realistic application - a replicated key / value database with strong consistency semantics. All the code for this part is located in this directory.

Key / value database as a state machine

First of all, what's a key / value database (KV DB)? Think of it as a Go map, or as an extremely simple version of NoSQL databases like Redis or CouchDB. The basic operations our KV DB supports are:

PUT(k,v): assign value v to key k
GET(k): retrieve the value associated with key k
CAS(k, cmp, v): atomic compare-and-swap. First, it reads curV - the current value associated with key k. If curV==cmp, assigns value v to k instead; otherwise, it's a no-op. In any case, curV is returned.

For example, suppose the commands in some Raft log are (in order from left to right):

PUT(x,2)  PUT(y,3)  PUT(x,4)  PUT(z,5)  CAS(x,4,8)  CAS(z,4,9)

Applied to an empty DB, this log will result in these keys / values:

x=8
y=3
z=5

System diagram

In this part we're going to build a complete KV DB system - including the service and a client library:

The diagram presents a cluster with 3 replicas [1]. Each replica is a KV DB service.
A KV service contains a Raft Consensus Module (the diagram doesn't show the log, assuming it's just part of the CM), and a data store module that implements the actual database.
The Raft CM of each replica is connected to the others via RPCs - these are the Raft protocol RPCs discussed extensively in previous parts.
The KV service presents a REST API to the external world; clients can send HTTP commands to the service and get results.
"KV Client" is a client library with a convenient API that encapsulates the HTTP interactions with KV services. This is also part of our demo, and we'll discuss it later in the post.

KV service architecture

The KV service consists of several key components:

An instance of a Raft server; as described back in Part 1, a Raft Server wraps a consensus module with some RPC scaffolding. In this part we reuse our final Raft server code from Part 3, without any modifications.
An underlying "data store". For our demonstration, a simple mutex-protected Go map will do; this is implemented in kvservice/datastore.go. This data store implements the Get, Put and CAS commands described earlier. All keys and values are Go strings (naturally, anything can be encoded in a string value).
An HTTP server for the REST API of the service exposed to the external world.

Commands

If you recall from Part 2, we submit new commands to the Raft cluster with the ConsensusModule.Submit method. A Command is an arbitrary any value; whenever the Raft cluster reaches consensus on a log entry, it sends a "commit entry" with this command on the commit channel. Commands are application-specific, and since we're working on a concrete application now, it's time to define our command for the KV service:

// Command is the concrete command type KVService submits to the Raft log to
// manage its state machine. It's also used to carry the results of the command
// after it's applied to the state machine. These are the supported commands:
//
// CommandGet: queries a key's value
//
// * Key is the key to get, Value is ignored
// * CompareValue is ignored
// * ResultFound is true iff Key was found in the store
// * ResultValue is the value, if Key was found in the store
//
// CommandPut: assigns value to the key
//
// * Key,Value are the pair to assign (store[key]=value)
// * CompareValue is ignored
// * ResultFound is true iff Key was previously found in the store
// * ResultValue is the old value of Key, if it was previously found
//
// CommandCAS: atomic compare-and-swap, performs:
//
//    if Store[Key] == CompareValue {
//      Store[Key] = Value
//    } else {
//      nop
//    }
//
// * Key is the key this command acts on
// * CompareValue is the previous value the command compares to
// * Value is the new value the command assigns
// * ResultFound is true iff Key was previously found in the store
// * ResultValue is the old value of Key, if it was previously found
type Command struct {
  Kind CommandKind

  Key, Value string

  CompareValue string

  ResultValue string
  ResultFound bool

  // id is the Raft ID of the server submitting this command.
  Id int
}

type CommandKind int

const (
  CommandInvalid CommandKind = iota
  CommandGet
  CommandPut
  CommandCAS
)

For simplicity, I chose to include fields for several commands in the same struct instead of using an algebraic data type here.

One important thing to note is that the service's Raft cluster ID is part of the command; it will soon become clear why this is needed.

Life of a PUT request to the service

Before we dive deep into the code, let's examine the journey a successful PUT request makes through the system:

A client sends a PUT("k", "v") request to a service, via HTTP. Let's assume it reaches the service which is currently the Raft cluster leader (we'll discuss what happens if it reaches a follower later on).
The service's HTTP handler receives the request, constructs a Command of kind CommandPut representing it and submits it to its Raft CM.
1. At this point, the HTTP handler waits; it can't reply to the client until it knows that the command was properly replicated to the Raft cluster and committed by the CM.
2. Once the command it submitted appears on the commit channel, the HTTP handler can return a success status to the client.
Meanwhile, a process in the service watches its commit channel for new commands that reached consensus by the cluster, and updates the underlying data store.
At the same time, the other services in the cluster - the followers - are also watching their commit channels and update their own replicas of the data store with the new PUT command.

Note that steps 2.2 and 3 happen concurrently. One process (in the sense of CSP) handles a client request, while another process takes care to execute commands arriving on the commit channel. In fact, there's more concurrency here than meets the eye. Our service can handle multiple concurrent requests, each with its own command - and it should all just work. This kind of concurrency is natural in Go - and now it's time to see how it works.

KV service code walk-through

All the code described in this section is located in kvservice/kvservice.go. Here's the struct defining the service:

type KVService struct {
  sync.Mutex

  // id is the service ID in a Raft cluster.
  id int

  // rs is the Raft server that contains a CM
  rs *raft.Server

  // commitChan is the commit channel passed to the Raft server; when commands
  // are committed, they're sent on this channel.
  commitChan chan raft.CommitEntry

  // commitSubs are the commit subscriptions currently active in this service.
  // See the createCommitSubsciption method for more details.
  commitSubs map[int]chan raft.CommitEntry

  // ds is the underlying data store implementing the KV DB.
  ds *DataStore

  // srv is the HTTP server exposed by the service to the external world.
  srv *http.Server
}

Don't worry about understanding exactly what each field means right now; note the correlation to the descriptions in "KV service architecture", though. A service holds a Raft server, a datastore, and an HTTP server. Other entities, like the commit channel, should be familiar by now.

A new service is created with this constructor:

// New creates a new KVService
//
//   - id: this service's ID within its Raft cluster
//   - peerIds: the IDs of the other Raft peers in the cluster
//   - storage: a raft.Storage implementation the service can use for
//     durable storage to persist its state.
//   - readyChan: notification channel that has to be closed when the Raft
//     cluster is ready (all peers are up and connected to each other).
func New(id int, peerIds []int, storage raft.Storage, readyChan <-chan any) *KVService {
  gob.Register(Command{})
  commitChan := make(chan raft.CommitEntry)

  // raft.Server handles the Raft RPCs in the cluster; after Serve is called,
  // it's ready to accept RPC connections from peers.
  rs := raft.NewServer(id, peerIds, storage, readyChan, commitChan)
  rs.Serve()
  kvs := &KVService{
    id:         id,
    rs:         rs,
    commitChan: commitChan,
    ds:         NewDataStore(),
    commitSubs: make(map[int]chan raft.CommitEntry),
  }

  kvs.runUpdater()
  return kvs
}

We'll get back to what runUpdater is a little later; for now, let's look at how the HTTP server is launched:

// ServeHTTP starts serving the KV REST API on the given TCP port. This
// function does not block; it fires up the HTTP server and returns. To properly
// shut down the server, call the Shutdown method.
func (kvs *KVService) ServeHTTP(port int) {
  if kvs.srv != nil {
    panic("ServeHTTP called with existing server")
  }
  mux := http.NewServeMux()
  mux.HandleFunc("POST /get/", kvs.handleGet)
  mux.HandleFunc("POST /put/", kvs.handlePut)
  mux.HandleFunc("POST /cas/", kvs.handleCAS)

  kvs.srv = &http.Server{
    Addr:    fmt.Sprintf(":%d", port),
    Handler: mux,
  }

  go func() {
    kvs.kvlog("serving HTTP on %s", kvs.srv.Addr)
    if err := kvs.srv.ListenAndServe(); err != http.ErrServerClosed {
      log.Fatal(err)
    }
    kvs.srv = nil
  }()
}

This should be familiar if you've written Go HTTP servers before. Listening is done in a goroutine to enable clean shutdown of the HTTP server specifically and the whole service in general; check out the Shutdown method for more details.

In the previous section, I mentioned that multiple HTTP requests can be handled concurrently; this is just the nature of the standard Go HTTP server. Here we see the handleXXX handlers registered with the server; each handler is invoked in a separate goroutine, and our code has to account for this. To understand what this means in practice, let's look at the updater goroutine.

// runUpdater runs the "updater" goroutine that reads the commit channel
// from Raft and updates the data store; this is the Replicated State Machine
// part of distributed consensus!
// It also notifies subscribers (registered with createCommitSubsciption).
func (kvs *KVService) runUpdater() {
  go func() {
    for entry := range kvs.commitChan {
      cmd := entry.Command.(Command)

      switch cmd.Kind {
      case CommandGet:
        cmd.ResultValue, cmd.ResultFound = kvs.ds.Get(cmd.Key)
      case CommandPut:
        cmd.ResultValue, cmd.ResultFound = kvs.ds.Put(cmd.Key, cmd.Value)
      case CommandCAS:
        cmd.ResultValue, cmd.ResultFound = kvs.ds.CAS(cmd.Key, cmd.CompareValue, cmd.Value)
      default:
        panic(fmt.Errorf("unexpected command %v", cmd))
      }

      // We're modifying the command to include results from the datastore,
      // so clone an entry with the update command for the subscribers.
      newEntry := raft.CommitEntry{
        Command: cmd,
        Index:   entry.Index,
        Term:    entry.Term,
      }

      // Forward this entry to the subscriber interested in its index, and
      // close the subscription - it's single-use.
      if sub := kvs.popCommitSubscription(entry.Index); sub != nil {
        sub <- newEntry
        close(sub)
      }
    }
  }()
}

The updater goroutine is responsible for implementing step (3) described in the "Life of..." section. It watches the commit channel for new committed commands, applies these commands to the datastore and then notifies "subscribers" about it. The first two tasks is what we'd expect from an implementation of a Raft-based replicated state machine; the last task needs some elaboration.

Recall step 2.1 from the "Life of..." section; once an HTTP handler submits a command to the Raft cluster, it has to wait and see if this command was properly committed. The way we implement it is:

The handler submits a command to the Raft CM, and keeps note of the log index the command is placed in.
The handler than registers a "subscription" with the updater, telling it: "hey, if you see a command submitted for this index, let me know". The subscription is implemented with a channel.
The handler can then wait on the channel.

Here's the code of handlePut, demonstrating this in action:

func (kvs *KVService) handlePut(w http.ResponseWriter, req *http.Request) {
  pr := &api.PutRequest{}
  if err := readRequestJSON(req, pr); err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return
  }
  kvs.kvlog("HTTP PUT %v", pr)

  // Submit a command into the Raft server; this is the state change in the
  // replicated state machine built on top of the Raft log.
  cmd := Command{
    Kind:  CommandPut,
    Key:   pr.Key,
    Value: pr.Value,
    Id:    kvs.id,
  }
  logIndex := kvs.rs.Submit(cmd)
  // If we're not the Raft leader, send an appropriate status
  if logIndex < 0 {
    renderJSON(w, api.PutResponse{RespStatus: api.StatusNotLeader})
    return
  }

  // Subscribe for a commit update for our log index. Then wait for it to
  // be delivered.
  sub := kvs.createCommitSubsciption(logIndex)

  // Wait on the sub channel: the updater will deliver a value when the Raft
  // log has a commit at logIndex. To ensure clean shutdown of the service,
  // also select on the request context - if the request is canceled, this
  // handler aborts without sending data back to the client.
  select {
  case entry := <-sub:
    // If this is our command, all is good! If it's some other server's command,
    // this means we lost leadership at some point and should return an error
    // to the client.
    entryCmd := entry.Command.(Command)
    if entryCmd.Id == kvs.id {
      renderJSON(w, api.PutResponse{
        RespStatus: api.StatusOK,
        KeyFound:   entryCmd.ResultFound,
        PrevValue:  entryCmd.ResultValue,
      })
    } else {
      renderJSON(w, api.PutResponse{RespStatus: api.StatusFailedCommit})
    }
  case <-req.Context().Done():
    return
  }
}

The code is well-commented, but I want to specifically call out a few important points:

When kvs.rs.Submit is called with the command, it returns -1 if the current Raft CM is not the leader. In this case, we return a special status to the client - "I'm not the leader" - and abort the handler. We'll see what the client does about this further down in the post.

For a leader, Submit returns the log index at which the command was submitted. This is the index used to subscribe to notifications from the commit channel.
The handler waits on a receive on this channel. This can be canceled if the HTTP request is canceled by the client (e.g. timeout); otherwise, we just wait. In practice, with the optimizations in Part 3, it takes just a handful of milliseconds to fully commit new commands in a functioning Raft cluster. In case of problems (disconnections, crashes etc.) this may take longer, but our application prioritizes consistency over availability (see Part 0 on fault tolerance in Raft and the CAP theorem).
When notified that a commit was made for this log index, there's still an important safety check to make! Is it actually our command that was committed there? This is what the id field on the command is for.

Consider the following case: peer A is the leader, and a client submits a command. A places it in log index 42, but gets disconnected before it manages to tell followers about it. After a while, C becomes the new leader; C is unaware that A placed something in its log at index 42. Therefore, when C receives a new command from another client, it commits it at index 42 (since this is still the "next index for entries" for all connected cluster members). At some point later, A gets reconnected to the cluster, becomes a follower (since its term is out of date), and sees the commit from C at index 42. At this point it realizes that it failed to commit its own command (because the ID doesn't match), and replies with a "failed commit" status to the client.

I'll leave figuring out the mechanics of channel subscriptions to you as an exercise. Just read the createCommitSubscription and popCommitSubscription methods - they're fairly straightforward.

Consistency guarantees

I wrote in detail about linearizable semantics recently. Our KV service is linearizable based on that definition, due to the nature of Raft consensus. An operation only becomes visible to clients after it's committed; and it's committed by cluster consensus, at a "moment in time" relative to other operations in the Raft log.

Moreover, it's also serializable for transactions like CAS: these are performed by a single service (the leader) atomically, so clients can never observe the results of sub-operations in isolation.

By being both linearizable and serializable, our service is strict serializable, which is the strongest consistency guarantee for distributed systems.

As discussed before, this strong consistency comes at the expense of availability in the face of network partitions (as it must, due to the CAP theorem limits). It's a "CP" system; the following diagram is from Wikipedia:

What are such services good for? Though it can serve as a NoSQL database, it won't be very performant - every operation has to reach consensus among multiple peers before being considered "done". Instead, such strict serializable services are used as the very bottom layer of large distributed systems. For example, it can be used to coordinate distributed locks, elect leaders (these are fairly easy to build on top of our CAS primitive) or store some critical low-volume configuration data for a complex system.

Plumbing read-only operations through the Raft log

You'll note that all the commands our KV service supports - PUT, GET and CAS - are implemented fairly consistently and follow the sequence described in the "Life of..." section. This raises an important question: is this really necessary for the read-only GET operations? After all, they don't really change the state machine, so why add them as Raft log commands?

While it's true that a stray GET command won't harm the integrity of the internal data store, it may result in stale reads or other events inconsistent with the linearizable semantics of our service.

To see why, let's work by contradiction; assume we don't plumb GET through the Raft log, but instead let leaders immediately reply to GET requests based on their local datastore. Here's what can happen:

The KV DB has the key-value pair k=v.
A used to be a leader, but got disconnected from its peers; after a suitable election timeout, C was elected as the new leader. A still thinks it's the leader, however.
At some point, a client contacts C and submits PUT(k,v2). C successfully replicates this command to the remaining connected peers.
A bit later, another client sends GET(K) to C and gets the correct response v2.
Then, a different client sends GET(k) to A (perhaps the client remembered that the previous time it contacted the service, A was the leader [2]). Since A still thinks it's the leader, it will happily reply with the value v to the client's request.

This sequence of events breaks the linearizability guarantees of our service! The read GET(K) --> v is stale, since another client already read the value as v2. There is no single-threaded history in which this sequence of events is possible.

This problem is explicitly called out in Section 8 of the Raft paper. The canonical solution is what our service is doing: plumb all commands - even the read-only ones - through the Raft log [3]. A service won't respond to a client's request unless it was able to successfully commit this command to the Raft log.

Since we plumb GET commands through the Raft log, in our example the problem in the last step couldn't happen, because A would not respond to its client while disconnected from the cluster. Instead, it would have to wait to be reconnected, and at that point would discover that it's no longer the leader. The client would then ask the real leader and get the right response. However, even if due to additional disconnections or crashes A resumed leadership, it would have to process the PUT(k,v2) before processing the client's GET(k), since the state machine is updated in log order.

KV client

Now it's time to discuss the final piece of our system - the KV client library. Since the KV service API is just REST, we don't necessarily need a client library - we could just use curl calls or any other way to generate HTTP requests to interact with it. However, a convenient, idiomatic client library goes a long way in improving the quality of life of users - and it will be particularly useful in this case because it encodes some essential logic - finding and keeping track of the cluster leader.

So far, everything in our system has been replicated by N, which is the Raft cluster size (typically 3 or 5). The client is a single entity - just user code that wants to use the KV service. All the client code is in kvclient/kvclient.go; let's walk through how a single request works, starting with the type and constructor:

type KVClient struct {
  addrs []string

  // assumedLeader is the index (in addrs) of the service we assume is the
  // current leader. It is zero-initialized by default, without loss of
  // generality.
  assumedLeader int

  clientID int32
}

// New creates a new KVClient. serviceAddrs is the addresses (each a string
// with the format "host:port") of the services in the KVService cluster the
// client will contact.
func New(serviceAddrs []string) *KVClient {
  return &KVClient{
    addrs:         serviceAddrs,
    assumedLeader: 0,
    clientID:      clientCount.Add(1),
  }
}

// clientCount is used internally for debugging
var clientCount atomic.Int32

To create a client, we have to provide it with a list of addresses for the KV services that constitute a cluster; before the client sends its first request, the services should be launched and listening on these addresses.

All client requests follow the same steps; let's use Put as an example:

// Put the key=value pair into the store. Returns an error, or
// (prevValue, keyFound, false), where keyFound specifies whether the key was
// found in the store prior to this command, and prevValue is its previous
// value if it was found.
func (c *KVClient) Put(ctx context.Context, key string, value string) (string, bool, error) {
  putReq := api.PutRequest{
    Key:   key,
    Value: value,
  }
  var putResp api.PutResponse
  err := c.send(ctx, "put", putReq, &putResp)
  return putResp.PrevValue, putResp.KeyFound, err
}

Types like PutRequest and PutResponse are defined in api/api.go (you may have noticed them in the service code as well); they're trivial, so I won't spend more time on them.

All the client logic is encapsulated in the send method:

func (c *KVClient) send(ctx context.Context, route string, req any, resp api.Response) error {
  // This loop rotates through the list of service addresses until we get
  // a response that indicates we've found the leader of the cluster. It
  // starts at c.assumedLeader
FindLeader:
  for {
    // There's a two-level context tree here: we have the user context - ctx,
    // and we create our own context to impose a timeout on each request to
    // the service. If our timeout expires, we move on to try the next service.
    // In the meantime, we have to keep an eye on the user context - if that's
    // canceled at any time (due to timeout, explicit cancellation, etc), we
    // bail out.
    retryCtx, retryCtxCancel := context.WithTimeout(ctx, 50*time.Millisecond)
    path := fmt.Sprintf("http://%s/%s/", c.addrs[c.assumedLeader], route)

    c.clientlog("sending %#v to %v", req, path)
    if err := sendJSONRequest(retryCtx, path, req, resp); err != nil {
      // Since the contexts are nested, the order of testing here matters.
      // We have to check the parent context first - if it's done, it means
      // we have to return.
      if contextDone(ctx) {
        c.clientlog("parent context done; bailing out")
        retryCtxCancel()
        return err
      } else if contextDeadlineExceeded(retryCtx) {
        // If the parent context is not done, but our retry context is done,
        // it's time to retry a different service.
        c.clientlog("timed out: will try next address")
        c.assumedLeader = (c.assumedLeader + 1) % len(c.addrs)
        retryCtxCancel()
        continue FindLeader
      }
      retryCtxCancel()
      return err
    }
    c.clientlog("received response %#v", resp)

    // No context/timeout on this request - we've actually received a response.
    switch resp.Status() {
    case api.StatusNotLeader:
      c.clientlog("not leader: will try next address")
      c.assumedLeader = (c.assumedLeader + 1) % len(c.addrs)
      retryCtxCancel()
      continue FindLeader
    case api.StatusOK:
      retryCtxCancel()
      return nil
    case api.StatusFailedCommit:
      retryCtxCancel()
      return fmt.Errorf("commit failed; please retry")
    default:
      panic("unreachable")
    }
  }
}

There's some context subtlety going on here - hopefully the comments make that clear enough.

The client keeps track of the last service it saw that accepted a command as a leader. When asked to send a new command to the service, this is the service it starts from. If its request to the assumed leader times out, or that service says it's no longer the leader, the client retries to the next service in the cluster.

During normal operation, the leader will typically be stable, each client will quickly discover who it is and from that point on will address the leader directly. When there's a cluster disruption, the client will spend a bit of time looking for the leader - but this can be optimized if needed [4].

If a client can't find a leader, it will just keep trying; since we use the Go context idiom, this can always be controlled by the user - by imposing a timeout on client operations, or canceling them for other reasons.

Future work

The KV service presented in this post provides strong consistency guarantees, as discussed. However, keeping systems linearizable all the way through the client is notoriously tricky, and the simple client we presented in this post is not immune to issues.

The problem is with its retry logic; when a client sends a PUT command to a leader and the request times out, what is the right thing to do? Our client just retries, looking for a different leader. Is this the right approach?

Not necessarily! Consider what happens if the leader committed the command, but crashed before responding to the client. If the client now retries, the command may end up duplicated in the log. While it may seem like this shouldn't be a problem because PUT is idempotent [5], it can in fact cause non-linearizable behavior to be observed, if some other client managed to PUT another value for the same key in-between the replies.

This isn't a trivial problem; in fact, it's also mentioned in section 8 of the Raft paper. We'll spend the next part in the series discussing this problem in detail, presenting one potential solution and talking about how real-world distributed KV services deal with it.

[1]	For the terms used in this description, refer to Part 0.

[2]	This is exactly how our client implementation works, as we'll see soon.

[3]	The paper also discusses some ideas for optimizations of this process. Since this optimizes the uncommon path (when crashes and disconnections disrupt the normal operation of the Raft cluster), I leave this out of my implementation.

[4] Here's an exercise: the AppendEntries RPC sent by leaders to followers contains a "leader ID" field; so followers know who the current leader is. We already have it in our Raft implementation; try to plumb this information all the way through to the client. When a follower sends a "I'm not a leader" response to the client, it can include the ID of the service it thinks is the current leader; this can reduce the search time somewhat.

[5]	Applying `PUT(k1, v1)` right after another `PUT(k1,v1)` doesn't affect the correctness of the DB.

Linearizability in distributed systems

2024-10-07T19:16:00-07:00

Linearizability is a strong consistency model in concurrent and distributed systems. From the paper introducing it [1]:

Linearizability provides the illusion that each operation applied by concurrent processes takes effect instantaneously at some point between its invocation and its response.

On first reading (and probably on the second and third...) this sounds a bit abstract, but it really is all there is to it. A slightly different way to think about it is - a linearizable system appears as if there's only one copy of data in existence, and all client operations apply to this data atomically. This post dives deeper into what this means in practice.

Registers

Linearizability is a single-object consistency model (see the "Linearizability vs. Serializability" section below for more on this). It's common in distributed systems literature to talk about a register - a single key-value pair, for example, stored in some distributed database. When clients write and read this register concurrently, we can analyze the history of operations and their results and determine if the system maintains linearizability.

Basic example

The following diagram describes a sequence of register reads and writes by three different clients; some of these operations are done concurrently. Time flows from left to right, and a colored rectangle denotes an operation; its left edge is the operation's start, and its right edge the operation's completion [2].

Here are the events, each with its own number in the yellow bubble:

Client A reads the register and gets the value of 0. The read itself happened at some point in time in the database, denoted on the timeline in the very bottom of the diagram.
Client B reads the value 0. Note that this read operation is partially concurrent with the write operation (3); concurrent operations can execute in any order, but here (2) happened to be executed before (3) (we know this because the value 0 was read, not 1).
Client C writes 1 into the register.
Client A reads 1 from the register. This read is also concurrent with the write, and thus could end up with any result, but since the result in this timeline is 1, we know it happened after (3).
Client B reads 1 from the register.

This sequence of events is valid in a linearizable system, because we can construct a serial history of events (the bottom timeline) that's consistent with our results. Each event occurs instantaneously at some point between the start and finish of the client request.

Compare this to the following sequence, which is not valid:

This sequence is similar to the first one, with one difference: B's read in (5) results in 0. Since (5) is concurrent with (3), when seen in isolation this isn't unreasonable. However, since in (4), client A already observed the value 1 in the register and (4) happens before (5), this sequence is invalid in a linearizable system. We can imagine systems with weaker consistency guarantees producing this history, but such systems are not linearizable.

Another way to look at it is examine the timeline in the bottom of the diagram. Notice that (5) reads 0, after (3) happened. We just can't find a way to arrange this history so it looks sequential - therefore, it's inconsistent with linearizability.

A more subtle example

Here's a more subtle example, taken from the linearizability paper:

This sequence of events is invalid for a linearizable system! To understand why, let's follow the timeline at the bottom of the diagram.

(3) Client B's write of 1 executes before (2) client A's read, because A reads 1 from the register. If the read at (2) happened before the write at (3), it (the read) would result in 0, not 1.

Event (4) has to happen after event (2), since it starts after (2) ends. But we've just reasoned that (2) happens after (3); therefore, (4) happens after (3) - even though these two writes are concurrent, their order is imposed by observing other events.

Finally, since we've just proven that (4) happens after (3), the value in the register at the conclusion of (3) is 0, not 1; therefore, the read of 1 in (5) is invalid. This system cannot be linearizable. As before, you can try to arrange the events in the bottom of the diagram into some sequential order - this attempt will fail, because no consistent sequential order can account for the observed events.

A formal definition

I personally found the formal definition of linearizability in the Herlihy & Wang paper somewhat obscured by attention given to potentially unfinished operations. If we assume that every operation has a start and an end, it's easier to restate the formal definition as follows.

An operation e has the timestamps start(e) and end(e); these are the left and right boundaries of the rectangles in the diagrams shown above.

A history H exists with a strong partial order <_H on operations [3]: e_0 <_H e_1 if end(e_0) precedes start(e_1) in H. Operations unrelated by <_H are said to be concurrent in H. In our diagrams, H represents the observed history (the part of the diagram with the overlapping rectangles). The formal definition captures what it means for us to know that some operations precede others, while other operations are concurrent.

For example, in our last diagram above if H is the history shown, then e_3 <_H e_5, but the pair e_3,e_4 is not in the relation <_H, since these operations are concurrent.

If H is a sequential history, then <_H is a total order. It means there are no concurrent operations.

Now it's time for the definition of linearizability. H is linearizable if:

H is equivalent to some sequential history S
<_H \subseteq <_S

The second item requires a bit of elaboration: recall that <_H and <_S are relations. <_H being a subset of <_S means that the partial order of operations in the real-time history H is preserved in the linearization.

We then call S the linearization of H. In our diagrams, S is the bottom line where operations are shown on the server happening immediately; they are still represented by start(e) and end(e) in the history (we can just assume start(e) and end(e) are infinitesimally close in time, since the DB applies operations atomically).

Linearizability vs. Serializability

Linearizability is often confused with serializability - another consistency model. The two are fundamentally different, though:

Serializability is a multi-object property useful to describe transactions that consist of multiple operations that may potentially touch multiple objects; informally, it means that transactions happen atomically, and their sub-operations cannot be observed in isolation or intermix.
Linearizability is a single-object property, talking about the observed effects on a single register, as this post demonstrates.

For a great taxonomy of consistency models, see this page from Jepsen.

Additional resources

Kyle Kingsbury - on his blog and through his company Jepsen - has a wealth of great resources on the subject of linearizability and other consistency models. Some examples:

The taxonomy, as mentioned above, and the related blog post
Knossos, a linearizability checker: blog post and project page
Jepsen's analysis of etcd has an interesting practical discussion of linearizability in a real-world system

[1]	Herlihy, Maurice P.; Wing, Jeannette M. (1990). "Linearizability: A Correctness Condition for Concurrent Objects". ACM Transactions on Programming Languages and Systems.

[2]	The operation itself happens instantaneously on the server at some moment within the rectangle's boundaries, but we don't know exactly when due to network delays.

[3]	For a refresher on the math used here (relations, orders) see this post.

Summary of reading: July - September 2024

2024-09-30T17:01:00-07:00

"A City on Mars" By Kelly and Zach Weinersmith - actual discussion of building settlements on Mars occupies maybe 1% of this book. The authors have an uncanny talent of focusing on all the least interesting aspects of space exploration; whatever little is dedicated to science and engineering is fairly shallow and reeks of techno-pessimism. That said, if the aspects of space exploration that interest you most are politics, social structures, legal frameworks, labor relations and mental health - go ahead and read this book. Overall, I found it infuriatingly bad.
"What You Are Looking For Is in the Library" Michiko Aoyama - a sweet little collection of loosely-related short stories. Just regular people living their lives in Japan; the common theme is changing circumstances and how to deal with them. I really enjoy books of this kind, and this one is very well done.
"Is this Wi-Fi Organic?: A guide to spotting misleading science online" by Dave Farina - a valiant attempt to build up a solid scientific foundation for debunking myths (focusing mostly on alternative medicine). Unfortunately, the execution falls short. The tactic employed by the author is: explain a bunch of science at a pretty shallow level, then present some claim to debunk and then immediately jump to "this is clearly false, becasue science". Rinse; repeat. While entertaining, this isn't very convincing. From a book, I'd expect much more. For example, Simon Singh's "Trick or Treatment" is much better in this respect, because the debunking done with a lot of supporting evidence, citing relevant research and studies. The good thing I can say about this book is that I appreciate the techno-optimism and the science-first approach.
"An imaginary tale" by Paul J. Nahin - tells the history of the discovery and initial applications of the imaginary unit - i. Very interesting book that fills an under-served niche between popular science and textbooks. The book isn't easy to go through - it requires sophisticated math, at least at the engineering undergrad level. Except the last chapter - which goes a bit off the rails with complex analysis - this background should be sufficient for the vast majority of the book, but some work will still be required. While I didn't follow through every single calculation, I really enjoyed the book overall and should try to read additional stuff by this author.
"The Code Breaker" by Walter Isaacson - a biography of Jennifer Doudna, focusing on the discovery of CRISPR, the competition around it, the ethical implications of gene editing and COVID vaccine research. An OK book overall, with some really tedious parts; not the best Isaacson, in my experience.
"An Immense World" by Ed Yong - secondary title "How Animal Senses Reveal the Hidden Realms Around Us"; an information-packed book with a huge scope, describing the sensing capabilities animals possess beyond the human range. Very interesting.
"Mansfield Park" by Jane Austen - familiar style and topic - the lives of the bored 19th century English gentry. I liked this book less than others by Austen I read; the characters aren't as well developed (I feel like there are too many? Austen seems to have neglected to provide distinct roles for several), and the first half of the book is fairly dull.
"How to Avoid a Climate Disaster" by Bill Gates - a decent overview of the current state of global warming, and what it would take to avoid the worst-case scenario. This book could be much better, IMHO, if it had more depth and a coherent summary/plan. I'm also surprised by the relatively shallow coverage of nuclear (fission) power, given the importance attributed to it by Gates. Also, the coverage of carbon capture is surprisingly minimal. On the good side, I really liked the concept of "green premium" and how to factor that into economically realistic solutions. This is one of those subjects that's moving so fast, however, that it needs a fresh treatment every few years. This book is from 2020 and already some information feels stale due to the huge progress in solar deployments that's been made in the past few years.
"On Writing: A Memoir of the Craft" by Stephen King - a mix of autobiography with some advice on writing creative fiction. Very good overall. King's relationship with his family is inspiring and endearing - not the usual celebrity fare. One small gripe is that the book is artificially inflated with some tangential interviews at the end, so it's actually shorter than advertised.
"Engineering in Plain Sight" by Grady Hillhouse - the author is well-known from his YouTube channel "Practical Engineering", and this book is a text presentation of many of the topics he discusses in his videos. The writing is unmistakable in Hillhouse's enthusiastic style and the illustrations are beautiful. In all, a very nice book.
"Children of Time" by Adrian Tchaikovsky - first part of a sci-fi trilogy about space-faring humans and... spiders. Highly imaginative and fun to read.
"Journeys North: The Pacific Crest Trail" by Barney Scout Mann - a memoir of a person thru-hiking the full PCT in 2007, and some stories about his fellow hikers. Great book.
"The Final Frontiersman" by James Campbell - subtitle is "Heimo Korth and His Family, Alone in Alaska's Arctic Wilderness". Very nice biography about a fur trapper family living in a remote self-built cabin in north-east Alaska. Covers a period roughly from the 1980s to the early 2000s.

Re-reads:

"Naked Economics" by Charles Wheelan
"The count of Monte Cristo" by Alexandre Dumas
"Skunk Works: A Personal Memoir of My Years of Lockheed" by Ben Rich

Notes on running Go in the browser with WebAssembly

2024-09-14T06:05:00-07:00

Recently I've had to compile Go to WebAssembly to run in the browser in a couple of small projects (#1, #2), and in general spent some time looking at WebAssembly. I find WebAssembly to be an exciting technology, both for the web and for other uses (e.g. with WASI); specifically, it's pretty great that we can take existing projects and components written in Go and run them in the browser.

In this post, I will summarize some useful patterns in running Go in the browser via WebAssembly. All the patterns are demonstrated by small, self-contained programs you can find in this GitHub repository.

Basics: calling Go from JS

This sample serves as the basis for other samples in this post: let's write a Go function that we'll call in the browser using JS. This function uses Go's math/big stdlib package to calculate the sum of the harmonic series for some duration [1], and returns the result with high precision:

// calcHarmonic calculates the harmonic series for approximately the given
// number of seconds and returns the accumulated result in a string.
func calcHarmonic(nsecs float64) string {
  d := time.Duration(nsecs * float64(time.Second))
  start := time.Now()
  r1 := big.NewRat(1, 1)
  for i := 2; ; i++ {
    addend := big.NewRat(1, int64(i))
    r1 = r1.Add(r1, addend)

    if i%10 == 0 && time.Now().Sub(start) >= d {
      break
    }
  }
  return r1.FloatString(40)
}

To export this function to JS in the browser, we add the following code:

func main() {
  // Export the name "calcHarmonic" to JS, with our wrapper as value
  js.Global().Set("calcHarmonic", jsCalcHarmonic)

  // The Go main function compiled to WASM is expected to block
  // indefinitely.
  select {}
}

// wrap calcHarmonic to be callable from JS
var jsCalcHarmonic = js.FuncOf(func(this js.Value, args []js.Value) any {
  if len(args) != 1 {
    panic("want one argument")
  }

  s := calcHarmonic(args[0].Float())
  return js.ValueOf(s)
})

This Go file is compiled to the WASM/js target with:

GOOS=js GOARCH=wasm go build -o harmonic.wasm harmonic.go

And load it from JS:

// Instantiate a new Go object (defined in from wasm_exec.js)
const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
    });

The JS code that calls calcHarmonic is:

let buttonElement = document.getElementById("submitButton");
document.getElementById("submitButton").addEventListener("click", () => {
    let input = document.getElementById("timeInput").value;
    let s = calcHarmonic(parseFloat(input));
    document.getElementById("outputDiv").innerText = s;
});

Finally, the wasm_exec.js file from the Go distribution has to be included with something like:

<script src="wasm_exec.js"></script>

The easiest way to obtain this file is download it from the Go project's GitHub mirror (for the same Go version your Go code is compiled with); this is handled by the Makefile in our sample project:

wasm_exec.js:
  wget https://raw.githubusercontent.com/golang/go/release-branch.go1.22/misc/wasm/wasm_exec.js

This is the basic recipe for invoking Go from JS in the browser: the Go code is platform-agnostic and presents some API and all the glue logic is done in JS. The next samples show some variations on this basic scheme.

Link to the full code for this sample.

DOM manipulation from Go

In the previous example, Go implemented the calcHarmonic function, but the rest of the program's logic was in JS - setting up an event listener for a button click, updating output, etc.

We can move more of the code to Go, if we want. The calcHarmonic remains unchanged, but our main function in Go becomes:

func main() {
  doc := js.Global().Get("document")
  buttonElement := doc.Call("getElementById", "submitButton")
  inputElement := doc.Call("getElementById", "timeInput")
  outputElement := doc.Call("getElementById", "outputDiv")

  buttonElement.Call("addEventListener", "click", js.FuncOf(
    func(this js.Value, args []js.Value) any {
      input := inputElement.Get("value")
      inputFloat, err := strconv.ParseFloat(input.String(), 64)
      if err != nil {
        log.Println(err)
        return nil
      }
      s := calcHarmonic(inputFloat)
      outputElement.Set("innerText", s)
      return nil
    }))

  select {}
}

We obtain JS values from the js.Global() context and can call functions or set attributes on them. If you squint, this looks very similar to JS code, but written in Go-ish.

This code sample demonstrates some useful capabilities of DOM manipulation in Go:

Adding event listeners on DOM elements, with Go callbacks
Getting values from DOM elements
Setting attributes on DOM elements

The only code JS remaining in our index.html is the WebAssembly loader:

const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
    });

All the rest is done in Go! Link to the full code for this sample.

For a more full-featured sample, check out this directory. It implements a simple Game of Life running in the browser, entirely in Go. All the game logic, canvas manipulation and event management is done in Go; here too, the only JS code in the project is the few lines used to load the WebAssembly module.

I personally prefer keeping the UI logic in JS, but if you're interested in Go purity all the way - it's definitely feasible.

Using TinyGo as an alternative compiler

The Go compiler's support for WebAssembly is pretty good these days, but there's a small snag that may be important to users: the entire Go runtime is compiled into the WASM binary. On my machine, the .wasm files produced for the sample Go code weigh in at around 2.5 MiB, which will take some time to load in the browser - especially on slow connections [2].

There's an alternative: TinyGo is a Go toolchain "for small places", specializing in embedded controllers; the same considerations apply to WASM. The TinyGo runtime is lightweight compared to Go, and the binaries are about 1/4 the size. Not everything is perfect with TinyGo, though: compilation is much slower, and the resulting code is a bit slower as well. Finally, TinyGo has some limitations that make stdlib packages that rely on reflection not work; this can be painful when interacting with JS because encoding/json relies on reflection - so you may need to look for an alternative JSON package.

The dom-in-go sample directory also shows how to build the project with TinyGo; take a look at the Makefile. Note that TinyGo has its own wasm_exec.js support file - it won't work with the one taken from the standard Go distribution; the Makefile handles this too.

Keeping the main thread free: WebAssembly in a web worker

If we come back to the original sample and run the calculation for some non-trivial amount of time (say, 2 seconds or more) - you may notice something: the page appears "frozen" while the calculation is running. You can't interact with the UI in any way, can't select text with the mouse; if you try to add periodic console.log printouts or some spinner animation - nothing will show until calcHarmonic returns with the result.

This is the expected behavior for JS when it calls a blocking, CPU-intensive function! Let's revisit the code again:

 let buttonElement = document.getElementById("submitButton");
 document.getElementById("submitButton").addEventListener("click", () => {
     let input = document.getElementById("timeInput").value;
     let s = calcHarmonic(parseFloat(input));
     document.getElementById("outputDiv").innerText = s;
 });

The highlighted line will block the main thread for 2+ seconds, but the main thread in JS is also used for all the UI interaction. This is one of the most common manifestations of function coloring problem - blocking is problematic. Luckily, all modern browsers support Web Workers - isolated threads that can execute concurrently.

It's not hard to make web workers work with WebAssembly, which is what our next demo shows. The main HTML file includes, in addition to the UI logic:

const worker = new Worker("worker.js");
worker.onmessage = ({ data }) => {
    let { action, payload } = data;
    switch (action) {
        case "log":
            console.log(`worker.log: ${payload}`);
            break;
        case "result":
            resultReady(payload);
            break;
        default:
            console.error(`Unknown action: ${action}`);
    }
};

Where worker.js is:

importScripts("wasm_exec.js");
console.log("Worker is running");

// Load the WASM module with Go code.
const go = new Go();
WebAssembly.instantiateStreaming(fetch("harmonic.wasm"), go.importObject).then(
    (result) => {
        go.run(result.instance);
        console.log("Worker loaded WASM module");
    }).catch((err) => {
        console.error("Worker failed to load WASM module: ", err)
    });

onmessage = ({ data }) => {
    let { action, payload } = data;
    postMessage({
        action: "log",
        payload: `Worker received message ${action}: ${payload}`,
    });
    switch (action) {
        case "calculate":
            let result = calcHarmonic(payload);
            postMessage({ action: "result", payload: result });
            break;
        default:
            throw (`unknown action '${action}'`);
    }
};

(The Go code remains unchanged.)

We see that the worker does the WebAssembly loading now, meaning that the Go code executes in a separate thread and the UI thread is free to run while the computation is ongoing. This sample adds a spinner that animates until the web worker returns calcHarmonic's answer, to show the effect.

Link to the full code for this sample.

Talking on a Web Socket with Go

A few years ago I published a sample of a Go server talking via web sockets with JavaScript client code. Well, since the theme here is porting all client code to Go, how about we replace that JavaScript client with yet more Go?

This turns out to be fairly simple - not much different from the "DOM manipulation in Go" section, in fact. But there are some nuances I want to cover.

The application is simple - we display a box, and whenever there's mouse movement over the box, the client sends messages to the server via a web socket; the server echoes the message back and the client uses it to update a text div:

The server code is standard Go using the golang.org/x/net/websocket package. On the client, however, we have to use browser APIs. Here's the interesting part of the code:

const wsServerAddress = "ws://127.0.0.1:4050"

// These are equivalent to the following in JS:
//
//   ws = new WebSocket(addr) ...
//
wsCtor := js.Global().Get("WebSocket")
wsEcho := wsCtor.New(wsServerAddress + "/wsecho")
wsTime := wsCtor.New(wsServerAddress + "/wstime")

To send on a web socket, we'll use this function:

// wsSend sends a message on a web socket; the web socket must be active and
// open (otherwise wsSends logs an error and doesn't send anything).
// The message will be serialized to JSON prior to sending.
func wsSend(sock js.Value, msg any) {
  if !sock.IsNull() || sock.Get("readyState").Equal(js.Global().Get("WebSocket").Get("OPEN")) {
    b, err := json.Marshal(msg)
    if err != nil {
      log.Fatal(err)
    }
    sock.Call("send", string(b))
  } else {
    log.Println("socket is not open")
  }
}

And here's how receiving looks, registering the message event listener:

wsEcho.Call("addEventListener", "message", js.FuncOf(
  func(this js.Value, args []js.Value) any {
    event := args[0]
    var ev Event
    if err := json.Unmarshal([]byte(event.Get("data").String()), &ev); err != nil {
      log.Fatal(err)
    }
    coordMsg := fmt.Sprintf("Coordinates: (%v, %v)", ev.X, ev.Y)
    outputElement.Set("innerText", coordMsg)
    return nil
  }))

As before, this is just straightforward translation of JS into Go [3]. Note something interesting that's going on here: we have two different Go programs, talking over web sockets with each other using completely different underlying libraries. One uses a Go-native implementation of web sockets; the other uses the browser implementation, exposed via a JS API. In a realistic program, it would make sense to abstract over these details so the same code could be used to send/receive data over web sockets, whether it runs on the server or the client.

Link to the full code for this sample.

Testing locally with Node.js

This section isn't strictly about "running in the browser", but it covers the important topic of local testing. Sometimes we don't want the browser in the loop for our tests; well, good news - we can leverage Node.js's ability to load and execute WebAssembly modules to run GOOS=js GOARCH=wasm Go binaries locally!

The intersting tidbit here is that we can leverage special support implemented in the Go toolchain to make these invocations similar to running/testing regular Go programs. Here's an excerpt from go help run describing it:

By default, 'go run' runs the compiled binary directly: 'a.out arguments...'.
If the -exec flag is given, 'go run' invokes the binary using xprog:
  'xprog a.out arguments...'.
If the -exec flag is not given, GOOS or GOARCH is different from the system
default, and a program named go_$GOOS_$GOARCH_exec can be found
on the current search path, 'go run' invokes the binary using that program,
for example 'go_js_wasm_exec a.out arguments...'. This allows execution of
cross-compiled programs when a simulator or other execution method is
available.

The Makefile in our sample handles this fully; we can run a test like this locally, without opening the browser:

//go:build js && wasm

package main

import (
  "log"
  "syscall/js"
  "testing"
)

func TestJSArr(t *testing.T) {
  log.Println("hello from test in js/wasm")

  objs := js.Global().Call("eval", `({
arr: [41,42,43],
})`)

  arr := objs.Get("arr")
  if got := arr.Length(); got != 3 {
    t.Errorf("got %#v, want %#v", got, 3)
  }

  if got := arr.Index(1).Int(); got != 42 {
    t.Errorf("got %#v, want %#v", got, 42)
  }
}

With an invocation like:

GOOS=js GOARCH=wasm go test -exec=supportfiles/go_js_wasm_exec -v .

Link to the full code for this sample.

[1]	The harmonic series is known to diverge, but very slowly. You need over 200 million elements to get to the sum of 20, etc. (see A004080).

[2]	There are some additional mitigations we can explore, like compressing the WASM binary. This is outside the scope of this post, and it applies to the TinyGo output as well.

[3]	To be honest, this makes me appreciate JS as an extension language. It has such a simple ABI! Everything is an object, and we can get/set object properties (which can be other objects), and call functions/methods - that's all we need to access all of the browser APIs.

Notes on the Euler formula

2024-09-07T05:44:00-07:00

The Euler formula states that for any real x:

\[e^{ix}=cos(x)+i sin(x)\]

Where i is the imaginary unit. This formula is extremely important in many branches of mathematics and engineering, but at first glance it's puzzling. What does a complex exponent even mean, and how can it be related to the trigonometric functions?

Complex number representations

Complex numbers have two canonical representations:

Cartesian: x+iy
Polar: r\angle \theta

Trigonometric formulae can be used to convert between the two in a straightforward way:

Given x+iy, we can compute r=\sqrt{x^2+y^2} and \theta=tan^{-1}(\frac{y}{x})
Given r\angle \theta we can compute x=r cos\theta and y=r sin\theta

Some intuition for Euler's formula

Representing x and y as above, we have:

\[z=r(cos\theta+i sin\theta)\]

Now, let's take two complex numbers, multiply them together and use some basic trigonometric identities:

\[\begin{align*} z_1&=r_1 (cos\theta+i sin\theta) \\ z_2&=r_2 (cos\phi+i sin\phi)\\ z_1 z_2 &= r_1 r_2 (cos\theta+i sin\theta)(cos\phi+i sin\phi)\\ &=r_1 r_2 ((cos\theta cos\phi -sin\theta sin\phi) +i(cos\theta sin\phi + cos\phi sin\theta))\\ &=r_1 r_2 (cos(\theta+\phi) + i sin(\theta+\phi))\\ &= r_1 r_2 \angle (\theta+\phi) \end{align*}\]

When multiplying two complex numbers in polar form, their magnitudes multiply but their angles add [1].

Now, suppose we have a hypothetical function f(\theta) such that r f(\theta) represents a complex number. We've just shown that:

\[(r_1 f(\theta))\cdot(r_2 f(\phi))=r_1 r_2 f(\theta+\phi)\]

What real-life function do you know that behaves like this? An exponential function! a^x a^y=a^{x+y} for some a. This isn't a proof of anything, of course, and the base of the exponent can be arbitrary - but it does show that there's something about complex numbers that behaves like exponentials.

Let's look at it from another direction. Once again starting with:

\[z=r(cos\theta+i sin\theta)\]

We'll treat z as a function of , and find its derivative:

\[\frac{dz}{d\theta}=-rsin\theta +i cos\theta\]

If we factor i out of the parenthesis, we get:

\[\frac{dz}{d\theta}=ri(cos\theta+i sin\theta)=iz\]

Note that this is exactly the derivative if z(\theta)=e^{i\theta}, another clue that complex numbers behave like exponential functions.

Proof using power series

The canonical proof of Euler's formula uses Maclaurin series expansions for e^x, cos(x) and sin(x).

As a reminder, the Maclaurin series approximation for a function is:

\[p(x) = f(0)+\frac{f'(0)}{1!}x+\frac{f''(0)}{2!}x^2+\frac{f'''(0)}{3!}x^3+\cdots=\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}x^n\]

For e^x:

\[e^{x}=1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!}+\frac{x^5}{5!}+\cdots\]

Substituting ix for x and applying powers of i:

\[\begin{align*} e^{ix}&=1+ix+\frac{(ix)^2}{2!}+\frac{(ix)^3}{3!}+\frac{(ix)^4}{4!}+\frac{(ix)^5}{5!}+\cdots\\ &=1+ix-\frac{x^2}{2!}-\frac{ix^3}{3!}+\frac{x^4}{4!}+\frac{ix^5}{5!}+\cdots \end{align*}\]

Now let's regroup the real and imaginary parts of the series:

\[e^{ix}=\left(1-\frac{x^2}{2!}+\frac{x^4}{4!}+\cdots\right) +i\left(x-\frac{x^3}{3!}+\frac{x^5}{5!}+\cdots\right)\]

The contents of the first parenthesis is precisely the Maclaurin series expansion of cos(x), and the contents of the second parenthesis is the expansion of sin(x); therefore, we've just proven the Euler formula [2].

Proof using derivatives

Let's define \xi(x) as follows:

\[\xi(x)=\frac{cos(x)+isin(x)}{e^{ix}}=e^{-ix}(cos(x)+isin(x))\]

And compute its derivative:

\[\begin{align*} \frac{d\xi(x)}{dx}&=e^{-ix}(-sin(x)+icos(x))-ie^{-ix}(cos(x)+isin(x))\\ &=e^{-ix}(-sin(x)+icos(x))-e^{-ix}(icos(x)-sin(x))=0 \end{align*}\]

Thus, it's a constant function; what is its value? We can find \xi(0) easily - it's 1. Therefore, \xi(x)=1 everywhere and thus its numerator and denominator are always equal \blacksquare.

Visualizing Euler's formula

It's interesting to plot e^{i\phi} to observe its behavior. In the general case, it's very difficult to visualize functions in the complex domain because both the input and output are two-dimensional; so we'd need a 4D plot. Luckily for us, we're usually interested in e^{i\phi} only for \phi\in\mathbb{R}, so we have 3 dimensions to deal with:

Input dimension: \phi
Output dimensions: Re(e^{i\phi}) and Im(e^{i\phi})

If we isolate the two 2D plots of f(\phi)=Re(e^{i\phi}) as a function of g(\phi)=Im(e^{i\phi}), we get:

This is the expected result from Euler's formula! The real part of the complex exponent is cos(\phi), while the imaginary part is sin(\phi).

Finally, let's plot the projection of the 3D plot onto the real+imaginary axes:

It hopefully comes as no surprise that we get the unit circle! This is another way to demonstrate the beautiful connection between the trigonometric functions and circles. If you imagine a point moving along the unit circle counter-clockwise, this point's Re value will be cos(\phi) where \phi is its angle from the Re axis, and its Im value will be sin(\phi).

Euler's identity

Euler's famous identity ties the "five fundamental constants of mathematics" together:

\[e^{i\pi}+1=0\]

This identity is trivial to derive from the Euler formula, because:

\[e^{i\pi}=cos(\pi)+i sin(\pi)=-1\]

De Moivre formula

Let's take the complex exponent e^{ix} and raise it to the n-th power, where n is an integer:

\[(e^{ix})^n=e^{inx}\]

We can replace the complex exponent by its trigonometric equivalent using Euler's formula on both sides:

\[(cos(x) +isin(x))^n=cos(nx)+isin(nx)\]

This is De Moivre's Formula, which is extremely useful in calculations involving complex numbers, and is a treasure trove of trigonometric identities.

An alternative formulation of the De Moivre formula uses fractional powers and is useful for finding the roots of complex numbers. However, we have to be careful here, because the complex root function (just like its real counterpart!) is multi-valued; it maps a single value in its domain to potentially multiple values in its range [3].

This formulation says:

\[e^{\frac{ix}{n}}=(cos(x) +isin(x))^{\frac{1}{n}}=cos(\frac{x+2\pi k}{n})+isin(\frac{x+2\pi k}{n})\]

For integer $$0\leq k< n$$. This is because if we raise this number back to the power of n, we'll get back the original e^{ix} for any of these k (both sine and cosine are periodic with a period of 2\pi).

[1]	In other words, multiplying by a complex number combines a scaling and rotation operations. Multiplying any z by r\angle \theta scales (multiplies) z's magnitude by r and rotates it (counter-clockwise) by angle .

[2]	I've seen places that treat this as the definition of what a complex exponential means, rather than a proof.

[3]	And thus isn't strictly a function at all, if we want to put our abstract algebra hat on.

SentencePiece BPE Tokenizer in Go

2024-08-23T10:35:00-07:00

Earlier this year I wrote a post about implementing BPE tokenization in Go, which made it possible to reproduce OpenAI's tokenizer.

Today I want to mention a new project I've been hacking on recently: go-sentencepiece - a pure Go implementation of the SentencePiece tokenizer that's used for Google AI's models like Gemma and Gemini. SentencePiece has a canonical C++ implementation and Python bindings (using SWIG). While it's not too hard to wrap the C++ code with cgo, in some cases a C compiler dependency isn't desirable, so a pure Go solution may be useful. This is what go-sentencepiece is for.

A disclaimer: while SentencePiece contains implementations for both BPE and Unigram tokenizers, go-sentencepiece only implements BPE because this is the one use in practice by models. Also, go-sentencepiece doesn't implement the training phase of the tokenizer, only encoding & decoding. For training, feel free to review my previous post.

There are a couple of ways in which SentencePiece works differently from OpenAI's variant of BPE:

The text is not pre-split by whitespace using a regexp; instead, whitespace is considered just another part of the input and has its own tokens. You can even see it in the screenshot above - it's marked by the "fat underscore" character (U+2581). While single-space runes are usually part of the next non-space token, multi-space tokens exist as distinct tokens.
Instead of being configured by just a vocabulary and a regexp, SentencePiece tokenizers have a whole protobuf for configuration, with many options. go-sentencepiece only supports the set of options used for Google AI's models, but more can be added easily.

The whitespace difference turns out to play a crucial role in performance. My original BPE implementation was fairly naive, using simple quadratic algorithms for encoding; this was OK, because these algorithms were working on one word at a time, so the N was very small.

This is no longer sufficient for SentencePiece, however, since the length of the full text is N. Therefore, the implementation adopts some more sophisticated algorithms from the C++ SentencePiece codebase; in particular:

To match a prefix of a long string from a set of candidates, we use a trie data structure. The prefixmatcher package implements this and may be generally interesting.
To figure out which pair of tokens to try merging next, we use a heap-based priority queue; this is implemented in the generic priorityqueue package.

While I didn't spend much time in micro-optimizing the implementation, these algorithmic improvements sped up the encoder by about 100x compared to a naive approach, and it's now so fast that I don't think it will ever be a bottleneck in reality.

Config and set up

As mentioned earlier, SentencePiece is configurable with a protobuf file. There are two parts to this: first is a .proto file defining the schema of the protobuf. This is vendored into my repository, copied from the C++ SentencePiece repository. The .pb.go file is also in the tree so you don't need to run the protobuf compiler unless the .proto changes.

The second part is the protobuf itself, which contains the tokenizer vocabulary and a bunch of configuration options. This can be downloaded from the official Gemma repository. go-sentencepiece should be able to load this file.

Online demo

As before, I've implemented an online demo of this tokenizer by compiling it into WebAssembly and adding some HTML+JS scaffolding around it. This is where the screenshot above is from.

You can play with it here: https://eliben.github.io/go-sentencepiece/ (the model protobuf is quite big though, so this page may take a few seconds to load if you have a slow connection).

Building static binaries with Go on Linux

2024-07-30T14:35:00-07:00

One of Go's advantages is being able to produce statically-linked binaries [1]. This doesn't mean that Go always produces such binaries by default, however; in some scenarios it requires extra work to make this happen. Specifics here are OS-dependent; here we focus on Unix systems.

Basics - hello world

This post goes over a series of experiments: we take simple programs and use go build to produce binaries on a Linux machine. We then examine whether the produced binary is statically or dynamically linked. The first example is a simple "hello, world":

package main

import "fmt"

func main() {
  fmt.Println("hello world")
}

After building it with go build, we get a binary. There are a few ways on Linux to determine whether a binary is statically or dynamically linked. One is the file tool:

$ file ./helloworld
helloworld: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=Flm7stIXKLPfvBhTgXmR/PPwdjFUEkc9NCSPRC7io/PofU_qoulSqJ0Ktvgx5g/eQXbAL15zCEIXOBSPZgY, with debug_info, not stripped

You can see it says "statically linked". Another way is to use ldd, which prints the shared object dependencies of a given binary:

$ ldd ./helloworld
  not a dynamic executable

Alternatively, we can also use the ubiquitous nm tool, asking it to list the undefined symbols in a binary (these are symbols the binary expects the dynamic linker to provide at run-time from shared objects):

$ nm -u ./helloworld
<empty output>

All of these tell us that a simple helloworld is a statically-linked binary. Throughout the post I'll mostly be using ldd (out of habit), but you can use any approach you like.

DNS and user groups

There are two pieces of functionality the Go standard library defers to the system's libc on Unix machines, when some conditions are met. When cgo is enabled (as it often - but not always - is on Unix machines), Go will call the C library for DNS lookups in the net package and for user and group ID lookups in the os/user package.

Let's observe this with an experiment:

package main

import (
  "fmt"
  "net"
)

func main() {
  fmt.Println(net.LookupHost("go.dev"))
}

If we build this program, we notice it's dynamically linked, expecting to load a libc shared object at run-time:

$ go build lookuphost.go
$ ldd ./lookuphost
  linux-vdso.so.1 (0x00007b50cb22a000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007b50cae00000)
  /lib64/ld-linux-x86-64.so.2 (0x00007b50cb22c000)

This is explained in the net package documentation in some detail. The Go standard library does have a pure Go implementation of this functionality (although it may lack some advanced features). We can ask the toolchain to use it in a couple of ways. First, we can set the netgo build tag:

$ go build -tags netgo lookuphost.go
$ ldd ./lookuphost
  not a dynamic executable

Second, we can disable cgo entirely with the CGO_ENABLED env var. This env var is usually on by default on Unix systems:

$ go env CGO_ENABLED
1

If we disable it explicitly for our build, we'll get a static binary again:

$ CGO_ENABLED=0 go build lookuphost.go
$ ldd ./lookuphost
  not a dynamic executable

Similarly, some of the functionality of the os/user package uses libc by default. Here's an example:

package main

import (
  "encoding/json"
  "log"
  "os"
  "os/user"
)

func main() {
  user, err := user.Lookup("bob")
  if err != nil {
    log.Fatal(err)
  }

  je := json.NewEncoder(os.Stdout)
  je.Encode(user)
}

This produces a dynamically-linked binary:

$ go build userlookup.go
$ ldd ./userlookup
  linux-vdso.so.1 (0x0000708301084000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000708300e00000)
  /lib64/ld-linux-x86-64.so.2 (0x0000708301086000)

As with net, we can ask the Go toolchain to use the pure Go implementation of this user lookup functionality. The build tag for this is osusergo:

$ go build -tags osusergo userlookup.go
$ ldd ./userlookup
  not a dynamic executable

Or, we can disable cgo:

$ CGO_ENABLED=0 go build userlookup.go
$ ldd ./userlookup
  not a dynamic executable

Linking C into our go binary

We've seen that the standard library has some functionality that may require dynamic linking by default, but this is relatively easy to override. What happens when we actually have C code as part of our Go program, though?

Go supports C extensions and FFI using cgo. For example:

package main

// #include <stdio.h>
// void helloworld() {
//   printf("hello, world from C\n");
// }
import "C"

func main() {
  C.helloworld()
}

A program built from this source will be dynamically linked, due to cgo:

$ go build cstdio.go
$ ldd ./cstdio
  linux-vdso.so.1 (0x00007bc6d68e3000)
  libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007bc6d6600000)
  /lib64/ld-linux-x86-64.so.2 (0x00007bc6d68e5000)

In our C code, printf is a call to libc; even if we don't explicitly call into the C runtime in our C code, cgo may do it in the scaffolding code it generates.

Note that cgo may be involved even if your project has no C code of its own; several dependencies may bring in cgo. Some popular packages - like the go-sqlite3 driver - depend on cgo, and importing them will impose a cgo requirement on a program.

Obviously, building with CGO_ENABLED=0 is no longer an option. So what's the recourse?

Linking a `libc` statically

To recap, once we have C code as part of our Go binary, it's going to be dynamically linked on Unix, because:

The C code calls into libc (the C runtime)
The libc typically used on Unix systems is glibc
The recommended way to link to glibc is dynamically (for various technical and license-related reasons that are outside the scope of this post)
Therefore, go build produces dynamically-linked Go binaries

To change this flow of events, we can interpose at step (2) - use a different libc implementation, one that's statically linked. Luckily, such an implementation exists and is well used and tested - musl.

To follow along, start by installing musl. The standard instructions using ./configure --prefix=<MUSLDIR> and make / make install work well. We'll use $MUSLDIR to refer to the directory where musl is installed. musl comes with a gcc wrapper that makes it easy to pass all the right flags. To re-build our cstdio example using musl, run:

$ CC=$MUSLDIR/bin/musl-gcc go build --ldflags '-linkmode external -extldflags "-static"' cstdio.go
$ ldd ./cstdio
  not a dynamic executable

The CC env var tells go build which C compiler to use for cgo; the linker flags instruct it to use an external linker for the final build (read this for the gory details) and then to perform a static link.

This approach works for more complex use cases as well! I won't paste the code here, but the sample repository accompanying this post has a file called use-sqlite.go; it uses the go-sqlite3 package. Try go build-ing it normally and observe the dynamically linked binary produced; next, try to build it with the flags shown above to use musl, and observe that the produced binary will be statically linked.

Another curious tidbit is that we now have another way to build a statically-linked lookuphost program - by linking it with musl:

$ CC=$MUSLDIR/bin/musl-gcc go build --ldflags '-linkmode external -extldflags "-static"' lookuphost.go
$ ldd ./lookuphost
  not a dynamic executable

Since we didn't provide -tags netgo and didn't disable cgo, the Go toolchain uses calls into libc to implement DNS lookup; however, since these calls end up in the statically-linked musl, the final binary is statically linked!

Using Zig as our C compiler

Another alternative emerged recently to achieve what we want: using the Zig toolchain. Zig is a new systems programming language, which uses a bundled toolchain approach similar to Go. Its toolchain bundles together a Zig compiler, C/C++ compiler, linker and libc for static linking. Therefore, Zig can actually be used to link Go binaries statically with C code!

Instead of installing musl, we could instead install Zig and use its x86_64-linux-musl target (adjust the architecture if needed). This is done by pointing to the zig binary as our CC= env var; assuming Zig is installed in $ZIGDIR:

$ CC="$ZIGDIR/zig cc -target x86_64-linux-musl" go build cstdio.go
$ CC="$ZIGDIR/zig cc -target x86_64-linux-musl" go build use-sqlite.go

These will produce statically-linked Go binaries; the zig driver takes care of setting the right linker flags automatically, so the command-line ends up being slightly simpler than invoking musl-gcc. Another advantage of Zig here is that enables cross-compilation of Go programs that include C code [2].

I did find some issues with this approach, however; for example, attempting to link the lookuphost.go sample fails with a slew of linker errors.

Summary

Making sure Go produces a statically-linked binary on Linux takes a little bit of effort, but works well overall.

There's a long standing accepted proposal about adding a -static flag to go build that would take care of setting up all the flags required for a static build. AFAICT, the proposal is just waiting for someone with enough grit and dedication to implement and test it in all the interesting scenarios.

Code

The code for all the experiments described in this post is available on GitHub.

[1]	A statically-linked binary doesn't have run-time dependencies on other libraries (typically in the form of shared objects), not even the C runtime library (`libc`). I wrote much more about this topic in the past.

[2]

Go is well-known for its cross-compilation capabilities, but it depends on the C toolchain to compile C code. Therefore, when cgo is involved, cross-compilation is challenging. Zig can help with this because its toolchain supports cross compilation for Zig and C! It does so by bundling LLVM with a bunch of targets linked in.

Notes on Taylor and Maclaurin series

2024-07-23T18:55:00-07:00

A Maclaurin series is a power series - a polynomial with carefully selected coefficients and an infinite number of terms - used to approximate arbitrary functions with some conditions (e.g. differentiability). The Maclaurin series does this for input values close to 0, and is a special case of the Taylor series which can be used to find a polynomial approximation around any value.

Intuition

Let's say we have a function and we want to approximate it with some other - polynomial - function p(x). To make sure that p(x) is as close as possible to , we'll create a function that has similar derivatives to .

We start with a constant polynomial, such that p(0)=f(0). This approximation is perfect at 0 itself, but not as much elsewhere.
We want p(x) to behave similarly to around 0, so we'll set the derivative of our approximation to be the same as the derivative of at 0; in other words p'(0)=f'(0). This approximation will be decent very close to 0 (at least in the direction of the slope), but will become progressively worse as we get farther away from 0.
We continue this process, by setting the second derivative to be p''(0)=f''(0), the third derivative to be p'''(0)=f'''(0) and so on, for as many terms as we need to achieve a good approximation in our desired range. Intuitively, if many derivatives of p(x) are identical to the corresponding derivatives of at some point, the two functions will have very similar behaviors around that point [1].

The full Maclaurin series that accomplishes this approximation is:

\[p(x) = f(0)+\frac{f'(0)}{1!}x+\frac{f''(0)}{2!}x^2+\frac{f'''(0)}{3!}x^3+\cdots=\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}x^n\]

We'll get to how this equation is found in a moment, but first an example that demonstrates its approximation capabilities. Suppose we want to find a polynomial approximation for f(x)=cos(x). Following the definition of the Maclaurin series, it's easy to calculate:

\[p_{cos}(x)=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\frac{x^8}{8!}-\cdots\]

(try it as an exercise).

The dark blue line is the cosine function f(x)=cos(x). The light blue lines are successive approximations, with k terms of the power series p_{cos}(x) included:

With k=1, p_{cos}(x)=1 since that's just the value of cos(x) at 0.
With k=2, p_{cos}(x)=1-\frac{x^2}{2}, and indeed the line looks parabolic
With k=3 we get a 4th degree polynomial which tracks the function better, and so on

With more terms in the power series, the approximation resembles cos(x) more and more, at least close to 0. The farther away we get from 0, the more terms we need for a good approximation [2].

How the Maclaurin series works

This section shows how one arrives at the formula for the Maclaurin series, and connects it to the intuition of equating derivatives.

We'll start by observing that the Maclaurin series is developed around 0 for a good reason. The generalized form of a power series is:

\[p(x)=a_0+a_1 x+a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots\]

To properly approximate a function, we need this series to converge; therefore, it would be desirable for its terms to decrease. An x value close to zero guarantees that x^n becomes smaller and smaller with each successive term. There's a whole section on convergence further down with more details.

Recall from the Intuition section that we're looking for a polynomial that passes through the same point as at 0, and that has derivatives equal to those of at that point.

Let's calculate a few of the first derivatives of p(x); the function itself can be considered as the 0-th derivative:

\[\begin{align*} p(x)&=a_0+a_1 x+a_2 x^2 + a_3 x^3+ a_4 x^4+\cdots\\ p'(x)&= a_1 +2 a_2 x + 3 a_3 x^2+4 a_4 x^3+\cdots\\ p''(x)&= 2 a_2 + 3 \cdot 2 a_3 x+ 4 \cdot 3 x^2+\cdots\\ p'''(x)&= 3\cdot 2 a_3 + 4\cdot 3 \cdot 2 x+\cdots \\ \cdots \end{align*}\]

Now, equate these to corresponding derivatives of at x=0. All the non-constant terms drop out, and we're left with:

\[\begin{align*} f(0)&=p(0)=a_0\\ f'(0)&=p'(0)= a_1 \\ f''(0)&=p''(0)= 2 a_2 \\ f'''(0)&=p'''(0)= 3\cdot 2 a_3 \\ \cdots\\ f^{(n)}(0)&=p^{(0)}(0)=n!a_n\\ \cdots\\ \end{align*}\]

So we can set the coefficients of the power series, generalizing the denominators using factorials:

\[\begin{align*} a_0 &= f(0)\\ a_1 &= \frac{f'(0)}{1!}\\ a_2 &= \frac{f''(0)}{2!}\\ a_3 &= \frac{f'''(0)}{3!}\\ \cdots \\ a_n &= \frac{f^{(n)}(0)}{n!} \end{align*}\]

Which gives us the definition of the Maclaurin series:

\[p(x) = f(0)+\frac{f'(0)}{1!}x+\frac{f''(0)}{2!}x^2+\frac{f'''(0)}{3!}x^3+\cdots=\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}x^n\]

Taylor series

The Maclaurin series is suitable for finding approximations for functions around 0; what if we want to approximate a function around a different value? First, let's see why we would even want that. A couple of major reasons come to mind:

We have a non-cyclic function and we're really interested in approximating it around some specific value of x; if we use Maclaurin series, we get a good approximation around 0, but its quality is diminishing the farther away we get. We may be able to use much fewer terms for a good approximation if we start it around our target value.
The function we're approximating is not well behaved around 0.

It's the second reason which is most common, at least in calculus. By "not well behaved" I mean a function that's not finite at 0 (or close to it), or that isn't differentiable at that point, or whose derivatives aren't finite.

There's a very simple and common example of such a function - the natural logarithm ln(x). This function is undefined at 0 (it approaches -\infty). Moreover, its derivatives are:

\[\begin{align*} ln'(x)&= \frac{1}{x}\\ ln''(x)&= -\frac{1}{x^2}\\ ln'''(x)&= \frac{2}{x^3}\\ ln^{(4)}(x)&= -\frac{6}{x^4}\\ ln^{(5)}(x)&= \frac{24}{x^5}\\ \cdots \end{align*}\]

None of these is defined at 0 either! The Maclaurin series won't work here, and we'll have to turn to its generalization - the Taylor series:

\[p(x) = f(a)+\frac{f'(a)}{1!}(x-a)+\frac{f''(a)}{2!}(x-a)^2+\frac{f'''(a)}{3!}(x-a)^3+\cdots=\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n\]

This is a power series that provides an approximation for around any point a where is finite and differentiable. It's easy to use exactly the same technique to develop this series as we did for Maclaurin.

Let's use this to approximate ln(x) around x=1, where this function is well behaved. ln(1)=0 and substituting x=1 into its derivatives (as listed above) at this point, we get:

\[f'(1)=1\quad f''(1)=-1\quad f'''(1)=2\quad f^{(4)}(1)=-6\quad f^{(5)}(1)=24\]

There's a pattern here: generally, the n-th derivative at 1 is (n-1)! with an alternating sign. Substituting into the Taylor series equation from above we get:

\[p_{ln}(x)=(x-1)-\frac{1}{2}(x-1)^2+\frac{1}{3}(x-1)^3-\frac{1}{4}(x-1)^4+\cdots\]

Here's a plot of approximations with the first k terms (the function itself is dark blue, as before):

While the approximation looks good in the vicinity of 1, it seems like all approximations diverge dramatically at some point. The next section helps understand what's going on.

Convergence of power series and the ratio test

When approximating a function with power series (e.g. with Maclaurin or Taylor series), a natural question to ask is: does the series actually converge to the function it's approximating, and what are the conditions on this convergence?

Now it's time to treat these questions a bit more rigorously. We'll be using the ratio test to check for convergence. Generally, for a series:

\[\sum_{n=1}^\infty a_n\]

We'll administer this test:

\[L = \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\]

And check the conditions for which L < 1, meaning that our series converges absolutely.

Let's start with our Maclaurin series for cos(x):

\[p_{cos}(x)=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\frac{x^8}{8!}-\cdots=1+\sum_{n=1}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}\]

Ignoring the constant term, we'll write out the ratio limit. Note that because of the absolute value, we can ignore the power-of-minus-one term too:

\[\begin{align*} L &= \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\\ &= \lim_{n\to\infty}\left| \frac{x^{2n+2} (2n)!}{(2n+2)! x^{2n}}\right|\\ &= \lim_{n\to\infty}\left| \frac{x^2}{(2n+1)(2n+2)}\right| \end{align*}\]

Since the limit contents are independent of x, it's obvious that that L=0 for any x. This means that the series converges to cos(x) at any x, given an infinite number of terms. This matches our intuition for this function, which is well-behaved (smooth everywhere).

Now on to ln(x) with its Taylor series around x=1. The series is:

\[p_{ln}(x)=(x-1)-\frac{1}{2}(x-1)^2+\frac{1}{3}(x-1)^3-\frac{1}{4}(x-1)^4+\cdots=\sum_{n=1}^{\infty} \frac{(-1)^{n+1} (x-1)^n}{n}\]

Once again, writing out the ratio limit:

\[\begin{align*} L &= \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|\\ &= \lim_{n\to\infty}\left| \frac{(x-1)^{n+1} n}{(n+1) (x-1)^n}\right|\\ &= \lim_{n\to\infty}\left| \frac{n(x-1)}{(n+1)}\right|\\ &= \left|x-1\right| \lim_{n\to\infty}\left| \frac{n}{(n+1)}\right|=\left| x-1\right| \end{align*}\]

To converge, we require:

\[L=\left| x-1\right|<1\]

The solution of this inequality is 0 < x < 2. Therefore, the series converges to ln(x) only in this range of x. This is also what we observe in the latest plot. Another way to say it: the radius of convergence of the series around x=1 is 1.

[1]	If this explanation and the plot of cos(x) following it don't convince you, consider watching this video by 3Blue1Brown - it includes more visualizations as well as a compelling alternative intuition using integrals and area.

[2]

Note that since cos(x) is cyclic, all we really need is good approximations in the range [-\pi, \pi). Our plot only shows the positive x axis; it looks like a mirror image on the negative side, so we see that a pretty good approximation is achieved by the time we reach k=5.

This is also a good place to note that while Maclaurin series are important in Calculus, it's not the best approximation for numerical analysis purposes; there are better approximations that converge faster.

Asking an LLM to build a simple web tool

2024-07-09T20:09:00-07:00

I've been really enjoying following Simon Willison's blog posts recently. Simon shows other programmers the way LLMs will be used for code assistance in the future, and posts full interactions with LLMs to build small tools or parts of larger applications.

A recent post caught my attention; here Simon got an LLM (Claude 3.5 Sonnet in this case) to build a complete tool that lets one configure/tweak box shadow settings and copy the resulting CSS code for use in a real application. One thing that seemed interesting is that the LLM in this case used some heavyweight dependencies (React + JSX) to implement this; Almost 3 MiB of dependency for something that clearly needs only a few dozen lines of HTML + JS to implement; yikes.

So I've decided to try my own experiment and get an LLM to do this without any dependencies. It turned out to be very easy, because the LLM I used (in this case ChatGPT 4o, but it could really have been any of the top-tier LLMs, I think) opted for the no-dependency approach from the start. I was preparing to ask it to adjust the code to remove dependencies, but this turned out to be unnecessary.

The resulting tool is very similar to Simon's in functionality; it's deployed at https://eliben.org/box-shadow-tool/; here's a screenshot:

Here are my prompts:

CSS for a slight box shadow, build me a tool that helps me twiddle settings and preview them and copy and paste out the CSS

ChatGPT produced a working tool but it didn't really look good on the page.

Yes, make the tool itself look a bit better with some CSS so it's all centered on the screen and there's enough space for the preview box

It still wasn't quite what I wanted.

the container has to be wider so all the text and sliders fix nicely, and there's still not enough space for the shadows of the preview box to show without overlapping with other elements

Now it was looking better; I wanted a button to copy-paste, like in Simon's demo:

this looks better; now add a nice-looking button at the bottom that copies the resulting css code to the clipboard

The code ChatGPT produced for the clipboard copy operation was flagged by vscode as deprecated, so I asked:

it seems like "document.execCommand('copy')" is deprecated; is there a more accepted way to do this?

The final version can be seen in the online demo (view-source). The complete ChatGPT transcript is available here.

Insights

Overall, this was a positive experience. While a tool like this is very simple to implement manually, doing it with an LLM was even quicker. The results are still not perfect in terms of alignment and space, but they're good enough. At this point one would probably just take over and do the final tweaks manually.

I was pleasantly surprised by how stable the LLM managed to keep its output throughout the interaction; it only modified the parts I asked it to, and the rest of the code remained identical. Stability has been an issue with LLMs (particularly for images), and I'm happy to see it holds well for code (there could be some special tuning or prompt engineering for ChatGPT to make this work well).

Eli Bendersky's website

Calculating the norm of a complex number

The norm of a complex number

Norm: a formal definition

Why z squared is not a norm-square

zz* is a norm-square

Conclusion

Appendix: zz* and the formal definition of norm

Implementing Raft: Part 4 - Key/Value Database

Key / value database as a state machine

System diagram

KV service architecture

Commands

Life of a PUT request to the service

KV service code walk-through

Consistency guarantees

Plumbing read-only operations through the Raft log

KV client

Future work

Linearizability in distributed systems

Registers

Basic example

A more subtle example

A formal definition

Linearizability vs. Serializability

Additional resources

Summary of reading: July - September 2024

Notes on running Go in the browser with WebAssembly

Basics: calling Go from JS

DOM manipulation from Go

Using TinyGo as an alternative compiler

Keeping the main thread free: WebAssembly in a web worker

Talking on a Web Socket with Go

Testing locally with Node.js

Notes on the Euler formula

Complex number representations

Some intuition for Euler's formula

Proof using power series

Proof using derivatives

Visualizing Euler's formula

Euler's identity

De Moivre formula

SentencePiece BPE Tokenizer in Go

Config and set up

Online demo

Building static binaries with Go on Linux

Basics - hello world

DNS and user groups

Linking C into our go binary

Linking a libc statically

Using Zig as our C compiler

Summary

Code

Notes on Taylor and Maclaurin series

Intuition

How the Maclaurin series works

Taylor series

Convergence of power series and the ratio test

Asking an LLM to build a simple web tool

Insights

Linking a `libc` statically