I ran into an interesting algorithm while hacking on Advent of Code a while ago. This post is a summary of what I've learned.
Consider a binary tree that represents nested pairs, where each element of a pair is either a number or another pair. For example, the nested pair ((6 9) ((3 4) 2)) is represented with this tree [1]:
(ignore the numbered lines on the right for now, we'll get to them shortly)
Trees representing such pairs have the following characteristics:
- Leaves hold numbers; internal nodes hold no numbers, only pointers to their child nodes.
- Each node in the tree has either 0 or 2 children.
- A non-empty tree has at least one internal node.
While looking for alternative solutions to a problem, I ran across Tim Visée's Rust solution, which uses an interesting representation of this tree: an in-order traversal of its leaves, stored as a list of (value depth) pairs, where value is a leaf value and depth is its depth in the tree. The depth starts from 0 at the root - this is what the numbered lines in the diagram above represent.
For our sample tree, the inorder-depth representation is as follows:
(6 2) (9 2) (3 3) (4 3) (2 2)
The surprising realization (at least for me) is that the original tree can be reconstructed from this representation! Note that it's just a list of leaf values - the internal nodes are not specified. It's well known that we can't reconstruct a tree just from its in-order traversal, but a combination of the added depth markers and the restrictions on the tree make it possible.
I'll present a recursive algorithm to reconstruct the tree (based on Tim Visée's code, which does not explicitly rebuild the tree but computes something on it); this algorithm is very clever and isn't easy to grok. Then, I'll present an iterative algorithm which IMHO is easier to understand and explain.
But first, let's start with the data structures. The full (Go) code is available on GitHub.
type DItem struct {
    Value int
    Depth int
}

type DList []DItem
This is our representation of the inorder-depth list - a slice of DItem values, each of which has a numeric value and depth.
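To make this concrete, here's the sample representation from above written out as a DList literal (the variable name is just for illustration, not part of the accompanying code):

// The sample inorder-depth list (6 2) (9 2) (3 3) (4 3) (2 2) as a DList.
sample := DList{
    {Value: 6, Depth: 2},
    {Value: 9, Depth: 2},
    {Value: 3, Depth: 3},
    {Value: 4, Depth: 3},
    {Value: 2, Depth: 2},
}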
The tree itself is just what you'd expect in Go:
type Tree struct {
    Value       int
    Left, Right *Tree
}
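To see the shape of the data, here's a sketch of the sample pair ((6 9) ((3 4) 2)) spelled out as a Tree literal (the variable name is mine); note that only leaves carry values:

sampleTree := &Tree{
    Left: &Tree{ // (6 9)
        Left:  &Tree{Value: 6},
        Right: &Tree{Value: 9},
    },
    Right: &Tree{ // ((3 4) 2)
        Left: &Tree{
            Left:  &Tree{Value: 3},
            Right: &Tree{Value: 4},
        },
        Right: &Tree{Value: 2},
    },
}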
Recursive algorithm
Here is the recursive version of the tree reconstruction algorithm:
func (dl DList) BuildTreeRec() *Tree {
    // cursor is the index of the next unconsumed item in dl.
    cursor := 0

    // builder builds a pair whose children live at the given depth.
    var builder func(depth int) *Tree
    builder = func(depth int) *Tree {
        if cursor >= len(dl) {
            return nil
        }

        // Build the left child: the next item is either a leaf at this depth,
        // or the start of a deeper pair, in which case we recurse.
        var left *Tree
        if dl[cursor].Depth == depth {
            left = &Tree{Value: dl[cursor].Value}
            cursor++
        } else {
            left = builder(depth + 1)
        }

        // Build the right child in the same way.
        var right *Tree
        if dl[cursor].Depth == depth {
            right = &Tree{Value: dl[cursor].Value}
            cursor++
        } else {
            right = builder(depth + 1)
        }

        return &Tree{Left: left, Right: right}
    }

    // The root's children are at depth 1.
    return builder(1)
}
I find this algorithm fairly tricky to understand; the combination of double recursion and mutable state is potent, but takes a while to untangle. Some tips:
- cursor is the index of the next item in the inorder-depth list; it may help to think of it as a queue: taking dl[cursor] and advancing cursor is akin to popping from the head of the queue.
- The depth parameter is the tree depth builder is currently working at. If the next item in the queue has a matching depth, we construct a leaf from it. Otherwise, we recurse with a higher depth to construct an internal node there.
- The basic recursive invariant for builder is: the remaining items in dl represent a pair: build its left side, then build its right side.
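To see the recursive builder in action, here's a minimal usage sketch on the sample list; the expected field accesses in the comments follow the sample tree from the beginning of the post:

dl := DList{{6, 2}, {9, 2}, {3, 3}, {4, 3}, {2, 2}}
root := dl.BuildTreeRec()
// For the sample we expect, among others:
//   root.Left.Left.Value == 6
//   root.Left.Right.Value == 9
//   root.Right.Left.Left.Value == 3
//   root.Right.Right.Value == 2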
If it's still not 100% clear, that's OK. In what follows, I'll describe an alternative formulation of this algorithm - without recursion. IMHO this version is easier to follow, and once one gets it - it's also easier to understand the recursive approach.
Iterative algorithm
To get some intuition for how the algorithm works, let's first work through the example we've been using throughout this post. We'll take the inorder-depth representation:
(6 2) (9 2) (3 3) (4 3) (2 2)
And will see how to construct a tree from it, step by step. In what follows, the numbered list walks through inserting the first 6 child nodes into the tree, and the steps correspond one-to-one to the diagrams below the list. Each step of the algorithm inserts one node into the tree (either an internal node or a leaf node with the value). The red "pointer" in the diagrams corresponds to the node inserted by each step.
Let's assume we begin with the root node already created.
1. To insert (6 2) we have to get to depth 2. The children of the root node would be at depth 1, so we have to create a new internal node first. Since the list is in-order, we create the left child first and move our pointer to it.
2. Now our current node's children are at depth 2, so we can insert (6 2). Since the current node has no left child, we insert 6 as its left child.
3. The next item to insert is (9 2). The node we've just inserted is a leaf, so we backtrack to its parent. Its children are at depth 2, and it has no right child, so we insert 9 as its right child.
4. The next item to insert is (3 3). The current node is a leaf, so it can't have children; we climb up to the parent, which already has both of its child links created. So we climb up again, to the root. The root has a left child but no right child, so we create the right child and move into it.
5. Since the current node's children are at depth 2, we can't insert (3 3) yet. The current node has no left child, so we create it and move into it.
6. The current node's children are at depth 3, so we can insert 3 as its left child.
And so on, until all the values have been inserted.
The main thing to notice here is that the insertion follows a strict in-order. We go left as far as possible, then backtrack through the parent and turn right. How much is "possible" is determined by the depth markers in the representation, so there's actually no ambiguity [2].
Before we move on to the code, one important point about reaching a node's parent. There are at least two common ways to do this: one is keeping parent links in the nodes, and another is keeping a stack of parents while constructing the tree. In the code shown below, I opt for the second option - an explicit stack of parent nodes. The code can easily be rewritten to use parent links instead (try it as an exercise!).
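For reference, a node type with a parent link could look like the sketch below (PTree is a made-up name, not something from the accompanying repository); with it, backtracking becomes t = t.Parent instead of popping from a slice:

// A Tree variant that carries an explicit parent pointer, as an
// alternative to keeping a separate stack of parents during construction.
type PTree struct {
    Value       int
    Left, Right *PTree
    Parent      *PTree
}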
With all that in place, the code shouldn't be hard to understand; here it is, with copious comments:
// BuildTree builds a Tree from a DList using an iterative algorithm.
func (dl DList) BuildTree() *Tree {
    if len(dl) == 0 {
        return nil
    }

    // result is the tree this function is building. The result pointer always
    // points at the root, so we can return it to the caller. t points to the
    // current node being constructed throughout the algorithm.
    result := &Tree{}
    t := result

    // depth is the current depth of t's children.
    depth := 1

    // stack of parent nodes to implement backtracking up the tree once we're
    // done with a subtree.
    var stack []*Tree

    // The outer loop iterates over all the items in a DList, inserting each one
    // into the tree. Loop invariant: all items preceding this item in dl have
    // already been inserted into the tree, and t points to the node where the
    // last insertion was made.
nextItem:
    for _, item := range dl {
        // The inner loop finds the right place for item in the tree and performs
        // the insertion.
        // Loop invariant: t points at the node where we're trying to insert, depth
        // is the depth of its children and stack holds a stack of t's parents.
        for {
            // Check if item can be inserted as a child of t; this can be done only
            // if our depth matches the item's and t doesn't have both its children
            // yet. Otherwise, t is not the right place and we have to keep looking.
            if item.Depth == depth && t.Left == nil {
                t.Left = &Tree{Value: item.Value}
                continue nextItem
            } else if item.Depth == depth && t.Right == nil {
                t.Right = &Tree{Value: item.Value}
                continue nextItem
            }

            // We can't insert at t.
            // * If t does not have a left child yet, create it and repeat the loop
            //   with this left child as t.
            // * If t does not have a right child yet, create it and repeat the loop
            //   with this right child as t.
            // * If t has both children, we have to backtrack up the tree to t's
            //   parent.
            if t.Left == nil {
                stack = append(stack, t)
                t.Left = &Tree{}
                t = t.Left
                depth++
            } else if t.Right == nil {
                stack = append(stack, t)
                t.Right = &Tree{}
                t = t.Right
                depth++
            } else {
                // Pop from the stack to make t point to its parent.
                t, stack = stack[len(stack)-1], stack[:len(stack)-1]
                depth--
            }
        }
    }
    return result
}
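One way to convince yourself that either version works is a round trip: flatten the reconstructed tree back into its inorder-depth representation and compare it with the input. Here's a hedged sketch of such a check; toDList and checkRoundTrip are hypothetical helpers, not part of the code in the repository, and checkRoundTrip uses the standard library's reflect package:

// toDList flattens a tree back into its inorder-depth representation;
// depth is the depth of t itself (0 for the root).
func toDList(t *Tree, depth int, out *DList) {
    if t == nil {
        return
    }
    if t.Left == nil && t.Right == nil {
        *out = append(*out, DItem{Value: t.Value, Depth: depth})
        return
    }
    toDList(t.Left, depth+1, out)
    toDList(t.Right, depth+1, out)
}

// checkRoundTrip rebuilds a tree from dl and verifies that flattening it
// yields dl again.
func checkRoundTrip(dl DList) bool {
    var got DList
    toDList(dl.BuildTree(), 0, &got)
    return reflect.DeepEqual(dl, got)
}

For the sample list, checkRoundTrip(DList{{6, 2}, {9, 2}, {3, 3}, {4, 3}, {2, 2}}) should return true; the same check works with BuildTreeRec.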
Final words
If you take some time to convince yourself that the iterative algorithm works, it becomes easier to understand the recursive one... because it's doing the exact same thing! The loops are replaced by recursion; the explicit parent stack is replaced by an implicit call stack of the recursive function, but otherwise - it's the same algorithm [3].
Finally, some credits are due. Thanks to my wife for helping me come up with the iterative formulation of the algorithm. Thanks to Tim Visée for the inspiration for this post.
[1] Note that this is not a binary search tree; the order of values in the leaves is entirely arbitrary.
[2] One way the algorithm avoids ambiguity is by requiring that nodes in the tree have either no children or two children. Nodes with one child would confuse the algorithm; can you see why?
[3] Here is an exercise: the code of the iterative algorithm is currently structured to ease understanding, but what happens if we merge the two checks on t.Left == nil, testing it in just one place and then either inserting (if the depth matches) or continuing to look, and doing the same for t.Right? If you make these changes, the algorithm still works (feel free to use the tests in the accompanying code), and it starts resembling the recursive version even more.