Go internals: capturing loop variables in closures



The Go wiki has a page titled CommonMistakes. Amusingly, it only lists a single entry at this time - using goroutines on loop iterator variables, providing this example:

for _, val := range values {
  go func() {
    fmt.Println(val)
  }()
}

This will print the last value in values, len(values) times. The fix is very simple:

// assume the type of each value is string
for _, val := range values {
  go func(val string) {
    fmt.Println(val)
  }(val)
}

Being aware of the fix is sufficient to be able to write correct Go programs. However, if you find the details of Go's implementation fascinating, this post should provide a deeper understanding of the problem and its solution.

Basics - passing by value and by reference

Go makes the distinction between passing objects by value and by reference [1] explicit. Let's start with example 1 [2]:

func foobyval(n int) {
  fmt.Println(n)
}

func main() {
  for i := 0; i < 5; i++ {
    go foobyval(i)
  }

  time.Sleep(100 * time.Millisecond)
}

It should come as no surprise that this will print out the values 0 to 4, likely in some scrambled order. So we'll move on to example 2:

func foobyref(n *int) {
  fmt.Println(*n)
}

func main() {
  for i := 0; i < 5; i++ {
    go foobyref(&i)
  }

  time.Sleep(100 * time.Millisecond)
}

This code has a data race and will fail the Go race detector (when executed with -race). On my machine it prints:

5
5
5
5
5

But it could also print other sequences. Understanding why will take us 80% of the way to grokking the main topic of this post, so let's spend some time exploring this.

It turns out that the answer is right there in the Go spec, which states:

Variables declared by the init statement are re-used in each iteration.

This means that when the program is running, there's just a single object representing i, not a new one for each iteration. This object gets assigned a new value on each iteration.

Let's study the difference in the generated machine code [3] for the for loop between examples 1 and 2. Starting with example 1:

0x0021 00033 (go-func-byval.go:13)      MOVQ    AX, "".i+24(SP)
0x0026 00038 (go-func-byval.go:14)      MOVL    $8, (SP)
0x002d 00045 (go-func-byval.go:14)      LEAQ    "".foobyval·f(SB), CX
0x0034 00052 (go-func-byval.go:14)      MOVQ    CX, 8(SP)
0x0039 00057 (go-func-byval.go:14)      MOVQ    AX, 16(SP)
0x003e 00062 (go-func-byval.go:14)      CALL    runtime.newproc(SB)
0x0043 00067 (go-func-byval.go:13)      MOVQ    "".i+24(SP), AX
0x0048 00072 (go-func-byval.go:13)      INCQ    AX
0x004b 00075 (go-func-byval.go:13)      CMPQ    AX, $5
0x004f 00079 (go-func-byval.go:13)      JLT     33

The go statement gets translated to a call to runtime.newproc. While this machinery is interesting, we'll leave it for a separate article. What's interesting for our needs here is how the loop variable i is lowered. It's stored in the AX register, which is then passed by value onto the stack as an argument for the call too foobyval [4]. The by value here manifests by simply copying the value of AX onto the stack. Changing AX will henceforth not affect the value passed into foobyval.

On the other hand, the same loop in example 2 looks as follows:

0x0039 00057 (go-func-byref.go:14)      MOVL    $8, (SP)
0x0040 00064 (go-func-byref.go:14)      LEAQ    "".foobyref·f(SB), CX
0x0047 00071 (go-func-byref.go:14)      MOVQ    CX, 8(SP)
0x004c 00076 (go-func-byref.go:14)      MOVQ    AX, 16(SP)
0x0051 00081 (go-func-byref.go:14)      CALL    runtime.newproc(SB)
0x0056 00086 (go-func-byref.go:13)      MOVQ    "".&i+24(SP), AX
0x005b 00091 (go-func-byref.go:13)      INCQ    (AX)
0x005e 00094 (go-func-byref.go:13)      CMPQ    (AX), $5
0x0062 00098 (go-func-byref.go:13)      JLT     57

The code is very similar, with just one critical difference. Now AX holds the address of i rather than its value. Note how the increment and comparison for the loop condition are made on (AX) instead of AX. When we pass AX onto the stack, we're passing the address of i. Modifying (AX) will actually affect the value of i observed inside the goroutine.

None of this should be surprising - after all, we're passing a pointer to an integer into foobyref.

What happens at run-time is that, on my machine, the loop finishes running before any of the goroutines it launches actually start. Once they do, they still hold a reference to the same i - not a copy of it. What's the value of i at this point? It's 5, exactly where the loop stops iterating. This is why all the goroutines print out 5. But the goroutines could also launch in the middle of the loop and a difference sequence would occur - a classical data race.

Methods with value vs. pointer receivers

A similar artifact can be observed when creating goroutines that invoke methods. This is even pointed out explicitly on the CommonMistakes page. Consider example 3:

type MyInt int

func (mi MyInt) Show() {
  fmt.Println(mi)
}

func main() {
  ms := []MyInt{50, 60, 70, 80, 90}
  for _, m := range ms {
    go m.Show()
  }

  time.Sleep(100 * time.Millisecond)
}

This prints the elements of ms, possibly in a scrambled order, as you'd expect. A very similar example 4 uses a pointer receiver for the Show method:

type MyInt int

func (mi *MyInt) Show() {
  fmt.Println(*mi)
}

func main() {
  ms := []MyInt{50, 60, 70, 80, 90}
  for _, m := range ms {
    go m.Show()
  }

  time.Sleep(100 * time.Millisecond)
}

Can you guess what the output is going to be? It's 90 repeated five times. The reason is exactly the same as in the simpler example 2. Here it's a bit more insidious because of the syntactic sugar Go applies to calling methods with pointer receivers. Whereas between examples 1 and 2 we changed the argument passed into the function from i to &i, here the method invocation is exactly the same! It's go m.Show() in both cases, yet the behavior is very different.

This is a rather unfortunate confluence of Go features, IMHO. Nothing in the call site hints that m is passed by reference, and one has to go look at the documentation of the Show method to see that (the method can be in a completely different package from the calling code, of course). In most cases this feature is useful, enabling us to write cleaner code. But here the passing by reference has unexpected effects.

Closures

Finally we come back to closures, with example 5:

func foobyval(n int) {
  fmt.Println(n)
}

func main() {
  for i := 0; i < 5; i++ {
    go func() {
      foobyval(i)
    }()
  }

  time.Sleep(100 * time.Millisecond)
}

This is likely to print not what you'd expect. On my machine it prints:

5
5
5
5
5

Even though i is passed by value to foobyval in the the closure, which seems like it should be fine based on example 1. Let's find out why. We'll start with the disassembly of the for loop:

0x0039 00057 (go-closure.go:14) MOVL    $8, (SP)
0x0040 00064 (go-closure.go:14) LEAQ    "".main.func1·f(SB), CX
0x0047 00071 (go-closure.go:14) MOVQ    CX, 8(SP)
0x004c 00076 (go-closure.go:14) MOVQ    AX, 16(SP)
0x0051 00081 (go-closure.go:14) CALL    runtime.newproc(SB)
0x0056 00086 (go-closure.go:13) MOVQ    "".&i+24(SP), AX
0x005b 00091 (go-closure.go:13) INCQ    (AX)
0x005e 00094 (go-closure.go:13) CMPQ    (AX), $5
0x0062 00098 (go-closure.go:13) JLT     57

It's fairly similar to example 2: note that i is represented as an address in the AX register, meaning that we pass it around by reference, even though the closure calls foobyval. This loop body invokes a function using runtime.newproc, but where does this function come from?

func1 is created by the compiler to represent the closure. The compiler outlines the closure's code into a standalone function and inserts a call to it in main. The main challenge in outlining closures like this is how to treat the variables the closure implicitly uses but weren't declared in its argument list.

Here's the body of func1:

0x0000 00000 (go-closure.go:14) TEXT    "".main.func1(SB), ABIInternal, $16-8
0x0000 00000 (go-closure.go:14) MOVQ    (TLS), CX
0x0009 00009 (go-closure.go:14) CMPQ    SP, 16(CX)
0x000d 00013 (go-closure.go:14) JLS     56
0x000f 00015 (go-closure.go:14) SUBQ    $16, SP
0x0013 00019 (go-closure.go:14) MOVQ    BP, 8(SP)
0x0018 00024 (go-closure.go:14) LEAQ    8(SP), BP
0x001d 00029 (go-closure.go:15) MOVQ    "".&i+24(SP), AX
0x0022 00034 (go-closure.go:15) MOVQ    (AX), AX
0x0025 00037 (go-closure.go:15) MOVQ    AX, (SP)
0x0029 00041 (go-closure.go:15) CALL    "".foobyval(SB)
0x002e 00046 (go-closure.go:16) MOVQ    8(SP), BP
0x0033 00051 (go-closure.go:16) ADDQ    $16, SP
0x0037 00055 (go-closure.go:16) RET

The interesting thing to note is that the function has an argument in 24(SP), which is an int pointer - note the MOVQ (AX), AX which extracts its value before passing it to foobyval. In essence, func1 looks something like this:

func func1(i *int) {
  foobyval(*i)
}

And the loop in main is transformed into:

for i := 0; i < 5; i++ {
  go func1(&i)
}

This is equivalent to example 2, which explains the output of the program. The technical jargon for what's happening here is that i is a free variable inside the closure, and such variables are captured by reference in Go.

Are they always, though? Surprisingly, the answer is no. In some cases, free variables are captured by value. Here's a variation of our example:

for i := 0; i < 5; i++ {
        ii := i
        go func() {
                foobyval(ii)
        }()
}

This will print 0, 1, 2, 3, 4 in a scrambled order. Why is the behavior here different from example 5? After all, ii is a free variable in the closure, same as i in example 5.

It turns out this behavior is an artifact of the heuristics used by the Go compiler to handle closures.

Peeking under the hood

If you're not familiar with the architecture of the Go compiler, I suggest checking out my earlier article on Go compiler internals - part 1, part 2.

The concrete syntax tree created by the parser creates the following note for the go statement calling the closure:

0: *syntax.CallStmt {
.  Tok: go
.  Call: *syntax.CallExpr {
.  .  Fun: *syntax.FuncLit {
.  .  .  Type: *syntax.FuncType {
.  .  .  .  ParamList: nil
.  .  .  .  ResultList: nil
.  .  .  }
.  .  .  Body: *syntax.BlockStmt {
.  .  .  .  List: []syntax.Stmt (1 entries) {
.  .  .  .  .  0: *syntax.ExprStmt {
.  .  .  .  .  .  X: *syntax.CallExpr {
.  .  .  .  .  .  .  Fun: foobyval @ go-closure.go:15:4
.  .  .  .  .  .  .  ArgList: []syntax.Expr (1 entries) {
.  .  .  .  .  .  .  .  0: i @ go-closure.go:15:13
.  .  .  .  .  .  .  }
.  .  .  .  .  .  .  HasDots: false
.  .  .  .  .  .  }
.  .  .  .  .  }
.  .  .  .  }
.  .  .  .  Rbrace: syntax.Pos {}
.  .  .  }
.  .  }
.  .  ArgList: nil
.  .  HasDots: false
.  }
}

The function called is represented by a FuncLit node - a function literal. When this tree is converted into the AST, outlining the function literal into a standalone function is one of the outcomes. This is done in the noder.funcLit method that lives in gc/closure.go.

The type checker completes the transformation and we get this AST node for the outlined function representing the closure:

main.func1:
.   DCLFUNC l(14) tc(1) FUNC-func()
.   DCLFUNC-body
.   .   CALLFUNC l(15) tc(1)
.   .   .   NAME-main.foobyval a(true) l(8) x(0) class(PFUNC) tc(1) used FUNC-func(int)
.   .   CALLFUNC-list
.   .   .   NAME-main.i l(15) x(0) class(PAUTOHEAP) tc(1) used int

Note that the value passed into foobyval is NAME-main.i, directly referencing the variable from the function enclosing the closure.

At this point comes the stage of the compiler most relevant to our exploration - capturevars. Its goal is to decide how to capture closed variables (i.e. free variables used in closures). Here's a comment for the relevant function in the compiler, which also describes the heuristic:

// capturevars is called in a separate phase after all typechecking is done.
// It decides whether each variable captured by a closure should be captured
// by value or by reference.
// We use value capturing for values <= 128 bytes that are never reassigned
// after capturing (effectively constant).

When it runs on example 5, capturevars marks the loop variable i as captured by reference, and adds a addrtaken flag to it. This is visible in the AST:

FOR l(13) tc(1)
.   LT l(13) tc(1) bool
.   .   NAME-main.i a(true) g(1) l(13) x(0) class(PAUTOHEAP) esc(h) tc(1) addrtaken assigned used int

For the loop variable, the heuristic for value capturing doesn't apply because the value is reassigned after the call (recall the quote from the spec which says that loop vars are re-used between iterations). Therefore, i is captured by reference.

In the variation where we have ii := i before the go statement, ii is not reassigned after the goroutine is launched, so it's captured by value [5].

Therefore, we see a fascinating example of two language features interacting in unexpected ways. Instead of defining a new variable for each iteration, Go reuses the same one. This, in turn, leads to the capturevars heuristic to tag it for reference capturing, which leads to unexpected output. The Go FAQ admits this behavior may have been a design mistake:

This behavior of the language, not defining a new variable for each iteration, may have been a mistake in retrospect. It may be addressed in a later version but, for compatibility, cannot change in Go version 1.

Once you're aware of the problem, it shouldn't cause you much problems in realistic scenarios - just be wary of free variables captured by closures. Always assume they can be captured by reference, unless you explicitly checked that the capture is by value. To avoid mistakes, only read-only values should be left to be implicitly captured by closures invoked in goroutines; this makes sense from a concurrency standpoint as well.


[1]Some readers noted that - strictly speaking - there is no concept of pass by reference in Go, as everything is passed by value, including possibly pointers. In this article when I say "pass by reference", I mean "pass by address", which in some cases is explicit (as in passing &n into a function expecting a *int), and in other cases implicit - as the latter parts of the article demonstrate.
[2]Here and elsewhere in this post, I'm using time.Sleep as a quick and hacky way to wait for all the spawned goroutines to finish. Without this, main will return before the other goroutine start running. The correct way to wait for goroutines in real code would be something like a WaitGroup or a done channel.
[3]The disassembly for all the samples in this post is obtained by calling the Go compiler directly with go tool compile -l -S. The -l flag disables function inlining, which makes the resulting assembly more readable.
[4]foobyval is not called directly because it's invoked in a go statement. Rather, we pass its address as the second argument (at 16(SP)) to runtime.newproc, and the arguments to foobyval (i in this case) follow higher on the stack.
[5]As an exercise, add a dummy ii = 10 assignment as the very last statement in the for body (after the go closure call). What is the output? Why?

Recent posts

03.09.2019: RSA - theory and implementation
28.08.2019: The Chinese Remainder Theorem
03.08.2019: AES encryption of files in Go
22.07.2019: Faster XML stream processing in Go
15.07.2019: Passing callbacks and pointers to Cgo
04.07.2019: Go compiler internals: adding a new statement to Go - Part 2
03.07.2019: Go compiler internals: adding a new statement to Go - Part 1
29.06.2019: Summary of reading: April - June 2019
31.05.2019: GeoIP service as a cloud function
07.05.2019: To ORM or not to ORM

See Archives for a full list.