Finding Goroutine Bugs with TLA+

My job these days is teaching TLA+ and formal methods: specifying designs to find bugs in them. But just knowing the syntax isn’t enough to write specs, and it helps to have examples to draw from. I recently read Chris Siebenmann’s Even in Go, concurrency is still not easy and thought it would make a good case study for writing a spec. In it, he gives an example of Go code which deadlocks:

/*1 */ func FindAll() []P { //P = process data
/*2 */    pss, err := ps.Processes()
/*3 */    [...]
/*4 */    found := make(chan P)
/*5 */    limitCh := make(chan struct{}, concurrencyProcesses)
/*6 */ 
/*7 */    for _, pr := range pss {
/*8 */       limitCh <- struct{}{}
/*9 */       pr := pr
/*10*/       go func() {
/*11*/          defer func() { <-limitCh }()
/*12*/          [... get a P with some error checking ...]
/*13*/          found <- P
/*14*/       }()
/*15*/    }
/*16*/    [...]
/*17*/ 
/*18*/    var results []P
/*19*/    for p := range found {
/*20*/       results = append(results, p)
/*21*/    }
/*22*/    return results
/*23*/ }

In his words:

The bug is that the goroutines only receive from limitCh to release their token after sending their result to the unbuffered found channel, while the main code only starts receiving from found after running through the entire loop, and the main code takes the token in the loop and blocks if no tokens are available.

Let’s model this in TLA+!

Note: this example assumes some basic knowledge of TLA+ and PlusCal. If you know what a process label is, then you’re fine. We’re not looking at race conditions so you shouldn’t need to know the finer points of labels and actions, but it might make things more intuitive. I included syntax explanations where appropriate. It also assumes some basic knowledge of Go, which I am in no way qualified to explain.

Planning Ahead

It’s good to start with some upfront thinking of how we are going to approach this problem. There’s two parts to formally specifying something: describing the system and describing the properties of the system. In this case we can ignore the second part, since we’re only looking for deadlocks. It would be good modeling practice to add sanity check properties, like type invariants, but they are not strictly necessary.

We have a choice of doing everything in TLA+ or using a mix of TLA+ and PlusCal, a sequential algorithm DSL that compiles to TLA+. Since the Go code is highly sequential and because PlusCal is easier for outsiders to understand, I will use PlusCal. This will cause a bit of impedance mismatch down the road for unbuffered channels but overall it’s a net benefit. The core structure of a PlusCal spec is a set of defined processes with both local and global state. At the very least we will have one process for main and one process for each of the goroutines.

There are three “complex bits” to Chris’s code sample: go, defer, and the nature of Go channels. defer runs cleanup code when a goroutine is finished running. For now we’ll represent this by moving the deferred code to its own label, but we could also use a PlusCal procedure to be more accurate. go spawns a new goroutine. Since PlusCal requires us to define all of our processes in advance, we can’t “spawn” a new goroutine. What we can do instead is define each process but prevent them from running. Then we add a flag that says whether or not the goroutine was initialized in the behavior yet. It would look something like this:

variables
  initialized = [w \in Routines |-> FALSE];

\* ...

process goroutine \in Routines
begin
  Work:
    await initialized[self];
\* ...

syntax explanation

Routines is a set of identifiers I’ll define in the spec itself. Let’s assume it’s {1, 2, 3} for now.
initialized is a TLA+ function, or mapping. It’s a mathematical function, not a programmatic one. initialized[1] = FALSE, initialized[2] = FALSE, etc.
By writing goroutine \in Routines, I’m declaring one independent concurrent process for element of Routines. In this case, that’d be 3 processes.
self is the local identifier assigned to a process.

So each goroutine awaits being initialized by the main process before it does anything. This is how we can emulate spawning new processes.

That leaves the channels, which are the most complicated part to specify. There are two kinds of Go channels: buffered and unbuffered. Sends to a buffered channel are blocked if the channel is full. Receives from a buffered channel are blocked if the channel is empty. Both of these are representable with PlusCal macros:

macro send_buffered(chan) begin
  await channels[chan] < buffered[chan];
  channels[chan] := channels[chan] + 1;
end macro;

macro receive_buffered(chan) begin
  await channels[chan] > 0;
  channels[chan] := channels[chan] - 1;
end macro;

For the purposes of pedagogy I’m not modeling what we actually read or write. This is good practice when writing real-world specs too: write the simplest specification that usefully captures behavior and iteratively add detail to that.

That covers buffered channels. Unbuffered channels, by contrast, always block unless there is both a sender and receiver. In pure TLA+ this wouldn’t be too tricky to specify, but PlusCal assumes each step of the behavior is one process doing one thing. Unbuffered channels can’t be represented natively without adding some annoying bookkeeping, as we need to have one process block “first”. We’ll address that when we get to the spec.

So now that we know a rough approach and what the pain points are likely to be, let’s write the spec.

The spec

---- MODULE channels ----
EXTENDS Integers, TLC, Sequences

CONSTANTS NumRoutines, NumTokens
Routines == 1..NumRoutines

show syntax

EXTENDS are imports.
CONSTANTS are values configurable per run, so I can run one model check with 2 goroutines and 1 token and a different check with 6 goroutines and 25 tokens. Both constants are assumed to be positive integers, which I am not enforcing out of pure laziness.
== defines a new operator, or airquotes “programming function”. We’re saying that Routines is the set {1, 2, ... , NumRoutines}.

(* --algorithm channels

variables
  channels = [tokens |-> 0, found |-> {}];
  buffered = [tokens |-> NumTokens];
  initialized = [w \in Routines |-> FALSE];

channels is the current contents of each channel. For buffered channels, we treat their contents as a single number and store the maximum capacity in a separate buffered variable. For unbuffered channels, we instead store the set of senders waiting for a receiver. initialized is for emulating goroutines.

macro go(routine) begin
  initialized[routine] := TRUE;
end macro

An extra macro I added to more closely match the Go syntax.

macro write_buffered(chan) begin
  await channels[chan] < buffered[chan];
  channels[chan] := channels[chan] + 1;
end macro;

macro receive_channel(chan) begin
  if chan \in DOMAIN buffered then
    await channels[chan] > 0;
    channels[chan] := channels[chan] - 1;
  else
    await channels[chan] /= {};
    with w \in channels[chan] do
      channels[chan] := channels[chan] \ {w}
    end with;
  end if;
end macro;

This is a change from our old read_buffered because it handles both buffered and unbuffered channels. Buffered channels work as expected. For unbuffered channels, we wait for the set of blocked writers to be nonempty and nondeterministically declare that we read from one of them.¹

procedure write_unbuffered(chan) begin
  DeclareSend:
    channels[chan] := channels[chan] \union {self};
  Send:
    await self \notin channels[chan];
    return;
end procedure

show syntax

procedure declares a multistep programmatic function. Other processes are allowed to interrupt between running DeclareSend and Send.
return terminates the procedure. Finishing a procedure without a return is considered an error.

To model unbuffered channels we can either put state on senders or put state on receivers. I opted to place it on the sender because Go permits reading from multiple unbuffered channels at once.² In two separate temporal steps we 1) add the process to the set of channel senders and 2) wait to be removed from that set by a receiver.

process goroutine \in Routines
begin
  A:
    await initialized[self];
    call write_unbuffered("found");
  B:
    receive_channel("tokens");
end process;

Our goroutine process is a straightforward translation of the Go code. First we wait for the goroutine to be initialized, corresponding to line 10. Then we write to the found channel (line 13). If I was trying to be more faithful I would write special defer semantics, but for this I’m happy to just stick it on a label at the end of the process.

process main = 0
variables i = 1;
begin
  Main:
    while i <= NumRoutines do
      write_buffered("tokens");
      go(i);
      i := i + 1;
    end while;
  Get:
    while i > 1 do
      i := i - 1;
      receive_channel("found");
    end while;
end process;

end algorithm; *)

TLA+ doesn’t have a native for loop, so we have to emulate it with while. Unlike programming languages, we count 1..N, not 0..(N-1). Our emulation uses one token to initialize each goroutine. Since write_channel has an await in it, it will block if there are more goroutines than tokens. It will then stay blocked until a goroutine releases a token.³ Final spec:

show spec

---- MODULE channels ----
EXTENDS Integers, TLC, Sequences

CONSTANTS NumRoutines, NumTokens

Routines == 1..NumRoutines

(* --algorithm channels

variables
  channels = [limitCh |-> 0, found |-> {}];
  buffered = [limitCh |-> NumTokens];
  initialized = [w \in Routines |-> FALSE];


macro send_buffered(chan) begin
  await channels[chan] < buffered[chan];
  channels[chan] := channels[chan] + 1;
end macro;

macro receive_channel(chan) begin
  if chan \in DOMAIN buffered then
    await channels[chan] > 0;
    channels[chan] := channels[chan] - 1;
  else
    await channels[chan] /= {};
    with w \in channels[chan] do
      channels[chan] := channels[chan] \ {w}
    end with;
  end if;
end macro;

macro go(routine) begin
  initialized[routine] := TRUE;
end macro

procedure send_unbuffered(chan) begin
  DeclareSend:
    channels[chan] := channels[chan] \union {self};
  Send:
    await self \notin channels[chan];
    return;
end procedure

process goroutine \in Routines
begin
  A:
    await initialized[self];
    call send_unbuffered("found");
  B:
    receive_channel("limitCh");
end process;

process main = 0
variables i = 1;
begin
  Main:
    while i <= NumRoutines do
      send_buffered("limitCh");
      go(i);
      i := i + 1;
    end while;
  Get:
    while i > 1 do
      i := i - 1;
      receive_channel("found");
    end while;
end process;

end algorithm; *)
====

Now that we have a full spec, we can use the model checker, TLC, to see if it satisfies any properties. We didn’t specify any, but TLC will check for deadlocks by default. I’m going to model check it with 3 goroutines and 2 tokens. ⁴

Finding Deadlocks

Unsurprisingly, this deadlocks:⁵

show error

State 1: <Initial predicate>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 0, found |-> {}]
/\ i = 1
/\ pc = (0 :> "Main" @@ 1 :> "A" @@ 2 :> "A" @@ 3 :> "A")
/\ initialized = <<FALSE, FALSE, FALSE>>

State 2: <Main line 128, col 9 to line 137, col 48 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 1, found |-> {}]
/\ i = 2
/\ pc = (0 :> "Main" @@ 1 :> "A" @@ 2 :> "A" @@ 3 :> "A")
/\ initialized = <<TRUE, FALSE, FALSE>>

State 3: <Main line 128, col 9 to line 137, col 48 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 2, found |-> {}]
/\ i = 3
/\ pc = (0 :> "Main" @@ 1 :> "A" @@ 2 :> "A" @@ 3 :> "A")
/\ initialized = <<TRUE, TRUE, FALSE>>

State 4: <A line 106, col 12 to line 114, col 64 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 2, found |-> {}]
/\ i = 3
/\ pc = (0 :> "Main" @@ 1 :> "A" @@ 2 :> "DeclareSend" @@ 3 :> "A")
/\ initialized = <<TRUE, TRUE, FALSE>>

State 5: <A line 106, col 12 to line 114, col 64 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 2, found |-> {}]
/\ i = 3
/\ pc = (0 :> "Main" @@ 1 :> "DeclareSend" @@ 2 :> "DeclareSend" @@ 3 :> "A")
/\ initialized = <<TRUE, TRUE, FALSE>>

State 6: <DeclareSend line 92, col 22 to line 95, col 77 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 2, found |-> {1}]
/\ i = 3
/\ pc = (0 :> "Main" @@ 1 :> "Send" @@ 2 :> "DeclareSend" @@ 3 :> "A")
/\ initialized = <<TRUE, TRUE, FALSE>>

State 7: <DeclareSend line 92, col 22 to line 95, col 77 of module base>
/\ buffered = [limitCh |-> 2]
/\ channels = [limitCh |-> 2, found |-> {1, 2}]
/\ i = 3
/\ pc = (0 :> "Main" @@ 1 :> "Send" @@ 2 :> "Send" @@ 3 :> "A")
/\ initialized = <<TRUE, TRUE, FALSE>>

It’s the same issue that Chris had. The goroutines can only return their tokens if there is a receiver on the found channel, the only receiver of that channel is main, main only reads after it initializes all the goroutines, and main will block if there are more goroutines than tokens. The goroutines can’t return tokens until all goroutines are initialized, and main can’t initialize all goroutines until some goroutines have returned their tokens.

Fixing This

Chris suggests three possible ways of fixing this. We can test each of the three by modifying our spec:

If the goroutines took the token by sending to limitCh instead of the main for loop doing it, the bug would not exist;

process goroutine \in Routines
begin
  A:
    await initialized[self];
+   write_buffered("limitCh");

\* ...

    while i <= NumRoutines do
-     write_buffered("limitCh");
      initialized[i] := TRUE;
      i := i + 1;
    end while;

This passes model checking.

If the goroutines received from limitCh to release their token before sending to found, it wouldn’t exist (but because of error handling, it’s simpler and more reliable to do the receive in a defer).

process goroutine \in Routines
begin
  A:
    await initialized[self];
+   receive_channel("limitCh");
-   call write_unbuffered("found");
  B:
-   receive_channel("limitCh");
+   call write_unbuffered("found");
end process;

This passes model checking.

And if the entire for loop was in an additional goroutine…

This one’s a little more complicated. We create a new process for the loop and add its identifier to initialized. I’ll use -1 to represent the for loop.

initialized = [w \in Routines \union {-1} |-> FALSE];

\* After goroutines

process for_loop = -1
variables i = 1;
begin
  Loop:
    while i <= NumRoutines do
      write_buffered("limitCh");
      go(i);
      i := i + 1;
    end while;
end process;

Then we modify main to initialize this instead of doing the loop itself:

process main = 0
variables i = NumRoutines;
begin
  Main:
    go(-1);
  Get:
    while i > 0 do
      i := i - 1;
      receive_channel("found");
    end while;
end process;

This passes model checking.

Discussion

Ultimately we wrote about 75 lines of specification to test 20 lines of Go code. Over half the spec is channel logic which we can now reuse in other specs. Discounting those puts us a little closer, though I’ll admit that a real TLA+ spec would be a lot longer because you’d be writing a lot more sanity checking properties. Noneless, writing the TLA+ version wouldn’t be significantly more effort than writing the original version and could save you net time if it caught the deadlock before production.

Obviously I’m inclined towards using TLA+, given it’s my professional language of choice. However, I suspect that TLA+ isn’t exactly the right tool for modeling Go channels. That’s because there’s another formal method, called Spin, which is much closer to Go’s native semantics. It even has channels as a native datatype. I’ve not used Spin so cannot comment on how effective it truly is but I imagine it would work very well in this domain. It’s also been used before to model the runtime scheduler, though that spec was removed when it eventually fell out of sync. You can see a Spin specification of the same problem here.

If you’re interested in learning more about formal methods, I wrote a book and am running a 3-day TLA+ workshop in October. You can also follow my newsletter where I discuss formal methods techniques and patterns.

Thanks to Richard Whaling and Damian Gryski for feedback.

I shared the first draft of this essay on my newsletter. If you like my writing, why not subscribe?

with blocks if the set is empty, making the await statement above it redundant. I added it purely for clarity. ^[return]
You can also use a select to send to multiple channels, but I think that’s less common? This is where being able to write a raw TLA+ action could be really helpful. ^[return]
Get is an inaccurate representation of how channel receives in a range work: it should loop until the channel is closed. I left that out here because the rest of the spec doesn’t depend on closing channels and I didn’t want to add extra complexity to this example. ^[return]
Note for people following along: while I strongly recommend beginners use the official IDE, I personally prefer to run things from the command line while writing essays. I wrote a simple cli wrapper, which isn’t nearly as full-featured as the toolbox but gets what I need done. With the wrapper, you can reproduce the model run with tlacli check channels.tla --constant NumRoutines 3 --constant NumTokens 2 --model-value defaultInitValue. ^[return]
I removed the stack and chan variables from the trace to make it a little clearer. PlusCal uses stack to track bookkeeping information in procedures, while chan tracks what inputs were passed in as arguments. ^[return]