
How to find bugs in systems that don't exist

This version of the talk was given at QCon London, 2026.

Building resilient systems takes thinking outside the box, and the fastest way to do that is to think inside a different box. One different box is “formal methods”, the discipline of mathematically verifying software and systems. Formal methods teaches us to see a system through three different perspectives: the abstract specification behind the system, the environment it assumes, and the properties it should and shouldn’t have. Rather than gradually learning these perspectives over months of using formal methods, we will instead learn them through a forty-ish minute conference talk.

Slides are here.

Questions

These are questions I received during and immediately after the talk. Some of them won’t make any sense until the video itself is up.

Is this process time-consuming?

Actually doing formal methods can be pretty time-consuming, but informally applying the heuristics is fast.

Does formal methods work for legacy development or is it only for greenfield projects?

See using FM at work. The gist is that while it’s easier to use FM on a new project, on existing projects you can calibrate the model by showing it reproduces previous design bugs. You also have a much better sense of whether a model checking failure is a true bug or a mistake you made in modeling. These are huge benefits!

Do I have to code in a specific language to benefit from these approaches?

Nope!

Say I have system S, assumption E, and property P, and know that S guarantees P under assumption E. Now I weaken my assumptions, say to E’, and S does not guarantee P under E’. Can we tell in advance whether it’s even possible to guarantee P with a more sophisticated system S’, or whether we can only guarantee a weaker property no matter what else we build?

No, we can’t prove in advance whether or not P is still satisfiable by some system under E’. If we can find a system S’ that satisfies P under E’, we can then apply refinement to show S’ is a more detailed version of S, but that isn’t quite the same thing.
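To make the refinement idea concrete, here is a toy sketch (my own illustration, not from the talk; all names are hypothetical). It treats refinement as transition inclusion: S′ refines S if every step S′ can take from a reachable state is a step S allows.

```python
def s_next(n):
    # Abstract spec S: a counter may increase by 1 or 2 each step.
    return {n + 1, n + 2}

def s_prime_next(n):
    # Candidate refinement S': always increases by exactly 2.
    return {n + 2}

def refines(impl_next, spec_next, init=0, depth=5):
    """Bounded check: every reachable transition of the implementation
    must also be a transition the spec allows."""
    states = {init}
    for _ in range(depth):
        nxt = set()
        for n in states:
            for m in impl_next(n):
                if m not in spec_next(n):
                    return False  # impl takes a step the spec forbids
                nxt.add(m)
        states = nxt
    return True

print(refines(s_prime_next, s_next))       # S' only takes spec-allowed steps
print(refines(lambda n: {n + 3}, s_next))  # stepping by 3 is not allowed by S
```

Real refinement proofs also relate the state spaces of the two systems (via a refinement mapping), not just their transitions, but the inclusion intuition is the same.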

(Honestly I think trying to capture the question in a clear way is more insightful than the answer!)

Sometimes we have multiple systems that each satisfy their local properties, but interact in a way that breaks a global property. How do we handle this with specification?

This is known as feature interaction and is also one of the main use cases of refinement. We first model out the global system, keeping the local components highly abstract, and then refine each of them to explore the local properties in more detail. It isn’t perfect, but feature interaction is generally an Essentially Hard problem and every little bit helps.
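A minimal illustration of feature interaction (a made-up example, not from the talk): two pricing features that each satisfy their local property but jointly break a global one.

```python
# Two pricing features, each locally correct: no single feature
# discounts more than 20%. (Hypothetical example; prices in cents.)

def loyalty_discount(cents):
    return cents * 80 // 100   # 20% off: within its local bound

def seasonal_discount(cents):
    return cents * 85 // 100   # 15% off: within its local bound

# Global property: the final price never drops below 70% of list price.
list_price = 10_000
final = seasonal_discount(loyalty_discount(list_price))

assert final == 6_800                        # stacking gives 32% off...
assert final < list_price * 70 // 100        # ...breaking the global property
```

Each feature passes its own check in isolation; the violation only appears in the composed system, which is why the global model has to come first.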

This is good for nondeterministic systems. What about deterministic code?

You can use similar approaches for modeling abstractions of deterministic code, though in that context you can use a more specialized tool.

Can we use formal methods to reduce the error boundary of AI agents?

I have no idea.

How does formal methods compare to property-based testing?

I’m still trying to figure out this one, but right now I think it boils down to “breadth” vs “fidelity”. Specifications verify that a property holds for all traces of an abstraction of the system. Property testing verifies that a property holds for a sample of traces of the real system. Formal code verification can do both, but it takes a lot more work than the other two.
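To make the breadth/fidelity contrast concrete, here is a toy sketch (my own illustration, not from the talk). One checker exhaustively explores all bounded traces of an abstract model, the way a model checker would; the other samples random traces, the way a property test does. Both are looking for the same bug: a token that can be used twice.

```python
import random

# Abstract model of a one-shot token: state is (issued, times_used).
ACTIONS = ["issue", "use"]

def step(state, action):
    issued, used = state
    if action == "issue" and not issued:
        return (True, used)
    if action == "use" and issued:
        return (issued, used + 1)   # bug: no guard against reuse
    return None                     # action not enabled in this state

def holds(state):
    return state[1] <= 1            # property: token used at most once

def check_all(depth=4):
    """Breadth: exhaustively check every trace up to a bounded depth."""
    frontier = [(False, 0)]
    for _ in range(depth):
        nxt = []
        for s in frontier:
            for a in ACTIONS:
                t = step(s, a)
                if t is not None:
                    if not holds(t):
                        return t    # counterexample state found
                    nxt.append(t)
        frontier = nxt
    return None

def check_sampled(runs=100, depth=4, seed=0):
    """Fidelity stand-in: sample random traces instead of all of them."""
    rng = random.Random(seed)
    for _ in range(runs):
        s = (False, 0)
        for _ in range(depth):
            t = step(s, rng.choice(ACTIONS))
            if t is None:
                continue
            if not holds(t):
                return t
            s = t
    return None
```

The exhaustive check is guaranteed to find the issue→use→use counterexample within the depth bound; the sampled check may or may not hit it on a given run, which is the trade-off in miniature.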

Why did you say there’s a relation between system “sophistication” and property strength, and not system “complexity” and property strength?

Another one I’m trying to figure out. It doesn’t seem right to say that, to get a stronger property, you need a more complex system. Sometimes a different but equally simple system would work. But it might take a lot of effort to find that new simple system, so the new system would be more “sophisticated” than the old one, even if it’s no more complex.

Does specifying systems and finding bugs give you an intuition for building distributed systems?

I’d say so, but I’m biased. Formal specs compress the feedback loop to finding bugs, meaning you get more practice with identifying and fixing issues.

How would you mathematically represent the property “this token can only be used once”?

It depends on the formalism, the system, and what you mean by “use a token”. Generally the easiest way would be to add a token_used variable and assert the property always (token_used => always !use(token)). You’d have to be sure to set token_used in all the places you could use the token, though. In linear temporal logic, you can write always (use(token) => next (always !use(token))).
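As a sanity check of that temporal formula, here is a sketch of evaluating it over a finite trace (my own illustration; the event names are hypothetical):

```python
def token_used_once(trace):
    """Finite-trace check of: always (use(token) => next (always !use(token))).
    `trace` is a list of event names; "use" marks a use of the token."""
    seen_use = False
    for event in trace:
        if event == "use":
            if seen_use:
                return False   # a second use violates the property
            seen_use = True
    return True

print(token_used_once(["issue", "use", "idle"]))   # one use: property holds
print(token_used_once(["issue", "use", "use"]))    # reuse: property fails
```

Over finite traces the formula collapses to “at most one use event”, which is exactly what the loop checks.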