The World and the Machine

This is just a way of thinking about formal specification that I find really useful. The terms originally come from Michael Jackson’s Software Requirements and Specifications.

In specification, the machine is the part of the system you have direct control over and the world is all the parts that you don’t. Take a simple transfer spec:

---- MODULE transfer ----
EXTENDS TLC, Integers
CONSTANTS People, Money, NumTransfers

(* --algorithm transfer
variables
  acct \in [People -> Money];

define
  \* Invariant
  NoOverdrafts ==
    \A p \in People:
      acct[p] >= 0
end define;

process cheque \in 1..NumTransfers
variable
  amnt \in 1..5;
  from \in People;
  to \in People
begin
  Check:
    if acct[from] >= amnt then
      Withdraw:
        acct[from] := acct[from] - amnt;
      Deposit:
        acct[to] := acct[to] + amnt;
    end if;
end process;
end algorithm; *)
====

The code that handles the transfer (represented by the cheque process) is the machine. It currently has a race condition that can break the invariant NoOverdrafts, where someone with $5 can submit two checks for $4 each and get them both past the guard clause.

One way you could solve this is by adding a lock so that the banking service only resolves one cheque at a time. One way you can’t solve it is by forcing people to deposit one cheque at a time. You don’t have any control over what people do with their chequebook! The people and their behavior is part of the world.

Whether something belongs to the world or the machine depends on your scope in the system. If you maintain one service and the other teams aren’t returning your calls, then their components are part of the world. If they’re sending you bad data, you need to faithfully model receiving that bad data in your spec as part of designing your machine.

Some notes on this model

While you need to model the whole system, you’re only designing the machine. So the implementation details of the machine matter, but you don’t need to implement the world. It can be abstracted away, except for how it affects the machine. In the above example, we don’t need to model a person deciding to write a cheque or the person depositing it, just the transfer entering the system.

Observability and observable properties

Like in OOP, some system state is restricted to the world or machine.

The world can both read and write the from and to variables, but the machine can only read them. In most specifications these restrictions are implicit; you could write the spec so that the machine changes from, but your boss wouldn’t let you build it.
The “program counter”, or line each process is currently executing, isn’t readable or writable by the world. It’s an implementation detail of the machine.
acct can be written by the machine and read by the world. I call these observable.

We can divide properties into internal properties that concern just the machine and external (observable) properties that can be seen by the outside world. NoOverdrafts is observable: if it’s violated, someone will be able to see that they have a negative bank balance. By contrast, “at most one process can be in the withdraw step” (OnlyOneWithdraw) is internal. The world doesn’t have access to your transaction logs, nobody can tell whether OnlyOneWithdraw is satisfied or not.

Internal properties are useful but they’re less important than observable properties. An OnlyOneWithdraw violation might be the root cause of a NoOverdrafts violation, but NoOverdrafts is what actually matters to people. If a property isn’t observable, it doesn’t have any connection to the broader world, so nobody is actually affected by it breaking.

Misc

If both the world and machine can write to a variable, generally the world should be able to do more with the variable than the machine can. IE the machine can only modify acct by processing transfers, while the world can also do deposits and withdrawals. It’s exceedingly hard to enforce MISU on state the world can modify.
If the world can break an invariant, it’s not an invariant. Instead you want “resilience”, the ability to restore some system property after it’s been broken. See here for more on modeling resilience.
It’s not uncommon for a spec to break not because the machine has a bug, but because the surrounding world has changed.

Thanks to Andrew Helwer and Lars Hupel for feedback. If you liked this, come join my newsletter! I write new essays there every week.