A better explanation of the Liskov Substitution Principle
Short version: If X inherits from Y, then X should pass all of Y’s black box tests.
I first encountered this idea at SPLASH 2021.
The longer explanation
A bit of background
In A Behavioral Notion of Subtyping Liskov originally defined subtyping in inherited objects as follows:
Subtype Requirement: Let P(x) be a property provable about objects x of type T. Then P(y) should be true for objects y of type S where S is a subtype of T.
Later, Robert Martin named the Liskov Substitution Principle:
Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.
If you had fly_to(Bird, location)
, then you should be able to call fly_to
on any subtype of Bird
. This means that Penguin
and Ostrich
couldn’t be subtypes of Bird
, since you cannot call fly_to
on a flightless bird!
Liskov would later “approve” of the rule in her book Program Development in Java.
The Devil is in the Details
The LSP looks simple; the problem is applying it. How do you check that a function works on all subclasses of a class? Functions can do a lot of things! So we have a set of enforceable rules, where if you follow all the rules, you (presumably) satisfy LSP. The most well known of these rules is that inherited methods of a class must not strengthen preconditions or weaken postconditions.1 That means it must accept a superset of the values the parent method accepts and output a subset of the parent’s possible outputs. Given
class Parent {
run(x: int): out {
require {x >= 4}
# some code that computes out
ensures {out % 2 == 0}
}
}
class Child extends Parent {
# stuff
}
Child.run
cannot be overloaded to take only even integers, or to also output odd numbers. Preconditions and postconditions together are called contracts.
This is just the first rule and already we have three problems. First, it’s not super clear why this follows from the LSP. Second, it’s hard to remember whether it’s weaken preconditions and strengthen postconditions or the other way around. Third, most languages don’t have dedicated pre/postcondition syntax. For these languages the rule is taught as a restriction on the method’s type signature: the parameters must be contravariant and the return values covariant. This makes things even more arbitrary and confusing and isn’t as accurate as the original version.
Long tangent on types and contracts
The above is historically inaccurate: the original Liskov paper lists both the contract rule and the type rule as separate requirements for subtyping. But IMHO the type rule is equivalent to the contract rule. Saying “parameters must be contravariant” is the same as saying “preconditions must not be strengthened.” Consider this case of contravariant parameters:
Parent.foo(x: Cat) {
# code
}
Child.foo(x: Animal) {
# code
}
We can rewrite it purely as contracts:
Parent.foo(x) {
requires typeof(x) == Cat;
# code
}
Child.foo(x) {
requires typeof(x) == Animal;
# code
}
It’s less intuitive that we can rewrite contracts as types, but that’s also possible:
foo(x: Int): out {
require {x >= 4}
# code
ensure {out % 2 == 0}
}
# becomes
foo(x: AtLeastFour): (out: EvenNumber) {
# code
}
It’s true that most languages can’t statically typecheck this, but that’s a limitation on our current typecheckers, not an essential problem. Some languages, like Liquid Haskell, handle it just fine.
That said, even if the two rules are theoretically equivalent, in practice your programming language will favor checking some things as contracts and other things as types.
We can’t ensure LSP with function contracts alone. Consider!2
@dataclass
class WrappedCounter:
val: int # assume that 0 <= val < 1000
def add(self, x):
assert x >= 0 # Precondition
self.val = (self.val + x) % 1000
class Counter(WrappedCounter):
def add(self, x):
assert x >= 0 # Precondition
self.val += x
Counter
is obviously not a subtype of WrappedCounter
, but it has the same method contracts! The difference is that WrappedCounter
has an extra invariant, a property that holds true of an object at all times. Counter
doesn’t have that property, so it can’t be a child class. That leads to rule two, subtypes must not weaken the class invariants.
The method rule and the invariant rule look somewhat related, but the next one comes out of left field.
@dataclass
class WrappedCounter:
val: Nat
limit: Nat
def add(self, x):
assert 0 <= x < self.limit # Precondition
self.val = (self.val + x) % self.limit
class EvilCounter(WrappedCounter):
def add(self, x):
assert 0 <= x < self.limit # Precondition
self.limit += x
self.val = (self.val + x) % self.limit
Let’s look at our two rules:
- Does it pass the method test? Yup, both have the exact same type signatures and preconditions.
- Does it pass the invariant test? Yup, they both guarantee that
self.val ∈ [0, limit)
.
But EvilCounter
is clearly not a subtype of WrappedCounter
. It doesn’t wrap!
We need a third rule, the history rule: the subtype cannot change in a way prohibited by the supertype. In this case, this means that limit
must remain unchanged. WrappedCounter
follows this rule and EvilCounter
violates it, so it’s not a subtyping relation.3
The history rule always felt like a tack-on. It makes me worry that we’re just patching up holes as we find them, and that somebody will come up with a counterexample that follows all three rules but still isn’t a subtype. Like what happens when we add in exceptions or generics? We might have to add another arbitrary rule. It all just makes me lose trust in the applicability of LSP.
The Fix
In their college classes, Elisa Baniassad and Alexander J Summers found a different approach:
For a class to be substitutable for its supertype, it must pass all the supertype’s black box tests.4
To check substitutability, we come up with a test:
def test_add_five():
c = WrappedCounter(10)
c.add(5)
assert c.val == 15
This passes, so we’d expect the equivalent test to pass for Counter
:
def test_add_five():
c = Counter(10)
c.add(5)
assert c.val == 15
This also passes. If every test we write passes, then Counter
is a subtype of WrappedCounter
.
So let’s write a second test:
def test_add_million():
c = WrappedCounter(10)
c.add(1_000_000)
assert c.val == 10
And this passes for WrappedCounter
but fails for Counter
. Therefore Counter
is not a subtype of WrappedCounter
.
Show limitation
Note that passing tests doesn’t guarantee you’ve got a subtype, since there might be a failing test you just haven’t considered writing. It’s similar to how, even if you have a rule violation, you might never actually run into a subtype violation in production. This is more a pedagogical technique to help people internalize LSP.
This explains a lot
I like how the testing approach to LSP explains all the rules. If you give me a case where an LSP rule violation causes a problem, I can give you back a test case which fails for the same reason. Let’s Pythonize my earlier example of a precondition rule violation:
class Parent:
def run(self, x: Int):
assert x >= 4
pass
# return something
class Child(Parent):
def run(self, x):
assert x % 2 == 0
super()
pass
Here’s my test:
def test_parent():
c = Parent()
c.run(5)
def test_child():
c = Child()
c.run(5)
We don’t need to do anything with the output of run
, just call it. The first test (with c = Parent()
) will do nothing while the second test (with c = Child()
) will throw an error, failing the test. Child
is not a subtype.
So that’s one way the testing approach makes things easier. It also shows why the history rule is necessary and also why it originally seemed like a crude hack to me.
def test_history_rule():
c = WrappedCounter(0, 10)
c.add(6)
c.add(6)
assert c.val == 2
We originally needed the history rule because contracts and invariants only span the lifetime of one method. But now we can just call a sequence of methods in one test!
Does this work?
By “work”, I mean “does it help students internalize the LSP.” I found the paper a little confusing here: it says they “observed that students were able to better understand the responsibilities of the subtype in a general sense”, but also that “the same proportion of students” understood the rules, which kinda sounds like ‘no impact’? I emailed the researchers for a clarification and here’s their response:
It was a big improvement. Students went from not demonstrating any intuition around the rule, to “getting it”. “Oh - subclass has to pass all superclass black box tests! I get it!”. And then you can delve into ways tests can break (narrowing preconditions, etc). But at the root of it all - it’s about the tests. They did way better on LSP questions, and I believe there was one question in particular that we asked before and after, and it was way better done (like from a failing grade to a good passing grade on average) with the testing approach.
They also told me about an indirect benefit: it also helped students understand the difference between black- and white-box tests! Once you get a sense for LSP in the original
It helped clearly delineate whether tests were about “the packaging” (like what’s written on the box) or the “how” you’ve done it (which is specific to the implementation, and can change if you make different choices). You have to test both, and make sure both the packaging is okay, and check that the underlying implementation is not having unwanted side effects, or introducing errors — and it then became clear to students why we needed both kinds of tests, and why differentiating was useful.
Pretty cool!
Thanks to Liz Denhup for feedback. If you liked this, come join my newsletter! I write new essays there every week.
- How did we go from functions to methods? Think of the method
obj.run(x: int)
as being syntactic sugar for the functionrun(obj: Class, x: int)
, kind of like how Python methods have a requiredself
parameter. [return] - If I wanted to actually guarantee that
val >= 0
, I could add a post_init method to the dataclass. [return] - In this specific case you can catch this violation with the postcondition
self.limit == old.limit
, but this is just a simple illustrative example. [return] - Where “black box” means “public methods and accesses only”. [return]