Complexity is bad. Simple software is better than complex software.
But software is complex for a reason. While people like coming up with grand theories of complexity (Simple Made Easy, No Silver Bullet) there’s very little info out there on the nitty-gritty specific sources of complexity. Without that, all the theories feel to me like the four elements theory. We just don’t have the data needed to come up with something more predictive. 1
I think a lot about the different sources of complexity. This article is about one particular source.
A constraint is any measurable quality of the system that you have as a goal. The canonical constraint is performance: if your software runs too slow, you need to fix your software. Anything measurable can be a potential constraint:
- How much power your mobile app needs.
- API requests you make per hour.
- Compile time.
- Data transferred per page load.
- Amount of screen space taken up by toolbars.
Every constraint has variations, depending on what specifically we’re trying to measure. Does performance mean latency or throughput? Worst-case or average-case performance? Single-threaded or parallel performance? Or we could be constraining the variation of a metric, like guaranteeing constant-time operations.
Constraints are also called non-functional requirements.
Why Constraints Lead To Complexity
I’m assuming the constraint is performance, but these reasons should apply to all other constraints too.
Complexity is a Choice
There are more complex programs than simple ones, and so more complex programs that will work with your constraints.
I think this is “obvious”, so much so that it’s hard for me to step out of that mindset and support it from first principles. The best I can do is give an example. Take sorting: Insertion sort is very simple to understand and has a simple implementation that’s easy to prove correct. If you didn’t have any performance constraints, you’d be fine sorting with insertion sort. But it also has O(n²) worst case performance. Mergesort is considerably more complex, but has O(nlogn) performance. You can choose to make your program more complex to better match your performance constraints.
If you needed even more performance, you’d instead use something like Timsort. Insertion sort is faster than mergesort for very small lists, so you can get more performance by starting with mergesort and switching to insertion sort when it’d be more appropriate. That’d be more complex than using either algorithm 100% of the time, as you now need code to coordinate them.2
I think of simplicity as a sort of resource: by making a program more complex you get the flexibility to buy other things, like lower memory use, more robustness, better logging, etc. While it’s possible to be “frugal” and get better constraint-metrics while keeping the program simple, more often you have to “pay the price” with more complexity.
Multiple Constraints Cause More Complexity
Having two constraints is generally more than twice as bad as having one constraint because they interfere with each other. Space-Time Tradeoffs are when improving performance of an algorithm takes more memory/disk space and vice versa. Similarly, improving database read-throughput often comes at the cost of write-throughput. Finding the middle ground, or shifting priorities in different contexts, adds further complexity.
Even when constraints don’t affect each other, every new one has to build off the complexity of the code handling prior constraints.
Constraints can break modularity
If you only need to optimize one hot spot in your code, you can contain how messy you have to make the code. But after a point if you need to speed up a program by 10%, you get there by finding ten 1% optimizations. Lots of good coding practices, like indirection and DRY, add small amounts of overhead that add up over a whole program. Sometimes those 1% optimizations cutting out those bits of overhead. You see this a lot in compilers and JITs, where people will couple anything and everything if it leads to 1% performance improvements. Higher-stack programs aren’t constraint-bound to the same extent, but this kind of modularity-breaking still happens sometimes.
Constraints also punch holes in abstractions. Normally we think of programs independently of the hardware we run them on; constraints can forcibly couple the two. According to data-oriented design, you need to keep related data close in memory to best use the CPU caches.
Other notes on constraints
Hard and Soft Limits
Some systems are “more-constrained” than others, in that they have a limit. So we can further categorize constraints based on what happens when you exceed the limit.
- Soft limits can be exceeded by a small amount, or exceeded infrequently. For example, you might guarantee clients that you’ll send order confirmation emails within 10 minutes. If it occasionally takes you 15, it’s not great, but it’s not the end of the world either.
- Hard limits cannot be exceeded, period. If your Arduino has 256kb space then your compiled binary better be smaller than that.
Hard and soft limits come from Hard and Soft Realtime Systems but I think they have at least some applicability to other constraints. I’d guess hard constraints force more complexity than soft ones by dramatically reducing your flexibility— you can’t think of performance as trading off with simplicity, it’s the requirement, period.
Sometimes improving a constraint metric changes how people use the system, which changes what you need to support, which then adds more complexity.
I once led a project to improve a data warehouse. After a month of work we got the data science batch jobs from taking a day to running in a couple of seconds. Another team saw how fast it was and started using the warehouse to deliver on-demand dashboards to all of our customers. This was a totally unexpected use case that took a bunch more work, and a bunch more complexity, to properly support.
Further breaking down constraints
I believe that all constraints pressure complexity for the same broad reasons, but we can break them down even further. Statistical constraints like uptime and tail latency are different from memory use. Uptime also depends on a lot more factors than memory use, many of which aren’t in the programmer’s control.
Constraints cause a different kind of complexity than other sources do, like impedance or backwards compatibility. By categorizing complexity by source, we can talk about it in a richer than we could with just the grand unifying theories. I’m hoping this post convinces other people to try writing about sources of complexity too: I’ve got a pretty narrow window of the software world and can’t come up with many on my own.
- And don’t get me started on measurements like cyclomatic complexity, which try to reduce complexity to an easily measurable— and easily gameable— number. [return]
- Sometimes you get an insight that makes a program both simpler and faster, but that’s rare. And even then, you usually have to add complexity if you want to make it faster than that. [return]