What We Know We Don't Know: Empirical Software Engineering
This version of the talk was given at DDD Europe, 2024.
Technology is a multitrillion-dollar industry, but we know almost nothing about how it’s best practiced. Empirical Software Engineering, or ESE, is the study of what works in software and why. Instead of trusting our instincts, we collect data, run studies, and peer-review our results. This talk is all about how we empirically find the facts in software and some of the challenges we face, concluding with a guide on how to find existing research and an overview of some things we’ve learned about DDD.
Slides are here.
Sources
I referenced a bunch of papers in my talk. Here are the links so you can read them yourself:
Intro
- Big Data is slower than laptops
Why we care
- Section references
Methods
- Controlled Trials
Comparing syntax highlightings and their effects on code comprehension
- Natural Experiments
- Natural Experiments Gone Rogue
A Large Scale Study of Programming Languages and Code Quality in Github (original faulty study)
On the Impact of Programming Languages on Code Quality (replication study)
- Observational Studies
Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names (preprint)
DDD
- Survey Paper
- Interesting Papers
Design, Monitoring, and Testing of Microservices Systems: The Practitioners’ Perspective
Refactoring with domain-driven design in an industrial context
Note that the survey paper covers other interesting papers too; these are just the ones I brought up in the talk.
Additional Sources
- Recommended Reading
- Free research
Questions
What does science say about DDD as a whole? Is it good or bad?
This isn’t a question science can answer. It’s kind of like asking “is competition good”: the scope is simply too vast and the criteria too ambiguous to have a meaningful answer.
Instead, we have to look at the specific things people do as part of DDD, and the specific ways those things affect their projects. Do microservice architects applying bounded contexts create more services than ones who don’t use any part of DDD? Do domains modeled with event storming “look different” from domains modeled without it? What are the most common unique issues in systems that use CQRS?
Are there any studies on how to teach “thinking in abstractions”?
Off the top of my head, the first place I’d look is at Shriram Krishnamurthi’s corpus. His group focuses on how we can teach abstraction better and has developed a lot of interesting tools exploring this.
Does it even make sense to study “which languages are more error-prone?” Maybe different languages attract different types of people, and that’s what matters.
In his video debunking the original paper, Jan Vitek agrees that this is a fundamental issue with the original paper, but he focused the replication on the methodological errors because those are easier to conclusively prove. See my writeup for more details.
How long is it usually between software developers adopting a new technique and scientists studying it?
No idea, sorry.