What We Know We Don't Know: Empirical Software Engineering

This version of the talk was given at DDD Europe, 2024.

Technology is a multitrillion-dollar industry, but we know almost nothing about how it’s best practiced. Empirical Software Engineering, or ESE, is the study of what works in software and why. Instead of trusting our instincts, we collect data, run studies, and peer-review our results. This talk is about how we empirically find the facts in software and some of the challenges we face. It concludes with a guide to finding existing research and an overview of some things we’ve learned about DDD.

Slides are here.

Sources

I referenced a bunch of papers in my talk. These are links so you can read them yourself:

Intro

Big Data is slower than laptops: Scalability! But at what COST?

Why we care

Section references

The Pragmatics of TDD

TDD is dead. Long live testing.

Methods

Controlled Trials

Comparing syntax highlightings and their effects on code comprehension

Natural Experiments

Simple Testing Can Prevent Most Critical Failures

Natural Experiments Gone Rogue

A Large Scale Study of Programming Languages and Code Quality in Github (original faulty study)

On the Impact of Programming Languages on Code Quality (replication study)

My 6,000 word writeup of the whole fiasco

Observational Studies

Fixing Faults in C and Java Source Code: Abbreviated vs. Full-Word Identifier Names (preprint)

DDD

Survey Paper

Domain-Driven Design in Software Development: A Systematic Literature Review on Implementation, Challenges, and Effectiveness

Interesting Papers

Design, Monitoring, and Testing of Microservices Systems: The Practitioners’ Perspective

Practitioner Views on the Interrelation of Microservice APIs and Domain-Driven Design: A Grey Literature Study Based on Grounded Theory

Refactoring with domain-driven design in an industrial context

Tackling Consistency-related Design Challenges of Distributed Data-Intensive Systems - An Action Research Study

Note that the survey paper covers other interesting papers too; these are just the ones I brought up in the talk.

Additional Sources

Recommended Reading

Teaching Tech Together

Leprechauns of Software Engineering

The Programmer’s Brain

Making Software

Free research

It Will Never Work In Theory

ACM Digital Library

arXiv

Questions

What does science say about DDD as a whole? Is it good or bad?

This isn’t a question science can answer. It’s kind of like asking “is competition good”: the scope is simply too vast and the criteria too ambiguous to have a meaningful answer.

Instead, we have to look at the specific things people do as part of DDD, and the specific ways those things affect their projects. Do microservice architects applying bounded contexts create more services than ones who don’t use any part of DDD? Do domains modeled with event storming “look different” from domains modeled without it? What are the most common issues unique to systems that use CQRS?

Are there any studies on how to teach “thinking in abstractions”?

Off the top of my head, the first place I’d look is at Shriram Krishnamurthi’s corpus. His group focuses on how we can teach abstraction better and has developed a lot of interesting tools exploring this.

Does it even make sense to study “which languages are more error-prone?” Maybe different languages attract different types of people, and that’s what matters.

In his video debunking the original paper, Jan Vitek agrees that this is a fundamental issue with it, but he focused the replication on the methodological errors because those are easier to prove conclusively. See my writeup for more details.

How long is it usually between software developers adopting a new technique and scientists studying it?

No idea, sorry.