Integrated Tests are a Scam: Part 1
On March 1, 2010 I changed the phrase “integration tests” to “integrated tests” in this article.
Integrated tests are a scam—a self-replicating virus that threatens to infect your code base, your project, and your team with endless pain and suffering.
I mean it. I hate integrated tests. I hate them, and with a passion. Of course, I should clarify what I mean by integrated tests, because, like any term in software, we probably don’t agree on a meaning for it.
I use the term integrated test to mean any test whose result (pass or fail) depends on the correctness of the implementation of more than one piece of non-trivial behavior.
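A minimal sketch of that definition in Python may help. The functions sales_tax and invoice_total are hypothetical stand-ins for two pieces of non-trivial behavior; they are not taken from any real code base. The first test is integrated in the sense above, because a mistake in either function can make it fail. The other two are focused, because each can fail for only one reason.

```python
# Hypothetical example: two pieces of behavior that collaborate.
def sales_tax(amount_cents, rate_percent):
    """Tax in whole cents, rounded half up."""
    return (amount_cents * rate_percent + 50) // 100


def invoice_total(amount_cents, rate_percent, tax=sales_tax):
    """Total in cents; the tax calculation is a collaborator passed in."""
    return amount_cents + tax(amount_cents, rate_percent)


def test_invoice_total_integrated():
    # Integrated: the result depends on the correctness of BOTH
    # invoice_total and the real sales_tax.
    assert invoice_total(1000, 8) == 1080


def test_invoice_total_focused():
    # Focused: a stub replaces sales_tax, so only invoice_total's own
    # behavior can make this test fail.
    assert invoice_total(1000, 8, tax=lambda amount, rate: 7) == 1007


def test_sales_tax_focused():
    # Focused: exercises sales_tax alone.
    assert sales_tax(1000, 8) == 80
    assert sales_tax(105, 10) == 11  # 10.5 cents rounds up to 11
```

If someone breaks the rounding in sales_tax, the integrated test and the sales_tax test both fail, while the focused invoice_total test keeps passing and points you away from code that still works.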
I, too, would prefer a more rigorous definition, but this one works well for most code bases most of the time. I have a simple point: I generally don’t want to rely on tests that might fail for a variety of reasons. Those tests create more problems than they solve.
You write integrated tests because you can’t write perfect unit tests. You know this problem: all your unit tests pass, but someone finds a defect anyway. Sometimes you can explain this by finding an obvious unit test you simply missed, but sometimes you can’t. In those cases, you decide you need to write an integrated test to make sure that all the production implementations you use in the broken code path now work correctly together.
So far, no big deal, but you’ll meet the monster as soon as you think this:
If we can find defects even when our tests pass 100%, and if I can only plug the hole with an integrated test, then we’d better write integrated tests everywhere.
Bad idea. Really bad.
Why so bad? A little bit of simple arithmetic should help explain.
You have a medium-sized web application with around 20 pages, maybe 10 of which have forms. Each form has an average of 5 fields and the average field needs 3 tests to verify thoroughly. Your architecture has about 10 layers, including web presentation widgets, web presentation pages, abstract presentation, an HTTP bridge to your service API, controllers, transaction scripts, abstract data repositories, data repository implementations, SQL statement mapping, SQL execution, and application configuration. A typical request/response cycle creates a stack trace 30 frames deep, some of which you wrote, and some of which you’ve taken off the shelf from a wide variety of open source and commercial packages. How many tests do you need to test this application thoroughly?
At least 10,000. Maybe a million. One million.
How is that possible?! Consider 10 layers with 3 potential branch points at each layer. Number of code paths: 3^10 > 59,000. How about 4 branch points per layer? 4^10 > 1,000,000. How about 3 branch points and 12 layers? 3^12 > 530,000.
Even if one of your 12 layers has a single code path, 3^11 > 177,000.
Even if your 10-layer application has only an average of 3.5 code paths per layer, 3.5^10 > 275,000 [1].
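The arithmetic is easy to check for yourself. Here is a short sketch in Python, assuming (as the estimate above does) that every path through one layer can combine with every path through the others, so the counts multiply. The function name count_code_paths is mine, chosen for illustration. The last two scenarios use the uneven distribution from footnote 1.

```python
from math import prod


def count_code_paths(branch_points_per_layer):
    """Multiply the branch points of each layer to count end-to-end paths."""
    return prod(branch_points_per_layer)


if __name__ == "__main__":
    scenarios = {
        "10 layers, 3 branch points each": [3] * 10,
        "10 layers, 4 branch points each": [4] * 10,
        "12 layers, 3 branch points each": [3] * 12,
        "12 layers, one with a single path": [3] * 11 + [1],
        "10 layers, 3.5 paths on average": [3.5] * 10,
        "6 layers with 2, 6 layers with 5": [2] * 6 + [5] * 6,
        "6 layers with 2, 6 layers with 4": [2] * 6 + [4] * 6,
    }
    for label, layers in scenarios.items():
        print(f"{label}: {count_code_paths(layers):,.0f} code paths")
```

Change the lists to match your own architecture. The product grows so quickly that shaving a branch point off one or two layers barely dents the total.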
To simplify the arithmetic, suppose you need only 100,000 integrated tests to cover your application. Integrated tests typically touch the file system or a network connection, meaning that they run on average at a rate of no more than 50 tests per second. Your 100,000-test integrated test suite executes in 2000 seconds or 34 minutes. That means that you execute your entire test suite only when you feel ready to check in. Some teams let their continuous build execute those tests, and hope for the best, wasting valuable time when the build fails and they need to backtrack an hour.
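As a rough sketch of that run-time estimate, using only the figures above (100,000 tests at 50 tests per second) and a hypothetical helper named suite_runtime:

```python
import math


def suite_runtime(test_count, tests_per_second):
    """Return (seconds, whole minutes rounded up) to run the whole suite."""
    seconds = test_count / tests_per_second
    return seconds, math.ceil(seconds / 60)


if __name__ == "__main__":
    seconds, minutes = suite_runtime(test_count=100_000, tests_per_second=50)
    print(f"{seconds:,.0f} seconds, or about {minutes} minutes")
```

Plug in your own suite size and test rate to see how far you sit from a suite you would willingly run every few minutes.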
How long do you need to write 100,000 tests? If it takes 10 minutes to write each test—that includes thinking time, time futzing around with the test to make it pass the first time, and time maintaining your test database, test web server, test application server, and so on—then you need 2,778 six-hour human-days (or pair-days if you program in pairs). That works out to 556 five-day human-weeks (or pair-weeks).
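The same estimate as a small Python sketch, assuming the figures above: 10 minutes per test, six productive hours per day, and five days per week. The function name writing_effort is hypothetical.

```python
def writing_effort(test_count, minutes_per_test, hours_per_day=6, days_per_week=5):
    """Return (human-days, human-weeks), each rounded to the nearest whole unit."""
    hours = test_count * minutes_per_test / 60
    days = hours / hours_per_day
    weeks = days / days_per_week
    return round(days), round(weeks)


if __name__ == "__main__":
    days, weeks = writing_effort(test_count=100_000, minutes_per_test=10)
    print(f"{days:,} human-days, or {weeks:,} human-weeks")
```

Even a much more optimistic minutes_per_test leaves a number of human-weeks that no one-year project can absorb.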
Even if I overestimate by a factor of five, you still need two full-time integrated test writers for a one-year project, with a steady enough flow of work to keep them busy six hours per day. And you can’t get any of it wrong, because you have no time to rewrite those tests.
No. You’ll have those integrated test writers writing production code by week eight.
Since you won’t write all those tests, you’ll write the tests you can. You’ll write the happy path tests and a few error cases. You won’t check all ten fields in a form. You won’t check what happens on February 29. You’ll jam in a database change rather than copy and paste the 70 tests you need to check it thoroughly. You’ll write around 50 tests per week, which translates to 2,500 tests in a one-year project. Not 100,000.
2.5% of the number you need to test your application thoroughly.
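To see where the 2.5% comes from, here is a sketch that assumes a 50-week working year, which is what 50 tests per week and 2,500 tests per year imply. The helper name share_of_needed_tests is made up for this example.

```python
def share_of_needed_tests(tests_per_week, weeks, tests_needed):
    """Return (tests written, fraction of the needed tests that this covers)."""
    written = tests_per_week * weeks
    return written, written / tests_needed


if __name__ == "__main__":
    written, share = share_of_needed_tests(
        tests_per_week=50, weeks=50, tests_needed=100_000
    )
    print(f"{written:,} tests written, {share:.1%} of the tests needed")
```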
Even if you wrote the most important 2.5%, recognizing the nearly endless duplication in the full complement of tests, you’d cover somewhere between 10% and 80% of your code paths, and you’ll have no idea whether you got closer to 10% or 80% until your customers start pounding the first release.
Do you feel lucky? Well, do you? [2]
So you write your 2,500 integrated tests. Perhaps you even write 5,000 of them. When your customer finds a defect, how will you fix it? Yes: with another handful of integrated tests. The more integrated tests you write, the more of a false sense of security you feel. (Remember, you just increased your code path coverage from 5% to 5.01% with those ten integrated tests.) This false sense of security helps you feel good about releasing more undertested code to your customers, which means they find more defects, which you fix with yet more integrated tests. Over time your code path coverage decreases because the complexity of your code base grows more quickly than your capacity to write enough integrated tests to cover it.
…and you wonder why you spend 70% of your time with support calls?
Integrated tests are a scam. Unreliable, self-replicating time-wasters. They have to go.
[1] True: few code bases distribute their complexity to their layers uniformly. Suppose half your 12 layers have only two branch points (one normal path and one error path) while the others have 5 branch points. 2^6·5^6 = 1,000,000 and for 4 branch points 2^6·4^6 > 262,000. You can’t win this game.
[2] Aslak Hellesøy points to a way to take luck mostly out of the equation. His technique for choosing high-value tests will certainly help, but it stops short of testing your code thoroughly. I believe you can achieve truly thorough focused tests with similar cost to writing and maintaining integrated tests even using the pairwise test selection technique. (Thanks, Aslak, for your comment on April 12, 2009.)