Test metrics give false security. Focus on your dangerous code first.

Jasper Sprengers
6 min read · Apr 12, 2021

TL;DR
Pushing for test coverage is a bad incentive. It favors quantity over quality. You should prioritize your code with regard to test-worthiness. I discuss three such categories. First, there is code that processes unmanaged and possibly malicious (user) input. You should test this for robustness. Second, there is complex business logic with possible implementation errors. Test this for correctness. Third, there is code that leans on trusted frameworks in ways that don’t perform at scale. Run performance tests under realistic production loads to capture these quirks.

If you want to get serious about test automation, there are lots of great tools to pack in your kit. Every language has its preferred frameworks and plugins for writing tests, running them and gathering metrics into fancy reports. Slick videos can give non-techies the impression that test automation is all about picking the right tools and they will test our code for us. Unfortunately, the tools only take care of the execution stage. You still need to write the tests, but at least they can run on cheap CPU time instead of expensive people time.

Test tooling is great at churning out stats. The most popular metrics are coverage at file, method, line and branch level, in ascending order of usefulness. We also love to see trends develop over time — as long as they go up. Yet in our hearts we know that test coverage alone is not enough. It really only tells you which production code was touched by some test code and that it encountered no runtime errors. It’s not a good indicator of code fitness, adaptability, or any of the other *-ilities. Without assertions the behavior of the production code is not even validated. Without proper organization and meaningful names, the body of tests becomes painful to manage and loses its greatest benefit, which is to stand in as (part of) the specification.

Quantity over quality

Metrics-lovers favor quantity over quality. There’s the amusing and apocryphal tale of the Soviet television plant where, in the last week of every month, factory workers would just bang in the screws with a hammer to meet their output targets. In the same vein, bad development managers reward manual testers for the number of bugs they report, not their severity. What you get is a police force that only writes parking tickets.

There’s no reason per se why driving for high coverage should result in poor-quality test code. However, the perverse incentive to do so is real. If it’s team policy to have a unit test for every line of production code to keep coverage high, you are practicing what I would call orthodox TDD. Every line of code looks equally important from a testing standpoint, but that is not reality. I use every door in my house, but the bedroom doors don’t need a lock. Likewise, each line of code is necessary for the software to function, but not every line is a potential pitfall that you need to carefully safeguard with your tests.

Good testers know this. They zero in on the hard stuff, the parts of the application that can cause havoc when you feed them unpredictable or malicious input. They know the parts that are test-worthy, for lack of a better word. How effective is this test in preventing regressions? What’s the risk of not having this test? What could conceivably go wrong? What’s the potential damage we’re trying to prevent? The answers to these questions will tell you how thoroughly to test, if at all. This is an intuitive exercise, so do not try to put a number to it.

It took some paragraphs to, hopefully, win you over to my point of view. Here are three categories of code that are highly test-worthy, in my not-so-humble opinion. It’s the kind of code that goes bump in the night and guarantees you a trip to hell and back when left untested. The price of peace is vigilance and JUnit.

One: Code that processes unmanaged input

You have the sacred duty to trust no one when it comes to code that processes manual user input or reads from unmanaged files. By unmanaged I mean any data that is not part of the application source or written exclusively by it. You shouldn’t even trust database content that other applications have write access to. Put on your misanthropic hat when testing. (REST) controller endpoints are a typical place to perform such input validation. There are libraries to help you with standard cleanup work, so use them. For your own sanity, don’t write your own HTML sanitizer. Every piece of user input should be scrubbed and fumigated before it can enter the business logic of the software. This code category needs more than regular-expression validation of input parameters, so simple unit tests won’t cut it. I classify performance tests for malicious attacks and authentication/authorization (who can do what and where) under this header as well.
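To make this concrete, here is a minimal sketch of such a robustness test, assuming JUnit 5 and the OWASP Java HTML Sanitizer as the off-the-shelf cleanup library. The class name and the sanitization policy are illustrative, not prescriptive.

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

// Robustness tests for hostile input, leaning on a library
// instead of a home-grown sanitizer.
class CommentInputTest {

    private final PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);

    @Test
    void scriptTagsNeverSurvive() {
        String hostile = "<script>alert('pwned')</script>Hello";
        // Whatever else the policy does, script tags must not come out alive.
        assertFalse(policy.sanitize(hostile).contains("<script>"));
    }

    @Test
    void benignFormattingPassesThrough() {
        // The misanthropic hat should not mangle honest input.
        String clean = policy.sanitize("<b>bold</b> text");
        assertTrue(clean.contains("<b>"));
    }
}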

Two: Complex business rules

Business rules form the next category. The previous category is about hardening the code against bad intentions, robustness in other words. You test the unhappy paths, the scenarios that don’t happen often but cause a real mess when they do. This category of tests helps to minimize human error by validating code correctness. Complex business logic has a large margin for implementation errors. Small bugs are insidious. They produce the wrong output predictably, so you are not alerted by red flags and outages. Any financial calculation belongs to this category. “The output looks plausible” is little reassurance when your formula is only ten cents short, but a million times in a row. Unit tests are indispensable for this category. I once implemented an intricate floating-point formula to convert between two geolocation units of measure. The hardcore math had already been done, but there was plenty of room to put a comma wrong. A unit test with a round-trip conversion made sure we literally ended up in the same place.
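The round-trip pattern is easy to sketch. The converter below is a hypothetical stand-in (a simple decimal degrees to degrees/minutes/seconds conversion, not the formula from that project), but the assertion idea carries over to any pair of inverse conversions: go there and back, and demand you land where you started, within floating-point tolerance.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class CoordinateRoundTripTest {

    // Hypothetical stand-ins for the real conversion formulas.
    // Only valid for positive coordinates; this is a sketch.
    static double[] toDms(double decimalDegrees) {
        double degrees = Math.floor(decimalDegrees);
        double fraction = (decimalDegrees - degrees) * 60;
        double minutes = Math.floor(fraction);
        double seconds = (fraction - minutes) * 60;
        return new double[] {degrees, minutes, seconds};
    }

    static double toDecimal(double[] dms) {
        return dms[0] + dms[1] / 60 + dms[2] / 3600;
    }

    @Test
    void roundTripEndsUpInTheSamePlace() {
        double latitude = 52.3676; // decimal degrees
        double roundTripped = toDecimal(toDms(latitude));
        // A tolerance of 1e-9 degrees is well below a millimetre on the ground.
        assertEquals(latitude, roundTripped, 1e-9);
    }
}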

Three: Abstractions that don’t scale well

Then there is the third category of leaky abstractions. This typically occurs when code utilizes frameworks without thinking about scale. A real-world example: a user form for a custom ticketing system was extended with a file-upload utility to attach multiple files to a ticket. Uploaded files were stored as blobs in an Oracle table. The data model of a ticket was represented in a Hibernate entity with a list field for the file attachments. If you’re an old hand at ORM you will ask at this stage: “Please tell me they didn’t eagerly load those blobs.” Sorry, I’m afraid they did. When a user opened their list of tickets, the contents of all attached files were loaded into memory, crashing the server even under a modest load.
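For illustration, here is a minimal sketch of the safer mapping, with hypothetical Ticket and Attachment entities. JPA collections default to lazy loading, but spelling out FetchType.LAZY documents the intent, and the blob column itself is kept lazy too (note that in Hibernate, a lazy @Basic attribute only takes effect with bytecode enhancement).

import java.util.List;
import javax.persistence.Basic;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Lob;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;

@Entity
class Ticket {

    @Id @GeneratedValue
    Long id;

    // Lazy: listing tickets must not drag every attachment into memory.
    @OneToMany(mappedBy = "ticket", fetch = FetchType.LAZY, cascade = CascadeType.ALL)
    List<Attachment> attachments;
}

@Entity
class Attachment {

    @Id @GeneratedValue
    Long id;

    @ManyToOne
    Ticket ticket;

    // Lazy on the blob column too, so even loading an Attachment row
    // does not pull megabytes of file content until someone asks for it.
    @Lob
    @Basic(fetch = FetchType.LAZY)
    byte[] content;
}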

Errors like the above are not easy to prevent. You can’t find them with a unit test. Even an integration test that uses the database won’t spot them unless you simulate a large enough load. Then again, you would only write such a test if you were already aware of the danger, in which case you would not have implemented eager loading to begin with. Your best defense is having someone on the team who knows the platform’s pitfalls intimately, and always running performance tests under a realistic load.
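A dedicated tool like Gatling or JMeter is the proper instrument, but even a crude concurrency sketch like the one below (assuming a hypothetical /tickets endpoint on a local test server) would have exposed the eager-loading bug: fifty simultaneous listings, and the timeouts and 5xx responses start rolling in.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TicketListLoadSketch {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/tickets")) // hypothetical endpoint
                .timeout(Duration.ofSeconds(5))
                .build();

        int users = 50;
        ExecutorService pool = Executors.newFixedThreadPool(users);
        List<Future<Integer>> results = new ArrayList<>();
        long start = System.nanoTime();
        for (int i = 0; i < users; i++) {
            results.add(pool.submit(() ->
                    client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode()));
        }
        for (Future<Integer> result : results) {
            // A timeout or a 5xx status here is the red flag we are fishing for.
            System.out.println("status: " + result.get());
        }
        System.out.printf("%d concurrent requests in %d ms%n",
                users, (System.nanoTime() - start) / 1_000_000);
        pool.shutdown();
    }
}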

Conclusion

Solving a problem with code is always time-consuming and therefore expensive. It makes good business sense to focus your limited time on the three categories I just outlined. It makes no sense to have a unit test for every thin service-layer component if all it does is delegate to other injected services. Use your intuition. You will know instinctively which code can cause big trouble. Direct your efforts there.
