Oct 5, 2014 / CODE COVERAGE, QUALITY

Your code coverage metric is not meaningful

Last week, I had a heated but interesting Twitter debate about Code Coverage with my long-time friend (and sometimes squash partner) Freddy Mallet.

The essence of my point is the following: the Code Coverage metric that most quality-conscious software engineers cherish doesn’t guarantee anything. Thus, achieving 80% (or 100%) Code Coverage and bragging about it is just as useful as blowing in the wind. For sure, it’s quite hard to have a fact-based debate over Twitter, as 140 chars put a hard limit on any argument. This article is an attempt at writing down my arguments in a limitless space.

The uselessness of raw Code Coverage can be proved quite easily. Let’s have a simple example with the following to-be-tested class:

public class PassFilter {

    private int limit;

    public PassFilter(int limit) {
        this.limit = limit;
    }

    public boolean filter(int i) {
        return i < limit;
    }
}

This class is straightforward enough to need no comment. A possible test would be the following:

public class PassFilterTest {

    private PassFilter passFilterFive;

    @BeforeMethod
    protected void setUp() {
        passFilterFive = new PassFilter(5);
    }

    @Test
    public void should_pass_when_filtering_one() {
        boolean result = passFilterFive.filter(1);
    }

    @Test
    public void should_not_pass_when_filtering_ten() {
        boolean result = passFilterFive.filter(10);
    }
}

This test class will happily report 100% code coverage as well as 100% line coverage: executing the test will go through all the code’s lines and on both sides of its single branch. Isn’t life sweet? Too bad there are no assertions, so the tests can never fail; they could have been "forgotten" on purpose by a contractor who couldn’t achieve the previously agreed-on code coverage metric. Let’s give the contractor the benefit of the doubt, assume programmers act in good faith, and add assertions:

public class PassFilterTest {

    private PassFilter passFilterFive;

    @BeforeMethod
    protected void setUp() {
        passFilterFive = new PassFilter(5);
    }

    @Test
    public void should_pass_when_filtering_one() {
        boolean result = passFilterFive.filter(1);
        Assert.assertTrue(result);
    }

    @Test
    public void should_not_pass_when_filtering_ten() {
        boolean result = passFilterFive.filter(10);
        Assert.assertFalse(result);
    }
}

Still 100% code coverage and 100% line coverage - and this time, "real" assertions! But it is still of no use… It is a well-known fact that developers tend to test for passing cases. Here, the two test cases use parameters of 1 and 10, while the potential bug sits at the exact threshold of 5 (should the filter let this value pass or not?).

In conclusion, the raw Code Coverage metric only guarantees the maximum possible Code Coverage. If it’s 0%, of course, it will be 0%; however, at 80%, it gives you nothing… just the assurance that at most 80% of your code is covered. The meaningful coverage can be anything below that: 60%, 40%… or even 0%. What good is a metric that only hints at the maximum? In fact, in this light, Code Coverage is a Bloom Filter: it can tell you that code is definitely not tested, but never that it definitely is. IMHO, the only way to guarantee that test cases and the associated test coverage are really meaningful is to use Mutation Testing. For a basic introduction to Mutation Testing, please check my Introduction to Mutation Testing talk at JavaZone (10 minutes).

The good thing is that Mutation Testing is not some kind of academic subject only known by nerdy scientists: there are tools in Java to start using it right now. Configuring PIT with the previous test will yield the following result:

PIT report

This report pinpoints the remaining mutant to be killed and its associated line (line 12), and it’s easy to add the missing test case:

@Test
public void should_not_pass_when_filtering_five() {
    boolean result = passFilterFive.filter(5);
    Assert.assertFalse(result);
}
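
To make the surviving mutant concrete, here is a hand-written sketch of what a conditional-boundary mutation of the filter could look like. It is only an illustration: PIT generates its mutants at the bytecode level, and the MutantDemo class and its method names are hypothetical, not part of PIT or of the original code. The sketch assumes a limit of 5 and the same inputs as the tests above.

```java
import java.util.function.IntPredicate;

public class MutantDemo {

    // Same logic as PassFilter.filter(): strictly below the limit.
    public static IntPredicate original(int limit) {
        return i -> i < limit;
    }

    // Hand-written stand-in for a boundary mutant: '<' replaced by '<='.
    public static IntPredicate boundaryMutant(int limit) {
        return i -> i <= limit;
    }

    // Mirrors the two initial tests (inputs 1 and 10).
    public static boolean passesInitialTests(IntPredicate filter) {
        return filter.test(1) && !filter.test(10);
    }

    // Mirrors the added test on the threshold value (input 5).
    public static boolean passesBoundaryTest(IntPredicate filter) {
        return !filter.test(5);
    }

    public static void main(String[] args) {
        IntPredicate original = original(5);
        IntPredicate mutant = boundaryMutant(5);

        // Both versions satisfy the initial tests: the mutant survives.
        System.out.println("original, initial tests: " + passesInitialTests(original));
        System.out.println("mutant,   initial tests: " + passesInitialTests(mutant));

        // Only the threshold test tells them apart: the mutant is killed.
        System.out.println("original, boundary test: " + passesBoundaryTest(original));
        System.out.println("mutant,   boundary test: " + passesBoundaryTest(mutant));
    }
}
```

The first three lines print true and the last one prints false: inputs 1 and 10 cannot distinguish the two implementations, whereas the test on 5 fails against the mutant, which is what killing a mutant means.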

Now that we’ve demonstrated without doubt the value of Mutation Testing, what do we do? Some might raise a couple of arguments against Mutation Testing. Let’s review each of them:

Mutation Testing takes a long time to execute. For sure, running all possible mutations takes much longer than standard Unit Testing - which has to stay under 10 minutes. However, this is not a problem, as Code Coverage is not a metric that has to be checked at each build: a nightly build is more than enough.
Mutation Testing takes a long time to analyze results. Also right… but what of the time to analyze Code Coverage results that, as seen before, only hint at the maximum possible Code Coverage?

On one hand, the raw Code Coverage metric is only relevant when too low - and requires further analysis when high. On the other hand, Mutation Testing lets you have confidence in the Code Coverage metric at no additional cost. The rest is up to you…
