Adventures in Coding: Cyclomatic Complexity

If you were asked why the goto statement is considered harmful, what would you say? Our gut feeling tells us it’s bad, it’s not easy to follow code that has goto statements in it, it creates spaghetti code, it just has a bad vibe all around. But why is it really bad? What’s the mathematical reason behind it?

This blog is not for QA or testers, this is for developers who want to write quality code that is readable, maintainable and testable with good coverage. The number of software defects and maintenance cost are directly related to the complexity of an application.

Today we will talk about quantitative ways to reduce complexity at the function level and try to understand why it works. Please note that this blog is not about how to write tests, but about how to write code to facilitate testing with adequate coverage.

What is Cyclomatic Complexity?

Cyclomatic Complexity (henceforth CC or CCN) is a software metric to indicate the level of logical complexity of a program (or function). It measures the number of linearly independent paths through the code. The concept originates in graph theory and is applied to the control flow graph of a program. There are three ways to measure CC, the simplest way is to count the number of if, while, for, switch-case and try-catch statements in a function. Check here and here for details.

Why is it important? When we want to test a method we have a few options at our disposal on choosing test case coverage. Some of them are:
Line Coverage: Every statement will be executed at least once by our choice of test cases.
Branch Coverage (DD-Paths): Each branch condition will be executed for both true and false values.
Basis Path Coverage: Covers all linearly independent paths through the function. Basis paths are a set of linearly independent paths.

Essentially what it means is, any other path in the control flow graph of the function can be obtained by a linear combination of paths from the basis set. Basis Path Testing identifies test cases needed to ensure:

Every statement will be executed at least once (line coverage).
Each branch condition will be executed for both true and false values (branch coverage).

The Cyclomatic Complexity number of a function tells us what the size of the Basis Path Set should be. Reducing the Cyclomatic Complexity of functions minimizes program defects and testing efforts, and maximizes readability and maintainability. Using CCN to estimate test coverage eliminates redundant test cases, and provides adequate test coverage. Complete path coverage, also known as NPath complexity, is extremely difficult and resource intensive, even for a simple function. Given these challenges, the goal is to select a set of test cases, a basis set, that will most likely identify as many defects as possible (not all, but many).

public int ReturnInput(bool foo, bool bar, bool baz, int x) {
    if (foo) {
        x++;
    }
    if (bar) {
        x--;
    }
    if (baz) {
        x = x;
    }
    return x;
}

Here are some different Test Cases for this function (Assuming : T = if (true), F = if (false))

Line Coverage (100% lines): TTT
Branch Coverage (100% branches): TTT, FFF
Randomly selected Path Coverage: TTT, FFF, FFT, TTF
Basis Path Coverage: TTT, FTT, TFT, TTF
NPath Coverage (full coverage): TTT, FTT, TFT, TTF, FFF, FFT, TFF, FTF

There’s a defect in the above function. If foo is true and bar is false, it fails to return input. Notice that neither Line coverage nor Branch coverage finds the defect. Even a Random selection of 4 test cases fails to find the defect. The reason the random selection of paths fails to find it is because those 4 cases are not linearly independent (for eg, FFF = FFT + TTF – TTT). However, Basis Path Coverage with 4 basis test cases does find it. The NPath coverage will find the defect as well, but we would have to run a lot more test cases (8 instead of 4). Writing test cases to cover all paths (NPath complexity) is practically impossible in most situations. Instead, our focus is going to be on covering linearly independent paths (basis set).

Therein lies the beauty of Cyclomatic Complexity. It tells you the minimum number of test cases needed to adequately cover your function. However, it doesn’t tell you what those linearly independent paths are.

How to determine a Basis Set of linearly independent paths

Pick any path (eg, TTT)
Flip each condition to its opposite, one at a time, to create new paths (eg, FTT, TFT, TTF).
Stop when you have reached the cyclomatic complexity number (eg 4).

The relationship between the different coverages is as follows:

line coverage ≤ branch coverage ≤ cyclomatic complexity ≤ total number of paths (NPath complexity)

A More Elaborate Example

Now, let’s analyse a more elaborate example, with a defect in it. Imagine a function that tells you if the King of chess can move up, down, left or right (no capturing of pieces or diagonal movements for simplicity).

// Cyclomatic complexity: 5; NPath complexity: 16
public void SetSquareForPiece(Piece piece, Board board, int x, int y) {
    int k = 0;
    if (board.GetSquareValue(x + 1, y) == "EMPTY") {         // 1
        piece.CanMoveRight = true;
    } else {                                                 // 2
        piece.CanMoveRight = false;
    }
    if (board.GetSquareValue(x - 1, y) == "EMPTY") {         // 3
        piece.CanMoveLeft = true;
        x = x + 1;
    } else {                                                 // 4
        piece.CanMoveLeft = false;
        k = k + 1;
    }
    if (board.GetSquareValue(x, y + 1) == "EMPTY") {         // 5
        piece.CanMoveUp = true;
        x = x + k;
    } else {                                                 // 6
        piece.CanMoveUp = false;
    }
    if (board.GetSquareValue(x - 1, y - 1) == "EMPTY") {     // 7   (bug)
        piece.CanMoveDown = true;
    } else {                                                 // 8
        piece.CanMoveDown = false;
    }
}

Assuming: T = if (true), F = if (false)

Branch Coverage (100% branches): TTFF, FFTT
Basis Path Coverage (cyclomatic complexity: 5) TTTT, FTTT, TFTT, TTFT, TTTF
NPath Coverage: 2^4 = 16 TTTT, TFTT, TTFT, TTTF, TFFT, TTFF, TFFF, TFTF, FTTT, FFFF, FFTT, etc…

In order to obtain full coverage for all possible path combinations, 16 unit tests are required for this function (NPath). However, we can get adequate coverage with Basis Path Testing with just 5 unit tests (CCN). But this is only a 31% path coverage (5/16). It’s important to note that Basis Path Coverage (5 tests) in this particular case does not find the defect. It’s because the defect in the last if-else block is masked by state changes in the other blocks.

What can we do to improve our chances of finding the defect? Let’s refactor the above function into four smaller functions (keeping the bug as is), a technique known as Extract Method:

public void SetPossibleSquare(Piece piece, Board board, int x, int y) {
    SetSquareRight(piece, board, x, y);
    SetSquareLeft(piece, board, x, y);
    SetSquareUp(piece, board, x, y);
    SetSquareDown(piece, board, x, y);
}
// CCN: 2; NPath: 2
public void SetSquareRight(Piece piece, Board board, int x, int y) {
    if (board.GetSquareValue(x + 1, y) == "EMPTY") {                      // 1
        piece.CanMoveRight = true;                                  
    } else {                                                              // 2
       piece.CanMoveRight = false;                                  
    }
}
// CCN: 2; NPath: 2
public void SetSquareLeft(Piece piece, Board board, int x, int y) {
    int k = 0;
    if (board.GetSquareValue(x - 1, y) == "EMPTY") {                      // 3
        piece.CanMoveLeft = true;
        x = x + 1;
    } else {                                                              // 4
        piece.CanMoveLeft = false;
        k = k + 1;                               
    }
}
// CCN: 2; NPath: 2
public void SetSquareUp(Piece piece, Board board, int x, int y) {
    int k = 0;
    if (board.GetSquareValue(x, y + 1) == "EMPTY") {                      // 5
        piece.CanMoveUp = true;
        x = x + k;
    } else {                                                              // 6
        piece.CanMoveUp = false;
    }
}
// CCN: 2; NPath: 2
public void SetSquareDown(Piece piece, Board board, int x, int y) {
    if (board.GetSquareValue(x - 1, y - 1) == "EMPTY") {                  // 7  (bug)
        piece.CanMoveDown = true;
    } else {                                                              // 8
        piece.CanMoveDown = false;
    }
}

Cyclomatic complexity of each method: 2 T and F test cases for each function
NPath complexity of each method: 2 T and F paths for each function

Notice that we increased the total cyclomatic complexity of the methods from 5 to 8 after the refactor, but reduced the total NPath complexity from 16 to 8 (2 * 4), which now gives us a test coverage of 100% (8/8).
More importantly, now our tests also detect the bug at line #7 (with test case F), because this logical block is now an independent function and is not corrupted by state changes in other blocks.

Martin Fowler’s book Refactoring and Steve McConnell’s book Code Complete are two of the best sources of information on how to reduce program complexity and increase test coverage. Fowler advocates aggressive use of the Extract Method technique to keep the complexity very low. McConnell provides a list of reasons in his book to keep functions small.

Summary

Basis Path Coverage based on Cyclomatic Complexity number provides better test coverage than either Line or Branch coverage.
If a function has a high NPath complexity (all path combinations), then even a low cyclomatic complexity number may miss some defects in the function. We want our Basis Set of test cases to be as close to NPath complexity as possible for each function. Extract Method refactoring allows us to achieve that.
Extract Method refactoring may increase the combined CCN for all extracted methods, but it also reduces the NPath complexity (full path complexity) of each method, and so the CC number becomes closer and closer to the NPath complexity of each method, and test coverage approaches 100% with smaller and smaller methods.
In our Chess board example, we needed 8 unit tests for full coverage after Method Extraction, rather than 16 unit tests for full coverage before Method Extraction.
Even though the recommendation is to keep the CC number below 10 for a method, I would go as far as keeping it below 5. As we saw in the last example, even with a CCN of 5 for a single function we couldn't detect the bug in our code.
Reducing the cyclomatic complexity of a function makes the code more readable, maintainable and less error prone to future refactoring. Small functions also adhere to the Single Responsibility Principle and are highly cohesive.

Essential complexity is the cyclomatic complexity of the reduced CFG (control flow graph) after iteratively replacing (reducing) all structured programming control structures in a function. All structured functions have an essential complexity of 1. An unstructured function, such as one that contains goto statements, is not always reducible to a cyclomatic complexity of 1. This makes it very hard to test and reason about those functions. That’s why goto statements are harmful. They lead to irreducible functions with essential complexity greater than 1, which makes them hard to reason about when used in code.

References

Structured Testing using Cyclomatic Complexity, McCabe
Basis Path Testing
Reasons to Create a Routine (Steve McConnell, Code Complete)
Refactoring: Improving the Design of Existing Code, Martin Fowler
Path Testing
McCabe Cyclomatic Complexity, expert opinions (7 min video)
Go To Statement Considered Harmful, Dijkstra, 1968

Adventures in Coding

Sunday, March 27, 2016

Cyclomatic Complexity

What is Cyclomatic Complexity?

How to determine a Basis Set of linearly independent paths

A More Elaborate Example

Summary

No comments:

Post a Comment