3 Test Design Principles for Continuous Integration
Article originally published at Tech Beacon
If your test case is causing more harm than good, is it truly useful? In the times of legacy software delivery, with long lead-times and difficulty changing the product once shipped, nearly all test cases (automated or not) were good test cases. In the times of continuous delivery, though, this calculus has shifted. It’s shockingly easy to end up with a software testing solution that inadvertently causes your software to be less stable – whether that’s by building false confidence in the code, by removing trust from the tests themselves, or by taking up so much time that the tests aren’t run frequently enough.
Whether automated or manual, it’s critical that any software checks you make to validate your assumptions about code follow three key principles, so that they are fully compatible with a continuous integration and delivery system.
Principle One: Reliability
Any test case that you’re going to be running with any frequency must be reliable; that is, the test case cannot be flaky. Consider an automated check: in a continuous integration environment, this test case could run dozens or hundreds of times a day for a single team. If a test is only 99% reliable (one false report in every hundred test executions), and you run it two hundred times a day, then your team will be investigating false failures about twice a day on average. Multiply that by a unit test suite that can have tens of thousands of test cases and the math becomes clear.
A test case that is not at least 99.9% reliable should be removed until it can be brought above the reliability threshold.
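The arithmetic above can be sketched in a few lines. The function name and the suite size of 10,000 tests are illustrative assumptions, and the sketch treats every execution as failing falsely with the same independent probability.

```python
def expected_false_failures(reliability: float, executions: int) -> float:
    """Average number of false reports across the given number of executions.

    Assumes each execution reports falsely with probability (1 - reliability),
    independently of all other executions.
    """
    return (1.0 - reliability) * executions


if __name__ == "__main__":
    # One test at 99% reliability, executed 200 times a day: about 2 per day.
    print(round(expected_false_failures(0.99, 200), 2))
    # A hypothetical suite of 10,000 such tests, executed once: about 100 per run.
    print(round(expected_false_failures(0.99, 10_000), 2))
```

Under these assumptions even a single run of a large suite produces far more false reports than a team can investigate, which is why the per-test threshold has to be so strict.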
But what does reliability look like? A test case must take every precaution to avoid a false negative or a false positive. It must be repeatable without outside human intervention, which means it must clean up after itself. In a fully automated system there generally isn’t time for a human to, for example, drop tables on the SQL database every few test runs. Even a manually run test case must clean up after itself, because it is an unmanageable mental burden on the test executor to have to keep track of a continuously shifting starting state.
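One way a test can clean up after itself is to create its own state and register the teardown at the moment the state is created. The sketch below uses Python's standard unittest, sqlite3, and tempfile modules; the table, test names, and the run_user_table_suite helper are hypothetical, and a throwaway SQLite file stands in for whatever database a real suite would use.

```python
import os
import sqlite3
import tempfile
import unittest


class UserTableTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own throwaway database file.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmpdir.cleanup)
        self.conn = sqlite3.connect(os.path.join(self.tmpdir.name, "test.db"))
        # Cleanups run last-in, first-out, so the connection is closed
        # before the directory is removed, even if the test body fails.
        self.addCleanup(self.conn.close)
        self.conn.execute("CREATE TABLE users (name TEXT)")

    def _row_count(self) -> int:
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_table_starts_empty(self):
        self.assertEqual(self._row_count(), 0)

    def test_insert_adds_exactly_one_row(self):
        self.conn.execute("INSERT INTO users VALUES ('alice')")
        self.assertEqual(self._row_count(), 1)


def run_user_table_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_user_table_suite()
```

Because no state survives a test, the suite can be run any number of times, in any order, with the same result.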
Why is reliability so important? When developers are routinely wasting time investigating false positives or negatives, they quickly lose faith in the automation solution and are likely to ignore real failures alongside the false ones.
Principle Two: Importance
In a continuous integration system, the most precious resource you have to spend is engineer time. Engineers have grown to expect results quickly, and they are unwilling to wait for something they perceive as wasting time, so ensure that you’re getting relevant results back as quickly as possible. For example, there’s no point attempting to run unit tests on code that doesn’t compile. And there’s no point in running an API-level integration test suite if the unit tests on an underlying package don’t pass. You’re assured that the code under test will have to change, so why waste time on a test run that is guaranteed to be thrown away?
Always run the most important test cases as quickly as possible, and always run your fastest tests first. Those are nearly always your unit tests – a typical unit test executes in microseconds and can generally be run in parallel. In my continuous integration systems, I can usually process through tens of thousands of unit tests in around ninety seconds.
An integration test is a test that crosses boundaries – usually including at least an HTTP or other machine-to-machine boundary. Because of that boundary, these test cases typically execute in milliseconds and are several orders of magnitude slower than unit tests. Finally, a specialty test is anything that’s significantly slower than an integration test (like an end-to-end automated UI test) or that requires human intervention or interpretation that slows down the overall reporting of results.
While tests slower than a unit test certainly have value, and absolutely have a place in a continuous integration system, that place is after the faster and more reliable tests have run.
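The ordering described in this section can be sketched as a list of stages, fastest first, where a failure stops everything that comes after it. The run_stages function and the stage names are hypothetical; in a real pipeline each check would invoke a compiler or a test runner rather than a lambda.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; once one fails, skip every later stage."""
    report = []
    blocked = False
    for name, check in stages:
        if blocked:
            # The code has to change anyway, so do not spend time here.
            report.append((name, "skipped"))
            continue
        passed = check()
        report.append((name, "passed" if passed else "failed"))
        blocked = not passed
    return report


if __name__ == "__main__":
    stages = [
        ("compile", lambda: True),
        ("unit tests", lambda: False),        # simulated failure
        ("integration tests", lambda: True),
        ("specialty tests", lambda: True),
    ]
    for name, status in run_stages(stages):
        print(f"{name}: {status}")
    # compile: passed
    # unit tests: failed
    # integration tests: skipped
    # specialty tests: skipped
```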
Principle Three: Specificity
Every test case should be a clear answer to a clear question, and together those answers should add up to a test suite that gives a clear answer about the full set of functionality under test. Clearly named, atomic test cases make it easy to locate the potential cause of a test failure, and easy to see at a glance what has and has not been tested. When test cases are clear and atomic, it also becomes easy to find coverage overlaps, and thus candidates for deletion.
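As a small illustration, each test below asks one question and its name states the expected answer, so a failure report reads like a diagnosis. The normalize_username function and the run_normalize_username_suite helper are hypothetical and exist only to give the tests something to check.

```python
import unittest


def normalize_username(raw: str) -> str:
    """Hypothetical function under test: trims whitespace and lowercases."""
    return raw.strip().lower()


class NormalizeUsernameTest(unittest.TestCase):
    # One behavior per test; the name says which behavior broke.
    def test_strips_surrounding_whitespace(self):
        self.assertEqual(normalize_username("  alice "), "alice")

    def test_lowercases_mixed_case_input(self):
        self.assertEqual(normalize_username("Alice"), "alice")

    def test_leaves_normalized_input_unchanged(self):
        self.assertEqual(normalize_username("alice"), "alice")


def run_normalize_username_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeUsernameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_normalize_username_suite()
```

A single test that asserted all three behaviors at once would stop at the first failure and hide whether the other two still hold.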
Putting the Principles Into Practice
Test cases that have many complicated steps and validations are prone to failure (violating Principle One) and to long runtimes (violating Principle Two). Consider the following test case:
Ensure Safe Search=Strict Works
Create a new user
Log that user into the UI
Navigate to the User Setting page
Change Safe Search setting to Strict
Navigate to the search page
Search for adult content
See that no results are returned
By running through everything end-to-end, you inadvertently test a lot of functionality that has nothing to do with the question the test case is actually posing. Rather than running through the entire experience exactly as an end-user would, this test case would be better served if you decompose it into several different cases – and several different suites:
A - User Creation Suite
B - User Setting Suite
C - Search Suite <-- Our test case goes here
Most likely all of A, B, and C will have a combination of hundreds of unit tests and tens of integration tests, depending on where the system’s architectural boundaries fall. While C may require a logged-in user, the purpose of the suite is not to ensure that you can create a user or update the user’s settings. Creation and setting changes are incidental functionality, likely called during test suite setup; if they fail, the suite should not attempt to test anything further. Given this known order of precedence, you also want to run the test suites in A, B, C order: C depends on functionality in A and B, so it is useless to execute C if either of those is known not to work.
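A minimal sketch of what the Search Suite (C) might look like follows. User creation and the Safe Search setting happen once in suite setup, without going through the UI, and the suite is skipped rather than failed if that setup breaks. FakeBackend, its methods, and the sample documents are hypothetical stand-ins for the real system; only the shape of the suite is the point.

```python
import unittest


class FakeBackend:
    """Hypothetical in-memory stand-in for the system under test."""

    def __init__(self):
        self.users = {}
        self.documents = [
            {"title": "Cute kittens", "adult": False},
            {"title": "Explicit material", "adult": True},
        ]

    def create_user(self, name):
        self.users[name] = {"safe_search": "off"}

    def set_safe_search(self, name, level):
        self.users[name]["safe_search"] = level

    def search(self, name, query):
        strict = self.users[name]["safe_search"] == "strict"
        return [
            doc for doc in self.documents
            if query.lower() in doc["title"].lower()
            and not (strict and doc["adult"])
        ]


class SearchSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Incidental functionality: covered by suites A and B, only used here.
        cls.backend = FakeBackend()
        try:
            cls.backend.create_user("tester")
            cls.backend.set_safe_search("tester", "strict")
        except Exception as exc:
            # Do not attempt to test search if its prerequisites are broken.
            raise unittest.SkipTest(f"setup failed, search not tested: {exc}")

    def test_strict_safe_search_hides_adult_results(self):
        self.assertEqual(self.backend.search("tester", "explicit"), [])

    def test_strict_safe_search_keeps_other_results(self):
        titles = [d["title"] for d in self.backend.search("tester", "kittens")]
        self.assertEqual(titles, ["Cute kittens"])


def run_search_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SearchSuite)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_search_suite()
```

Each test now answers only the Safe Search question, and a broken user-creation path shows up as a failure in suite A instead of as a confusing failure here.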