Testing with F# (2015)
Chapter 10. The Ten Commandments of Test Automation
In this last chapter, we're going to look at some of the lessons we have learned along the way, condensed into the format of Ten Commandments. Just as it is easier to learn high-quality coding by looking at anti-patterns that show what you should not do, it is easier to start writing good tests by being told what you shouldn't do. By adding restrictions on testing, you'll find your tests becoming purer, and you will start writing test suites that are easier to maintain and provide more value.
Testing behavior, not implementation
// don't
[<Test>]
let ``should hash user password with SHA1`` () =
    () // test body

// do
[<Test>]
let ``should hash user password to make it unreadable`` () =
    () // test body
Unless the hash algorithm is an explicit requirement, it should be considered an implementation detail.
You should never go down to a level of abstraction where your test expresses how the system does its work. Instead, you should always test what the feature expects of the system. The difference is that a test focusing on implementation will break on refactoring, whereas a test that focuses on behavior will not. Safe refactoring is one major reason why we're writing tests in the first place, so we should really try to support it as much as we can.
This means that we don't explicitly test private methods, constructors, initializers, or patterns, unless they are backed up by a user story.
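As a minimal sketch of the difference, assume a hypothetical hashPassword function (not from this book's code base) that happens to use SHA-256 today. The test below states only the behavior, so it keeps passing if the algorithm is swapped. The check helper is also an assumption, added so that the sketch runs as a plain F# script; in an NUnit project the test function would carry the [<Test>] attribute.

```fsharp
open System
open System.Security.Cryptography
open System.Text

// Hypothetical SUT: the algorithm (SHA-256 here) is an implementation detail
// chosen only for this sketch.
let hashPassword (password: string) : string =
    use sha = SHA256.Create()
    let bytes = Encoding.UTF8.GetBytes(password)
    let hash = sha.ComputeHash(bytes)
    Convert.ToBase64String(hash)

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

// The test names the behavior, not the algorithm.
let ``should hash user password to make it unreadable`` () =
    // arrange
    let password = "secret"
    // act
    let hashed = hashPassword password
    // assert
    check (hashed <> password) "the stored value must differ from the password"
```

Nothing in the test mentions SHA-256, so replacing the body of hashPassword is a safe refactoring as far as this test is concerned.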
Using ubiquitous language in your test name
Let us take a look at the following code:
// don't
[<Test>]
let ``store transaction to database`` () =
    () // test body

// do
[<Test>]
let ``when the customer checks out the cart, the order is persisted`` () =
    () // test body
Use words and expressions from the domain when expressing test names.
Getting the naming right is one of the hardest parts for the new tester. Before you know how to name your test, you need to know what to test, and before you know what to test you need to know what the feature is all about.
A good test name should state something about the feature, so obvious that a business analyst would agree with it. The test name should also reflect on what we're asserting.
In order to achieve this, I usually start my test names with the should word, and let the fixture hold the name of the feature as follows:
· when submitting the form.should warn about invalid e-mail address
· when submitting the form.should store the user profile to database
The first part is the name of the feature we're testing and the last part is the name of the test. When it comes to testing for exceptions I usually exchange the should word for cannot but keep the format.
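One way to get this fixture-and-test naming in F# is to let a module play the role of the fixture. The following is only a sketch: the Form type, the submit function, and the check helper are hypothetical names invented for the illustration, and in an NUnit project the test function would carry the [<Test>] attribute.

```fsharp
// Hypothetical domain types and SUT, invented for this sketch.
type FormError = InvalidEmail

type Form = { Email: string }

let submit (form: Form) : FormError list =
    if form.Email.Contains("@") then [] else [ InvalidEmail ]

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

// The module is the fixture and carries the name of the feature.
module ``when submitting the form`` =

    // The function is the test and starts with the word should.
    let ``should warn about invalid e-mail address`` () =
        // arrange
        let form = { Email = "not-an-address" }
        // act
        let errors = submit form
        // assert
        check (List.contains InvalidEmail errors) "expected an InvalidEmail warning"
```

Read together, the module name and the function name form one sentence that a business analyst can agree or disagree with.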
Asserting only one thing in your test
Let us take a look at the following code:
// don't
[<Test>]
let ``username and password cannot be empty string`` () =
    // arrange
    let credentials = Credentials(System.String.Empty, System.String.Empty)
    // act
    let result = validate(credentials)
    // assert
    result |> should contain UserNameEmpty
    result |> should contain PasswordEmpty

// do
[<Test>]
let ``username cannot be empty string`` () =
    // arrange
    let credentials = Credentials(System.String.Empty, "secret")
    // act
    let result = validate(credentials)
    // assert
    result |> should contain UserNameEmpty

// do
[<Test>]
let ``password cannot be empty string`` () =
    // arrange
    let credentials = Credentials("user", System.String.Empty)
    // act
    let result = validate(credentials)
    // assert
    result |> should contain PasswordEmpty
Cohesion in your code means that a function does one and only one thing. This is easier to achieve in functional programming languages, where the pure functional style encourages functions that always yield the same result for the same input.
Cohesion also applies to tests. One test should only test one thing. The rule is most often broken by having several asserts at the end of the test. Asserting more than one thing is often a sign of bad cohesion.
It is very important to have high cohesion in your tests so that you know why a test failed. When you have several asserts in the same test and the test fails, you can't tell at a glance which assert made the test break. With a single assert, the name of the test is enough to determine what functionality broke.
This comes down to good naming and good scoping of your tests. A generic test name invites several asserts to verify that the test passes. A specific test name requires only one assert to verify the outcome. We should therefore aim for narrow test targets with specific names, which leads to a single assert at the end of each test.
At times, there are situations where you need several asserts in the same test, even though you've scoped the test well and named it from an explicit requirement. This is when you're breaking the rule, and you know that you're breaking the rule.
Don't mock the Mockingbird
Let us take a look at the following code:
// don't
[<Test>]
let ``should first get customers from customer service and then store them to hard drive`` () =
    // arrange
    let customerService = MockRepository.GenerateMock<ICustomerService>()
    let fileSystem = MockRepository.GenerateMock<IFileSystem>()
    let cacheJob = CacheJob(customerService, fileSystem)
    // setup mocks
    customerService.Expect(fun service -> service.GetCustomers()).Return([(1, "Mikael Lundin")]) |> ignore
    fileSystem.Expect(fun fs -> fs.AppendLineToFile("customer.txt", "1,Mikael Lundin")) |> ignore
    // act
    cacheJob.Execute() |> ignore
    // assert
    customerService.VerifyAllExpectations()
    fileSystem.VerifyAllExpectations()

// do
// simplify the SUT or implement a vertical slice
What if we change the storage from filesystem to database? Should the test fail when implementation changes?
Mocks aren't evil, but they are very often the root cause of brittle tests. Mocking, in the sense of recording what will happen on the unit's dependencies and returning fake results, means that your test knows which dependencies the unit has and how it interacts with them. The test knows too much about the implementation of the system.
Fake dependencies are okay, in the sense that we can send a fake object into the unit that we're testing in order to fake a state of the program and test the expected results. A stub is a fake object that we fill with data and let the unit under test operate on. The problem with mocks is that we put expectations on the interactions between parts of the system, and these expectations break when we refactor the code.
In the end, this is as bad as testing private functions, as they are both internal workings of the system's implementation.
The exception to this rule is when you want to test the interaction with an external system without making it an integration test. If you're implementing an Object Relational Mapper (ORM) and want to test what kind of SQL is generated for the database, a mock could be appropriate, unless you can get hold of that SQL in some other way.
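To make the distinction between stubs, fakes, and mocks concrete, here is a sketch of the cache job tested through its outcome rather than its interactions. Everything in it is an assumption made for the illustration: the ICustomerCache abstraction replaces the filesystem of the example above, the CacheJob shown here is a simplified stand-in, and check is a small helper so that the code runs as a plain F# script.

```fsharp
// Hypothetical abstractions, loosely modeled on the names in the mock example.
type ICustomerService =
    abstract GetCustomers : unit -> (int * string) list

type ICustomerCache =
    abstract Store : (int * string) list -> unit
    abstract Load : unit -> (int * string) list

// Hypothetical SUT: copies customers from the service into the cache.
type CacheJob(service: ICustomerService, cache: ICustomerCache) =
    member this.Execute() = cache.Store(service.GetCustomers())

// Stub: returns canned data and records nothing about how it is called.
let stubService (customers: (int * string) list) =
    { new ICustomerService with
        member this.GetCustomers() = customers }

// Fake: a working in-memory replacement for the real storage.
let inMemoryCache () =
    let stored = ref []
    { new ICustomerCache with
        member this.Store(customers) = stored.Value <- customers
        member this.Load() = stored.Value }

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

let ``cached customers can be read back after the job has run`` () =
    // arrange
    let service = stubService [ (1, "Mikael Lundin") ]
    let cache = inMemoryCache ()
    // act
    CacheJob(service, cache).Execute()
    // assert
    check (cache.Load() = [ (1, "Mikael Lundin") ]) "customers should be cached"
```

The test never states which calls were made or in what order, so moving the real storage from the filesystem to a database would not by itself make it fail.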
Always refactor your SUT
In the testing pattern of red, green, refactor, the last part is often forgotten. Once the test goes green, it is easy for the developer to forget about the refactoring part, or just move on because of a pressing deadline.
The refactoring part is the most important part of testing as this is what makes our code high-quality. Some of our system code is very hard to test and will not be enabled for test automation until we have refactored it. It is therefore crucial to refactor after each test so that we don't build technical debt.
One of the greater points of unit testing is to enable refactoring. By having coverage over your features, you will ensure that nothing gets broken after the refactoring is complete. Refactoring is the art of changing the dinner table cloth with all the plates still standing on the table, and to enable that you need to have a test suite to keep you covered.
Writing tests is a way of designing your code, and this is why testing and refactoring go hand in hand. You should let your tests put requirements on the design of your code, and drive this design. It might be possible to retrofit tests on an existing design without changing anything, but then you're missing out on one of the true benefits of test-driven development.
Your test should never be more than 10 lines of code
Here is one controversial commandment that always takes my students aback when I'm teaching them test-driven development. In F#, this is not a hard requirement to meet, and in C# it only brings a healthy restriction on the length of a test.
The reason is that the readability of a test is inversely related to its length. A longer test is less readable, and the longer it is, the harder it will be to maintain. Instead, I propose that we keep our tests as short as possible. The optimal length of a test would be three lines of code, following the triple A pattern:
· Arrange
· Act
· Assert
Sometimes, you need a few more lines of arrange, in order to set up the prerequisites for the test to run. Assert might need an extra line to help extract the result we got from running the test.
The act section of the test should always be only one line of code. This is important to keep the test cohesive. If the act only consists of one line of code, then we know that we're only testing one thing.
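A test at the optimal length could look like the following sketch, with one line for each of arrange, act, and assert. The applyDiscount function and the check helper are hypothetical, invented only to have something small to test; in an NUnit project the test function would carry the [<Test>] attribute.

```fsharp
// Hypothetical SUT invented for this sketch.
let applyDiscount (percent: decimal) (price: decimal) : decimal =
    price - price * percent / 100m

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

let ``should reduce the price by the discount percentage`` () =
    // arrange
    let price = 200m
    // act
    let discounted = applyDiscount 10m price
    // assert
    check (discounted = 180m) "expected 10% off 200 to be 180"
```

Because the act section is a single call, there is no doubt about which unit the test is exercising.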
If the test as a whole is more than 10 lines of code, then our SUT is too complex and we should refactor it in order to bring complexity down. Long tests are a great test smell indicating that there is an underlying problem.
Long tests are not only hard to maintain; they also give the developer an urge to DRY them up by extracting the arrange part into a setup method that all the tests call. This is a really bad idea, as it breaks the test apart and makes it even harder to maintain. This refactoring of the test may, however, be closer at hand than doing a major refactoring of the SUT.
Always test in isolation
A good test is one that is completely confined within the function where it is defined. The state of the test suite and the state of the system are the same after the test has run as they were before.
When tests are not isolated, you start to get maintenance and reliability problems. The most common refactoring that developers do on their test suite is to share setup between tests. It is common for tests to have much of their setup code in common, but that setup should be considered waste. Instead of sharing it, we need to investigate how we can refactor the SUT in order to avoid excessive and repetitive test setup. Once we've done this, we will end up with better tests and a better-designed SUT.
Tests need to be isolated from each other not only by code, but also by state. One of the largest problems in test suites is that one test sets a particular state in the target system and this affects the results of subsequent tests. These bugs are hard to track down because the error is not in the test that is failing. Even worse, these bugs are seldom due to a fault in the actual system but only in the test suite, which makes every minute spent chasing them a waste, unless you end up reducing the need for state in the SUT.
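Isolation by state can be sketched as follows. The Cart class is a hypothetical SUT with internal mutable state, and check is a small helper so that the code runs as a plain F# script. Each test builds its own cart, so neither test can observe state left behind by the other.

```fsharp
// Hypothetical SUT with internal mutable state, invented for this sketch.
type Cart() =
    let items = System.Collections.Generic.List<string>()
    member this.Add(item: string) = items.Add(item)
    member this.Count = items.Count

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

let ``should be empty when nothing has been added`` () =
    // arrange and act: a cart that belongs to this test only
    let cart = Cart()
    // assert
    check (cart.Count = 0) "a new cart should be empty"

let ``should hold one item after adding one`` () =
    // arrange
    let cart = Cart()
    // act
    cart.Add("book")
    // assert
    check (cart.Count = 1) "the cart should hold the added item"
```

Had both tests used one shared cart, the first test would pass or fail depending on whether it ran before or after the second.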
It is quite common to set up a common state for several tests, and then tear it down when the test fixture is done. This is often done because the setup is a slow process that requires a lot of time. If it is done only once instead of 50 times, you can speed up the test suite substantially. The problem is that these tests will operate on the common state and may give the appearance of a bug, when the failure is really caused by tests putting the system under test into a state that is not even possible in normal use.
In every situation, we should strive to isolate each test so that it doesn't touch other tests or their test runs. This is a Utopian idea and not always appropriate. We need to weigh our options: a potentially brittle test suite that breaks when you remove a test, because the subsequent tests were depending on its state, or a test suite that takes hours to execute, because each test needs to reset the whole application domain before performing one assertion.
We need to be pragmatic, and we need to know what is wrong and why we choose to do it anyway.
Controlling your dependencies
Control your dependencies before they take control of you. If you have a unit test where you need to stub out three dependencies, then you're spending more time dealing with the coupling of the SUT than actually testing that it's doing the right thing. Dependencies are a huge problem in programming and something that needs to be dealt with very carefully.
When you're using a dependency injection framework, you're making life easy for yourself by letting the framework create your object and all its dependencies. This means that there is no problem for you to add more and more dependencies to the class under test, because there is no punishment for doing so. Not until you start writing tests.
When you're writing tests, you can tolerate one dependency of your unit, but not more than this. This is why your SUT needs to be abstracted so that one unit never has more than one dependency. This will make your code easier to read and follow. The downside is that the dependency graph will become deeper and it might get harder to understand the big picture of your system if you're not managing this dependency graph appropriately.
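In F#, a unit with exactly one dependency can take that dependency as a function argument, which makes it trivial to stub in a test. The following is a sketch only: greet, findName, and the check helper are hypothetical names invented for the illustration.

```fsharp
// Hypothetical SUT: its single dependency is passed in as a function.
let greet (findName: int -> string option) (customerId: int) : string =
    match findName customerId with
    | Some name -> sprintf "Hello, %s!" name
    | None -> "Hello, guest!"

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

let ``should greet a known customer by name`` () =
    // arrange: stub the one dependency with a lambda
    let findName = fun (_: int) -> Some "Mikael"
    // act
    let greeting = greet findName 1
    // assert
    check (greeting = "Hello, Mikael!") "expected a greeting by name"
```

In production code, the real lookup function would be supplied once by partial application, and the rest of the system would call the resulting function without knowing about the dependency.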
One way of dealing with the dependency graph is vertical slice testing, where you exchange only the outermost layers of your application with fakes. The filesystem is a fake filesystem, the web service is a fake web service, and the database is a fake database. This way, there will be no need to deal with dependencies in your tests, as all functionality will be called in memory anyway.
Your test is a ninja
The only reason for your test to fail should be that the system doesn't fulfill the promise made by the name of the test. There should be no other reason for the test to fail, and it should be given no chance of failing because of anything else.
In many cases, we need to run a lot of code in order to get to the state that we want to test. This is a potential problem, because the code that we're passing through might fail and give us a failing test, but for a completely different reason than the one the test was written for. This is bad, because it makes the test suite hard to understand and hard to debug.
The only way to get around this problem is to limit the amount of code needed in order to run our tests, and we can only do that by refactoring the SUT. One way to shorten the path to our unit is to bring in a state record that can be faked and sent into the routine that we want to test. Another way is to reduce dependencies and reduce the size of the function. Often, we have to deal with this issue because the SUT has low cohesion, meaning that the function that we want to test provides more than one service.
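The state record idea can be sketched like this. The Order record, the shippingCost function, and the check helper are hypothetical and exist only for the illustration. Because the routine takes the state as a plain record, the test can construct exactly the state it needs instead of running a whole checkout flow to reach it.

```fsharp
// Hypothetical state record and SUT, invented for this sketch.
type OrderLine = { Price: decimal; Quantity: int }

type Order = { Lines: OrderLine list; FreeShipping: bool }

let shippingCost (order: Order) : decimal =
    if order.FreeShipping then 0m else 5m

// Minimal assertion helper so the sketch runs without a test framework.
let check (condition: bool) (message: string) =
    if not condition then failwith message

let ``should not charge shipping when the order has free shipping`` () =
    // arrange: build the state directly, no other code has to run first
    let order = { Lines = [ { Price = 10m; Quantity = 1 } ]; FreeShipping = true }
    // act
    let cost = shippingCost order
    // assert
    check (cost = 0m) "free shipping orders should cost nothing to ship"
```

The only code that runs between arrange and assert is the function named by the test, so nothing else can make it fail.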
A good rule when writing unit tests is to be in and out as quickly as possible, leaving as small an imprint as possible. In order to enable this, we will often have to refactor the SUT.
The test is not complex
You should avoid complexity in your test at all costs. A few lines of straightforward workflow should be enough for arrange, act, and assert.
This means that we disallow the following in our tests:
· Conditional statements such as if, match, and function
· Looping constructs such as while and for
· Additional types, interfaces, discriminated unions, or classes apart from the SUT
· Threads
· Exception handling, except asserting for thrown exceptions
· Manipulation of lists, sequences, maps, or tuples
In short, your test should be as simple as the following steps:
1. Set up the prerequisites to run the test.
2. Run the test.
3. Assert the result.
If anything else is needed, you probably have a system that is too complex and needs refactoring in order to bring down the complexity of your tests. Complexity in your tests will hurt you in several different ways. The most common is that your test will fail even before touching the SUT, because the initialization logic assumes conditions of the SUT that might change during refactoring.
Fixing a complex test takes time, because it takes time to understand the test. Most often, it is better to delete a failing complex test than to try to fix it. Even if it is fixed, there is a high probability that it will fail again soon.
The result is that a complex test costs more than it gives back in value. This means that we shouldn't write tests that are complex. Instead, we should stop in our tracks and ask ourselves how we can refactor the SUT in order to avoid the complexity.
Summary
In this chapter, we have been looking at the Ten Commandments for writing good test suites. Following these Ten Commandments is hard. It is a struggle to work with the limitations in order to produce even better test suites that provide more value than they cost. But if you follow these commandments, you will be falling into the pit of success, making it a struggle to fail.
This concludes the book. We have been looking into how you can benefit from F# when writing high-quality code that is covered by tests. I have shown how functional programming plays into unit testing and how we can benefit from the paradigm in order to make our test suites stronger and more robust.
I hope that you have enjoyed reading this book as much as I have enjoyed writing it, and I hope that you found it valuable in your path to great test suites and bug-free applications.