
Pragmatic Unit Testing in Java 8 with JUnit (2015)

Part 2. Mastering Manic Mnemonics!

Chapter 6. What to Test: The Right-BICEP

It can be hard to look at a method or a class and anticipate all the bugs that might be lurking in there. With experience, you develop a feel for what’s likely to break and learn to concentrate on testing those areas first. Until then, uncovering possible failure modes can be frustrating. End users are quite adept at finding our bugs, but that’s embarrassing and damaging to our careers! What we need are guidelines to help us understand what’s important to test.

Your Right-BICEP provides you with the strength needed to ask the right questions about what to test:

· Right—Are the results right?

· B—Are all the boundary conditions correct?

· I—Can you check inverse relationships?

· C—Can you cross-check results using other means?

· E—Can you force error conditions to happen?

· P—Are performance characteristics within bounds?

[Right]-BICEP: Are the Results Right?

Your tests should first and foremost validate that the code produces expected results. The arithmetic-mean test in Chapter 1, Building Your First JUnit Test demonstrates that the ScoreCollection class produces the correct mean of 6 given the numbers 5 and 7. We show it again here.

iloveyouboss/13/test/iloveyouboss/ScoreCollectionTest.java

@Test
public void answersArithmeticMeanOfTwoNumbers() {
   ScoreCollection collection = new ScoreCollection();
   collection.add(() -> 5);
   collection.add(() -> 7);

   int actualResult = collection.arithmeticMean();

   assertThat(actualResult, equalTo(6));
}

You might bolster such a test by adding more numbers to ScoreCollection or by trying larger numeric values. But such tests remain in the realm of happy-path tests—positive cases that reflect a portion of an end-user goal for the software (it could be a tiny portion!). If your code provides the right answer for these cases, the end user will be happy.
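For instance, a second happy-path test might feed a few more scores through the same interface. Here's a minimal sketch in the style of the test above (the test name and values are ours, not from the book's source tree):

@Test
public void answersArithmeticMeanOfSeveralNumbers() {
   ScoreCollection collection = new ScoreCollection();
   collection.add(() -> 4);
   collection.add(() -> 8);
   collection.add(() -> 9);

   // 4 + 8 + 9 = 21, and 21 / 3 = 7 under integer division
   assertThat(collection.arithmeticMean(), equalTo(7));
}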

A happy-path test represents one answer to the important question:

If the code ran correctly, how would I know?

Put another way: if you don’t know how to write a test around the happy path for a small bit of code, you probably don’t fully understand what it is you’re trying to build—and you probably should hold off until you can come up with an answer to the question.

In fact, some unit testers explicitly ask themselves that question with every unit test they write. They don’t write the code until they’ve first written a test that demonstrates what answer the code should return for a given scenario. Read more about this more disciplined form of unit testing in the chapter on TDD (see Chapter 12, Test-Driven Development).

“Wait,” sez you, “Insisting that I know all the requirements might not be realistic. What if they’re vague or incomplete? Does that mean I can’t write code until all the requirements are firm?”

Nothing stops you from proceeding without answers to every last question. Use your best judgment to make a choice about how to code things, and later refine the code when answers do come. Most of the time, things change anyway: the customer has a change of mind, or someone learns something that demands a different answer.

The unit tests you write document your choices. When change comes, you at least know how the current code behaves.

Right-[B]ICEP: Boundary Conditions

An obvious happy path through the code might not hit any boundary conditions in the code—scenarios that involve the edges of the input domain. Many of the defects you’ll code in your career will involve these corner cases, so you’ll want to cover them with tests.

Boundary conditions you might want to think about include:

· Bogus or inconsistent input values, such as a filename of "!*W:X\&Gi/w>g/h#WQ@".

· Badly formatted data, such as an email address missing a top-level domain (fred@foobar.).

· Computations that can result in numeric overflow.

· Empty or missing values, such as 0, 0.0, "", or null.

· Values far in excess of reasonable expectations, such as a person’s age of 150 years.

· Duplicates in lists that shouldn’t have duplicates, such as a roster of students in a classroom.

· Ordered lists that aren’t, and vice versa. Try handing a presorted list to a sort algorithm, for instance—or even a reverse-sorted list (see the sketch after this list).

· Things that happen out of expected chronological order, such as an HTTP server that returns an OPTIONS response after a POST instead of before.
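To make one of those items concrete, here's a hedged sketch for the ordered-lists case, run against the JDK's own sort (the test name is ours; it assumes java.util plus the usual JUnit and Hamcrest static imports used throughout this chapter):

@Test
public void sortsReverseOrderedInput() {
   List<Integer> numbers = new ArrayList<>(Arrays.asList(4, 3, 2, 1));

   Collections.sort(numbers);

   // a presorted list is worth the same treatment in a second test
   assertThat(numbers, equalTo(Arrays.asList(1, 2, 3, 4)));
}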

The ScoreCollection code from Chapter 1, Building Your First JUnit Test seems innocuous enough:

iloveyouboss/13/src/iloveyouboss/ScoreCollection.java

package iloveyouboss;

import java.util.*;

public class ScoreCollection {
   private List<Scoreable> scores = new ArrayList<>();

   public void add(Scoreable scoreable) {
      scores.add(scoreable);
   }

   public int arithmeticMean() {
      int total = scores.stream().mapToInt(Scoreable::getScore).sum();
      return total / scores.size();
   }
}

Let’s probe some boundary conditions. Maybe pass a null Scoreable instance:

iloveyouboss/14/test/iloveyouboss/ScoreCollectionTest.java

@Test(expected=IllegalArgumentException.class)
public void throwsExceptionWhenAddingNull() {
   collection.add(null);
}

The code generates a NullPointerException in the arithmeticMean() method, a bit too late for our tastes. We’d rather let the clients know as soon as they attempt to add an invalid value. A guard clause in add() clarifies the input range:

iloveyouboss/14/src/iloveyouboss/ScoreCollection.java

public void add(Scoreable scoreable) {
   if (scoreable == null) throw new IllegalArgumentException();
   scores.add(scoreable);
}

It’s possible that no Scoreable instances exist in the ScoreCollection:

iloveyouboss/14/test/iloveyouboss/ScoreCollectionTest.java

@Test
public void answersZeroWhenNoElementsAdded() {
   assertThat(collection.arithmeticMean(), equalTo(0));
}

The code generates a divide-by-zero ArithmeticException. A guard clause in arithmeticMean() answers the desired value of 0 when the collection is empty:

iloveyouboss/14/src/iloveyouboss/ScoreCollection.java

public int arithmeticMean() {
   if (scores.size() == 0) return 0;
   // ...
}

If we’re dealing with large integer inputs, the sum of the numbers could exceed Integer.MAX_VALUE. Perhaps we’d like to allow that:

iloveyouboss/14/test/iloveyouboss/ScoreCollectionTest.java

@Test
public void dealsWithIntegerOverflow() {
   collection.add(() -> Integer.MAX_VALUE);
   collection.add(() -> 1);

   assertThat(collection.arithmeticMean(), equalTo(1073741824));
}

Here’s one possible solution:

iloveyouboss/14/src/iloveyouboss/ScoreCollection.java

long total = scores.stream().mapToLong(Scoreable::getScore).sum();
return (int)(total / scores.size());

The narrowing cast from long down to int gives us pause. Should we probe again with another unit test? No. The add() method constrains each input to an int value, and the mean of a set of ints can never exceed the largest of them, so the result always fits back into an int.

When you design a class, it’s entirely up to you whether or not things like potential integer overflow need be a concern in the code. If your class represents an external-facing API, and you can’t fully trust your clients, you want to guard against bad data.

However, if the clients are coded by members of your own team (who are also writing unit tests), then you might choose to eliminate the guard clauses and let your clients beware. This is a perfectly legitimate choice and can help minimize the clutter of redundant overchecking of arguments in your code.

If you remove guards, you could warn client programmers with code comments. Better, add a test that documents the limitations of the code:

iloveyouboss/15/test/iloveyouboss/ScoreCollectionTest.java

@Test
public void doesNotProperlyHandleIntegerOverflow() {
   collection.add(() -> Integer.MAX_VALUE);
   collection.add(() -> 1);

   assertTrue(collection.arithmeticMean() < 0);
}

(You probably don’t want to allow unchecked overflow in most systems, however. Better to trap and throw an exception.)
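One hedged sketch of trapping it (our own variation on ScoreCollection, not the book's next step): sum with Java 8's Math.addExact(), which throws ArithmeticException when an int addition overflows, instead of letting the value silently wrap. A matching test then pins down that behavior:

public int arithmeticMean() {
   if (scores.size() == 0) return 0;

   // Math.addExact throws ArithmeticException on int overflow
   int total = scores.stream()
                     .mapToInt(Scoreable::getScore)
                     .reduce(0, Math::addExact);
   return total / scores.size();
}

@Test(expected=ArithmeticException.class)
public void throwsOnIntegerOverflow() {
   collection.add(() -> Integer.MAX_VALUE);
   collection.add(() -> 1);

   collection.arithmeticMean();
}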

Remembering Boundary Conditions with CORRECT

The CORRECT acronym gives you a way to remember potential boundary conditions. For each of these items, consider whether or not similar conditions can exist in the method that you want to test, and what might happen if these conditions are violated:

· Conformance—Does the value conform to an expected format?

· Ordering—Is the set of values ordered or unordered as appropriate?

· Range—Is the value within reasonable minimum and maximum values?

· Reference—Does the code reference anything external that isn’t under direct control of the code itself?

· Existence—Does the value exist (is it non-null, nonzero, present in a set, and so on)?

· Cardinality—Are there exactly enough values?

· Time (absolute and relative)—Is everything happening in order? At the right time? In time?

We’ll examine all of these boundary conditions in the next chapter.

Right-B[I]CEP: Checking Inverse Relationships

Sometimes you’ll be able to check behavior by applying its logical inverse. For mathematical computations, this is often the case: you can verify division with multiplication, addition with subtraction, and so on.

We decided to implement our own square-root function using Newton’s algorithm (a silly idea, given that Math.sqrt() is a trustworthy native implementation; apparently, we suffer from not-invented-here syndrome). We recall that if we derive the square root of a number and square that result (that is, multiply it by itself), we should get the same number we started with:

iloveyouboss/15/test/scratch/NewtonTest.java

import org.junit.*;
import static org.junit.Assert.*;
import static org.hamcrest.number.IsCloseTo.*;
import static java.lang.Math.abs;

public class NewtonTest {
   static class Newton {
      private static final double TOLERANCE = 1E-16;

      public static double squareRoot(double n) {
         double approx = n;
         while (abs(approx - n / approx) > TOLERANCE * approx)
            approx = (n / approx + approx) / 2.0;
         return approx;
      }
   }

   @Test
   public void squareRoot() {
      double result = Newton.squareRoot(250.0);
      assertThat(result * result, closeTo(250.0, Newton.TOLERANCE));
   }
}

In the test, we derive result by calling Newton.squareRoot() with the argument 250. Our assertion expects that result (whatever it is—we don’t have to know) multiplied by itself will be very close to the original value of 250.

Be careful! If the production code and the inverse check share common code, they could also share a common defect. Seek an independent means of verification. Here, multiplication works as an independent inversion of the square-root logic. Another example: for code that inserts into a database, write a direct JDBC query in your test.
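A hedged sketch of that database cross-check; the accounts table, its columns, and the accountDao and dataSource fixtures are all invented for illustration (assumes java.sql imports):

@Test
public void persistsAccountViaDao() throws Exception {
   accountDao.create(new Account("smith", 100));

   // query the database directly rather than reading back through the DAO
   try (Connection connection = dataSource.getConnection();
        PreparedStatement statement = connection.prepareStatement(
           "select balance from accounts where owner = ?")) {
      statement.setString(1, "smith");
      try (ResultSet results = statement.executeQuery()) {
         assertTrue(results.next());
         assertThat(results.getInt("balance"), equalTo(100));
      }
   }
}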

Another nonmathematical example: in the iloveyouboss application, the Profile class supports adding Answer objects. We want a flexible interface on Profile that supports finding Answers given a Predicate:

iloveyouboss/15/test/iloveyouboss/ProfileTest.java

int[] ids(Collection<Answer> answers) {
   return answers.stream()
      .mapToInt(a -> a.getQuestion().getId()).toArray();
}

@Test
public void findsAnswersBasedOnPredicate() {
   profile.add(new Answer(new BooleanQuestion(1, "1"), Bool.FALSE));
   profile.add(new Answer(new PercentileQuestion(2, "2", new String[] {}), 0));
   profile.add(new Answer(new PercentileQuestion(3, "3", new String[] {}), 0));

   List<Answer> answers =
      profile.find(a -> a.getQuestion().getClass() == PercentileQuestion.class);

   assertThat(ids(answers), equalTo(new int[] { 2, 3 }));
}

Here’s the relevant implementation in the Profile class:

iloveyouboss/15/src/iloveyouboss/Profile.java

public class Profile {
   private Map<String,Answer> answers = new HashMap<>();
   // ...

   public void add(Answer answer) {
      answers.put(answer.getQuestionText(), answer);
   }
   // ...

   public List<Answer> find(Predicate<Answer> pred) {
      return answers.values().stream()
         .filter(pred)
         .collect(Collectors.toList());
   }
}

A cross-check might involve finding the complement of the predicate—answers whose questions are not of type PercentileQuestion. The positive-case answers and the inverse answers should combine to represent all the answers:

iloveyouboss/15/test/iloveyouboss/ProfileTest.java

List<Answer> answersComplement =
   profile.find(a -> a.getQuestion().getClass() != PercentileQuestion.class);

List<Answer> allAnswers = new ArrayList<Answer>();
allAnswers.addAll(answersComplement);
allAnswers.addAll(answers);

assertThat(ids(allAnswers), equalTo(new int[] { 1, 2, 3 }));

Cross-checking is a way of ensuring that everything adds up and balances, much like the general ledger in a double-entry bookkeeping system.

Right-BI[C]EP: Cross-Checking Using Other Means

Any interesting problem has umpteen solutions. You choose a blue-ribbon winner, perhaps because it performs or smells better. That leaves the “loser” solutions available for cross-checking the production results. Maybe the runners-up are too slow or inflexible for production use, but they can help cross-check your winning choice, particularly if they’re tried ’n’ true.

We can use the “inferior” Java library implementation of square root to cross-check. (Apparently we suffer from bad egos.) We check whether or not our new superspiffy square-root logic produces the same results as Math.sqrt():

iloveyouboss/15/test/scratch/NewtonTest.java

assertThat(Newton.squareRoot(1969.0),
   closeTo(Math.sqrt(1969.0), Newton.TOLERANCE));

Another example: suppose you’re developing a system for managing a lending library. The expectation for a library is that, at any given time, everything must balance. For each book, the number of copies checked out plus the number of copies on shelves (not checked out) must equal the total number of copies held in the collection. Each count is a separate piece of data, potentially stored in a separate location, but all together they still must agree and so can be used to cross-check one another.
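A hedged sketch of such a balancing test; the Library class and its methods are hypothetical names, purely for illustration:

@Test
public void copiesCheckedOutPlusOnShelvesEqualsTotalHeld() {
   Library library = new Library();
   library.addCopies("Refactoring", 3);
   library.checkOut("Refactoring");

   int checkedOut = library.copiesCheckedOut("Refactoring");
   int onShelves = library.copiesOnShelves("Refactoring");

   // the separately tracked counts must still agree with the total
   assertThat(checkedOut + onShelves,
      equalTo(library.totalCopiesHeld("Refactoring")));
}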

Another way of looking at cross-checking is that you’re using different pieces of data from the class itself to make sure they all add up.

Right-BIC[E]P: Forcing Error Conditions

The existence of a happy path suggests that there must be an unhappy path. Errors happen, even when you think they can’t possibly. Disks fill up, network lines drop, email goes into a black hole, and programs crash. You want to test that your code handles all of these real-world problems in a graceful or reasonable manner. To do so, you need to write tests that force errors to occur.

That’s easy enough to do with invalid parameters and the like, but to simulate specific network errors—without unplugging any cables—takes some special techniques. We’ll discuss one way to do this later in the book.

First, however, think about what kinds of errors or other environmental constraints you might introduce to test your code. Here are a few scenarios to consider (a small sketch of one way to force an error follows the list):

· Running out of memory

· Running out of disk space

· Issues with wall-clock time

· Network availability and errors

· System load

· Limited color palette

· Very high or very low video resolution
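As a small sketch of forcing an error through a collaborator (our own illustration, ahead of the book's fuller treatment of simulated failures): because Scoreable is an interface, a test can hand ScoreCollection a stub whose getScore() throws, much like a score source that fails at read time. Whether the collection should propagate, wrap, or swallow the failure is a design decision; this test merely documents that, as written, the exception propagates:

@Test(expected=RuntimeException.class)
public void propagatesFailureFromScoreSource() {
   // a Scoreable stub that blows up when read, simulating an unavailable source
   collection.add(() -> { throw new RuntimeException("score source unavailable"); });
   collection.add(() -> 7);

   collection.arithmeticMean();
}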

Good unit testing isn’t simply exhaustive coverage of the obvious logic paths through your code. It’s also an endeavor that requires you to pull a little creativity out of your rear pocket from time to time. Some of the ugliest defects are those least expected.

Right-BICE[P]: Performance Characteristics

Rob Pike of Google: “Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you have proven that’s where the bottleneck is.” Indeed, many programmers speculate about where performance problems might lie and about what the best resolution might be. The only problem is that their speculations are often dead wrong.

Rather than guess and stab at performance concerns, you can design unit tests to help you know where true problems lie and whether or not your speculative changes make enough of a difference.

This test asserts that a bit of code runs within a certain amount of time:

iloveyouboss/15/test/iloveyouboss/ProfileTest.java

@Test
public void findAnswers() {
   int dataSize = 5000;
   for (int i = 0; i < dataSize; i++)
      profile.add(new Answer(
         new BooleanQuestion(i, String.valueOf(i)), Bool.FALSE));
   profile.add(
      new Answer(
         new PercentileQuestion(
            dataSize, String.valueOf(dataSize), new String[] {}), 0));

   int numberOfTimes = 1000;
   long elapsedMs = run(numberOfTimes,
      () -> profile.find(
         a -> a.getQuestion().getClass() == PercentileQuestion.class));

   assertTrue(elapsedMs < 1000);
}

We wonder if that test is useful. Let’s talk about that in a moment.

Java 8 makes it easy to build a run() method:

iloveyouboss/15/test/iloveyouboss/ProfileTest.java

private long run(int times, Runnable func) {
   long start = System.nanoTime();
   for (int i = 0; i < times; i++)
      func.run();
   long stop = System.nanoTime();
   return (stop - start) / 1000000;
}

A few cautions are called for:

· You typically want to run the chunk of code a good number of times, to shake out any issues around timing and the clock cycle.

· You need to ensure somehow that Java is not optimizing out any parts of the code you’re iterating over (see the sketch after this list).

· Such a test is very slow compared to the bulk of your tests, which take at most a few milliseconds each. Run performance tests separately from your fast unit tests. Running performance tests once a night is probably sufficient—you don’t want to find out too long after someone introduces crummy code that doesn’t perform acceptably.

· Even on the same machine, execution times can vary wildly depending on sundry factors such as load on the system.
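On the second of those cautions, one hedged way to make it harder for the JIT to discard the timed work (our own illustration, not from the book) is to have the timed lambda produce something the test then consumes:

@Test
public void findAnswersWithResultConsumed() {
   // populate the profile as in the findAnswers() test above ...
   List<Answer> lastResult = new ArrayList<>();

   long elapsedMs = run(1000, () -> {
      lastResult.clear();
      lastResult.addAll(profile.find(
         a -> a.getQuestion().getClass() == PercentileQuestion.class));
   });

   // consuming the result keeps the loop body from looking like dead code
   assertFalse(lastResult.isEmpty());
   assertTrue(elapsedMs < 1000);
}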

More troublesome is the fact that too many things are arbitrary. First, the preceding example asserts that the find operation handles a thousand requests in less than a second. But that second is subjective. Running the test on a beefy server, sure, the code might be fast enough, but on a crummy desktop, maybe not. Dealing with a test that fails depending on the environment is never fun, and there’s no easy way to ensure that it runs consistently from one environment to the next. About the only solution is to ensure that such tests run only on a machine comparable to the production environment.

Second, that criterion of 1,000 requests per second seems pulled out of thin air. Performance requirements are usually only relevant on an end-to-end functionality basis, yet the preceding test verifies unit-level code behavior. Unless the method you’re testing is the entry point to the end-user request, you’re comparing apples and oranges.

A better use of a unit-level performance measurement is to provide baseline information for purposes of making changes. Suppose you suspect that the Java 8 lambda-oriented solution for the find() method is suboptimal. You’d like to replace it with a more classic solution to see if the performance improves.

Before making optimizations, first write a performance “test” that simply captures the current elapsed time as a baseline. (Run it a few times and grab the average.) Change the code, run the performance test again, and compare results. You’re seeking relative improvement—the actual numbers themselves don’t matter.
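A minimal sketch of such a baseline capture, reusing the run() helper shown earlier (the test name is ours, and nothing is asserted; the number is only recorded for comparison across runs):

@Test
public void findBaseline() {
   // populate the profile as in the findAnswers() test above ...
   long elapsedMs = run(1000,
      () -> profile.find(
         a -> a.getQuestion().getClass() == PercentileQuestion.class));

   // record the elapsed time; run several times and average before comparing
   System.out.println("find() baseline: " + elapsedMs + "ms");
}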

Tip: Base all performance-optimization attempts on real data, not speculation.

If performance is a key consideration, you likely will be concentrating on the problem at a higher level than unit testing, and you’ll likely want to use tools like JMeter.[26] If you still have a significant interest in unit-level performance measurement, take a look at third-party tools like JUnitPerf.[27]

After

In this chapter you learned about what sorts of tests you’ll want to write. Using the Right-BICEP mnemonic, you’ll remember to write tests that cover happy paths, boundary conditions, and error conditions. You’ll also remember to bolster the validity of your testing by cross-checking results and looking at inverse relationships. You also know when it might be useful to look at the performance of your code.

Next up, you’ll dig deeper into the CORRECT mnemonic that we touched on in this chapter. You’ll pick up a few additional ideas on how to cover the many boundary cases that crop up in the code you write.

Footnotes

[26] http://jmeter.apache.org/

[27] http://www.clarkware.com/software/JUnitPerf.html