Testing Microinteractions - Microinteractions (2014)


Appendix A. Testing Microinteractions

There are many who would advise you not to bother testing microinteractions, saying they are the equivalent of asking “What color should the bike shed be?” That is: unimportant.[44] Let’s assume that if you’ve made it this far into the book, you feel microinteractions have value and can be improved by being validated, tested, and refined via user input.

Microinteractions can benefit from using a Lean UX–style methodology of Build > Measure > Learn: build the microinteraction to test it; measure the design with a variety of quantitative and qualitative methods; learn from an analysis of those findings. Then iterate.[45]

Unlike a true Lean UX process, where you’re testing a “Minimum Viable Product” to see if the concepts (“hypotheses”) are valuable, with microinteractions we can mostly assume the overall concept is valuable—or at least necessary to the proper functioning of the app or device. You are testing the flow and structure more than the concept. The fidelity of the prototype also differs from Lean UX. Rather than prototyping the least you can test (often a paper prototype), with microinteractions you need as high-fidelity a prototype as you can develop in order to test them effectively, because their structure matters: the links from trigger to rules to feedback to loop are tight and not easily separated.

Most microinteractions in desktop software probably aren’t tested alone. The effort and expense of setting up and running a testing session (not to mention the effort of building a prototype for testing) are typically too great, so microinteractions are often lumped together with other items to test. This is not necessarily true for web applications, where prototyping is faster, A/B testing is easier to run, and analytics are more readily available. Mobile applications, too, are getting easier to prototype. If the microinteraction is the whole mobile app, testing is essential; the same is true of devices, although prototyping for them can be more time-consuming as well.

If statistical relevance is your thing, the bad news is that because microinteractions are small (and thus most changes to them are likewise small), they require more test participants to yield statistically meaningful results. This can mean hundreds (if not thousands) of participants, and it definitely means more than the usual 5–8 participants that many testing sessions have. At the barest minimum, aim for at least 20 participants for slightly better data. For the best quantitative data, you need hundreds, thousands, even tens of thousands of users, as is typical for testing on many online sites. If 5% of users open a drop-down but only 4.75% successfully make a selection, that difference is very difficult to detect even with thousands of users—and yet it can make a huge difference in sales and adoption.
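To see why such small differences demand so many participants, here is a rough sample-size sketch using the standard two-proportion normal approximation. The 5% and 4.75% rates come from the example above; the significance and power thresholds are conventional defaults, not from the book.

```python
# Rough sample size needed per group to detect a difference between two
# completion rates, via the standard two-proportion normal approximation.
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_group(p1, p2, alpha=0.05, power=0.80):
    """Participants needed in EACH group for a two-sided two-proportion test."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

print(sample_size_per_group(0.05, 0.0475))  # well over 100,000 per group
```

With rates this close together, the required sample runs to six figures per group—far beyond what any lab study can provide, which is why differences this small are only detectable with live-site analytics.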

Unless a microinteraction is terrible or wonderful, determining the statistical effectiveness of its nuances is nigh impossible through qualitative testing; quantitative testing is the only real option. For example, adding Google Analytics “Events” to a web microinteraction can give a designer insight into its precise weak points in a way that could otherwise only be achieved by qualitatively tracking many users over many weeks. That being said, if statistical relevance isn’t important to you, even testing with a few participants can be illuminating.
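As an illustration, here is a sketch of the kind of funnel analysis that event instrumentation makes possible; the step names and counts are hypothetical, not from any real product.

```python
# A sketch of the funnel analysis that instrumenting a microinteraction with
# analytics events enables. Event names and counts below are hypothetical.

def drop_offs(funnel):
    """Return (from_step, to_step, fraction_lost) for each adjacent pair of steps."""
    return [(a, b, 1 - count_b / count_a)
            for (a, count_a), (b, count_b) in zip(funnel, funnel[1:])]

# Counts of users who triggered each event, in order through the microinteraction.
funnel = [
    ("opened_dropdown", 5000),
    ("hovered_option",  4100),
    ("made_selection",  2375),
]

for step, next_step, drop in drop_offs(funnel):
    print(f"{step} -> {next_step}: {drop:.1%} drop-off")
```

The step with the largest drop-off is the weak point worth investigating qualitatively—the numbers tell you where users bail out, not why.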

As with all product testing, you want to watch out for so-called “scenario errors” that are caused by the test itself. Since testing is an artificial, constructed situation, the setup and guided path the tester takes the user down can cause users to make errors or reveal problems that normal use would not. As just one example, pausing to ask or answer a question can cause crucial feedback to be missed.

What to Look for During Testing

The four most important things to validate with testing are these:

§ That you truly understand the goal of the microinteraction, and not just a step in the process. The point of setting a status message isn’t to type, it’s to communicate. Knowing this allows you to fix any emphasis problems, either in the microinteraction itself or in the overall product—how important is this microinteraction to the overall user experience?

§ That you understand what data is important. This lets you know what data to bring forward and what behavioral-contextual information is valuable to the microinteraction and could be used over time.

§ That any microcopy is necessary, and if so, that it’s clear and understood. This means both instructional copy and, especially, labels.

§ Timing and flow. Does the microinteraction take too long to perform? Are the loops too long? Too short? Note that long loops that happen over extended periods of time are difficult to test unless you are doing a longitudinal study, and most teams are not.

The first two are often gleaned from conversation and interviews, the third by observation. But there are many more things to be learned by observation as well, such as:

§ Are there too many clicks/taps/control presses? In other words, does what the user is trying to do require too much effort? This doesn’t necessarily mean counting clicks, although that is one measure of effort.

§ Any confusion as to why. If a user ever says (aloud or via frowning/puzzled looks) “Why am I doing this?” then something is wrong. Usually a label is misnamed, or instructional copy is missing or too vague.

§ What just happened? This is an indicator of unclear feedback, possibly paired with an unclear label.

§ Did anything just happen? There is either missing feedback or else the feedback is too subtle.

§ I can’t find what I’m looking for. There is a gap between what the user expects to find and what is there. This is probably a labeling problem, but it could also be that a crucial piece of the microinteraction is missing.

§ I don’t know where I am. This can be a problem with transitions or modes.

§ You just did what to my data/content/input? This is another case where expectations didn’t match the outcome. Either adjust the label or copy, or else this is a deeper, overall problem with the microinteraction in that it might not match what users are trying to accomplish, or else users are uncomfortable with what it does accomplish.

§ If I click/push/tap this, what happens? This is a case of an unclear label or poor instructional copy.

§ I didn’t see that button. This is a problem with visual hierarchy. The path through the microinteraction isn’t visually clear.

§ I didn’t know I could do that. An action is too hidden. This often happens with multitouch gestures or an invisible trigger such as a key command.

§ What do I do now? This is the same problem as above: the path isn’t clear, especially the next step.

§ What am I seeing there? This is the result of unclear feedback, usually on a process. Add or clarify with a label, perhaps on a tooltip. This could also mean the data you’re showing isn’t important.

These are all examples of qualitative data, but quantitative data can be useful as well.

Using Quantitative Data

There is an adage (often attributed to Lord Kelvin) that what can’t be measured can’t be improved, and there is some truth to it. Having a baseline—a starting point—and/or something to compare changes against is immeasurably helpful. These are some data points you can test:

Completion rate

What percent of users were able to complete the microinteraction?

Overall duration of the microinteraction

How long did it take to complete the microinteraction? (It’s often the case that the slowest users can take five to ten times longer to complete tasks than the fastest, so use a geometric mean instead of the median to lessen the effect of this type of extreme value.[46])

Duration of specific steps

Number of steps

Number of clicks/taps/selects

This is not always instructive but can let you know if something is inefficient.

Number of system errors

Are there places where the microinteraction fails through no fault of the user? (These are often found when testing on live microinteractions with actual data/connectivity.)

Number of human errors

These fall into two categories: slips and mistakes. Slips are when the user understands the goal of the action but does something improperly, such as making a typo when entering an email address. A mistake is when a user does not understand the rules and tries something the rules won’t allow, such as clicking a header that isn’t interactive.[47]
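As a quick illustration of the duration point above, here is a sketch comparing the arithmetic mean, median, and geometric mean of some made-up task durations; only the averaging functions are real, the numbers are invented.

```python
# Task durations in seconds for six hypothetical participants; one outlier
# took roughly six times longer than the rest.
from statistics import mean, median, geometric_mean

durations = [12, 14, 15, 16, 18, 95]

print(f"arithmetic mean: {mean(durations):.1f}s")   # pulled far up by the outlier
print(f"median:          {median(durations):.1f}s")
print(f"geometric mean:  {geometric_mean(durations):.1f}s")
```

The geometric mean lands between the median and the arithmetic mean, damping the extreme value without ignoring it entirely.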

You can also attempt to quantify qualitative data such as by having users rate characteristics like:

§ Satisfaction

§ Difficulty

§ Confidence

§ Usefulness

on a rated scale (e.g., 1–7, 1 being low, 7 high). However, especially with a small sample size, this can be far from definitive.

This assumes, however, that you will be revising the microinteraction and testing it again to see if there have been improvements, or that you have an alternate version of the same microinteraction to compare with (A/B testing). Again: beware of sample size. A small number of users could make something like an error or a preference seem more (or less) significant than it is.
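As a sketch of what “beware of sample size” means in practice, here is a minimal two-sided two-proportion z-test using only the standard library; the completion counts are hypothetical.

```python
# Minimal two-proportion z-test for comparing the completion rates of two
# versions of a microinteraction (A/B test). Counts are hypothetical.
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two completion rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 30 vs. 22 completions out of 40 participants each: the difference looks
# large (75% vs. 55%), but with this few participants the result does not
# reach significance at p < 0.05.
print(two_proportion_p_value(30, 40, 22, 40))
```

A 20-point gap that fails to reach significance with 40 participants per version is exactly the trap described above: small samples can make a difference look more (or less) meaningful than it is.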

And even if there is statistical significance, it doesn’t mean there is practical significance. The most important lesson about using data to help design is this: it can’t design for you. Data requires a human being to interpret it and place it into context. Data will seldom tell you why something is happening.

The data needs to be made meaningful, which sometimes means ignoring it. Why would you ever ignore data? Here’s the simplest example: most online advertising isn’t clicked. If you get a 0.5% clickthrough rate, you’re often doing very well.[48] So should we remove all online ads, since they are so seldom used? 99.9% of users think so (the other 0.1% of people work for advertising agencies). But getting rid of advertising would essentially mean getting rid of the site itself, as there would be no money to operate it. Would you like Google to go away? You can’t listen to the data entirely because the data doesn’t understand the overall context: the business and organizational environment and the user base that are more than just numbers on a spreadsheet. Data should be an input to your decision making, not the decider alone.

A Process for Testing Microinteractions

The following is one possible process for testing microinteractions. It is certainly not the only process, but it can serve as a starting point:

1. Before showing participants any prototypes, ask them how they expect the microinteraction to work. Ask if they’ve ever used anything similar in the past. Ask what the one thing is that they want to accomplish by using this microinteraction. Check if there is anything they would want to know before using the microinteraction—if there is one piece of information that would make using the microinteraction unnecessary.

2. Have them use the microinteraction unaided. Any quantitative data should be collected at this point, and/or immediately after.

3. Go through the microinteraction with the user step by step, having the participant talk out loud about any impressions and decisions. See if participants can explain how the microinteraction works (the rules). Note any discrepancies.

4. Ask what, if they came back tomorrow, they would want the microinteraction to remember about them.

5. End by asking what one thing should be fixed.

With this process, you should be able to uncover and diagnose any issues with the microinteraction, as well as validate any of the overall goals and needs. I recommend doing this process at least twice, with two sets of participants, revising the microinteraction based on user feedback and findings analysis between sets.

[44] “What color should the bike shed be?” is from developer lore.

[45] See the book Lean UX by Jeff Gothelf (O’Reilly).

[46] See “8 Core Concepts for Quantifying The User Experience,” by Jeff Sauro, Measuring Usability, December 11, 2012.

[47] For more on slips and mistakes, see Norman, Donald, “Design Rules Based on Analyses of Human Error,” Communications of the ACM, 26, 1983, and Human Error by James Reason, 1990.

[48] See, for example, “So Many Ads, So Few Clicks,” BusinessWeek, November 11, 2007.