
BDD in Action: Behavior-Driven Development for the whole software lifecycle (2015)

Part 3. How do I build it? Coding the BDD way

Chapter 8. Automating acceptance criteria for the UI layer

This chapter covers

· Why and when you should write automated UI tests

· Using Selenium WebDriver for web tests

· Finding and interacting with page elements in your tests

· Using the Page Objects pattern to make your tests cleaner

· Libraries that extend Selenium WebDriver

In the previous chapter, you learned how using a layered approach to automated acceptance testing helps make your tests clearer, more robust, and more maintainable. We discussed the three broad layers used in well-designed automated acceptance tests: the Business Rules layer, the Business Flow layer, and the Technical layer. In the following few chapters, we’ll focus on approaches and tools that can be used to implement the Technical layer, starting with the user interface.

In this chapter we’ll discuss techniques to automate UI tests for web-based applications (see figure 8.1). Users interact with an application through its user interface, and in modern web applications, the UI implementation plays a major role in the overall user experience. The screenshots from automated web tests can be a valuable aid for testers, and they’re also a great way to provide illustrated documentation describing how the application behaves.

Figure 8.1. In this chapter you’ll learn how to automate acceptance criteria in order to exercise the UI of web-based applications.

We’ll look at automated web testing from several perspectives:

· Automated web tests are very effective at testing user interactions with the UI, and for illustrating end-to-end user interactions with the application. But if they’re badly designed or used to test things that would be better tested by non-UI tests, web tests can become a maintenance liability.

· Selenium WebDriver is a popular open source library that can be used to write automated web tests in a number of languages, such as Java, Groovy, Ruby, C#, and Python.

· Selenium WebDriver provides good support for Page Objects. The Page Objects design pattern can help make automated web tests more readable and easier to maintain.

· Many open source libraries build upon and extend Selenium WebDriver to make writing automated web tests easier and more convenient.

Although we’ll focus on web testing in this chapter, many of the tools and approaches discussed here also apply to other types of user interfaces. For example, mobile apps can be tested effectively with the toolset we’ll discuss here by using Appium, a WebDriver-based automation library for mobile apps, and the Page Objects pattern is applicable to any type of GUI.

To write effective automated web tests, you need to know not only how to automate web tests well, but also when you should and shouldn’t automate scenarios with web tests.

8.1. When and how should you test the UI?

Web tests have some significant advantages over other types of testing:

· The visual results, when reported well, are an effective way to describe how the user interacts with the application to achieve particular tasks.

· They reproduce end-user behavior much more closely than under-the-hood tests.

· They’re a great way to demo features to stakeholders.

· They can significantly reduce the need for manual UI testing, which represents a significant overhead for testers.

A web test, by definition, is designed to verify UI behavior. But web tests, as end-to-end tests, can also be an effective way to illustrate and check how all of the components in the system work together. Used as living documentation, a web test also often does a great job of documenting how a user will use the system to achieve a particular goal. Web tests can also help give business analysts, testers, and stakeholders more confidence in the automated acceptance tests.

8.1.1. The risks of too many web tests

But web tests aren’t without a degree of risk. Sometimes when teams start out with BDD, they try to automate all of their requirements exhaustively, almost exclusively using detailed, fine-grained, script-like automated web tests. This is a natural tendency for many testers who come from a background of automated functional testing using commercial tools such as HP’s QuickTest Professional. But down this path danger lies. It’s very easy to build up a large suite of brittle automated web tests that are costly to maintain and hard to keep up to date. Poorly designed script-like automated web tests often contain a large amount of duplicated code. When a web page is modified, each test that manipulates the page needs to be updated individually, which is time-consuming and error-prone.

Automated web tests generally run much more slowly than non-web tests, and they tend to be more fickle. Some tests may not behave in the same manner from one browser to another, and they may require browser-specific tweaks. Tests can also fail for reasons beyond the control of your code: an incorrect version of Firefox running on the build server, a page timeout because of network latency, and so on.

Record-Replay style scripting tools also present their own particular category of problems. These tools, which include products like Selenium IDE and QuickTest Professional, allow users to record test scripts through a visual tool and replay the tests afterwards. This approach sounds simple and intuitive, but it’s deeply flawed. One problem is that these scripts are extremely brittle, hard to read, and unclear about their intent, and are, as a result, virtually unmaintainable. The other problem is that these scripts can’t be written until the application development has been completed, which means that they tend to be written as test scripts to verify the implemented behavior, rather than as automated acceptance criteria that contribute to and help guide development efforts from the early stages of the project. For these reasons, it’s virtually impossible to do good BDD-style acceptance testing with Record-Replay tools.

Fortunately, web browser automation using open source tools like Selenium WebDriver has improved with age, and today it’s very possible to write reliable, robust, and maintainable automated web tests, particularly if you apply the principles of layers and reusable steps that we’ll discuss in this chapter. Isolating the code that interacts with the web page in a single class or method (using the Page Objects pattern, for example) goes a long way toward making these tests easier to maintain.

But no matter what tooling you use, the problem of speed still remains (see figure 8.2). An automated web test needs to open a browser and reproduce the actions of the user through the browser, which takes time. Each page load slows down the test. For modern web applications using AJAX-based libraries like AngularJS and Backbone.js, the page updates tend to be much faster, so speed is less of an issue. But, in general, automated web testing will always be significantly slower than tests that exercise the application code directly.

Figure 8.2. Automated web tests have a somewhat justified reputation of being slow (courtesy of

8.1.2. Web testing with headless browsers

One strategy for addressing the issue of slow web tests is to use a headless browser. Headless browser libraries such as HtmlUnit for Java, Webrat for Ruby, and Twill for Python send HTTP queries directly to the server, without having to start up an actual web browser like Firefox or Chrome. HtmlUnit, for example, works with WebDriver and a number of other Java-based web-testing libraries, providing APIs to analyze the structure of the HTML document. PhantomJS provides a more accurate browser simulation, because it renders the HTML like a real browser would, but does so internally. Headless tests often run more quickly than they would using a real browser, and they also make it easier to run a number of tests in parallel.

Some of these libraries, such as HtmlUnit, have limited support for AJAX and JavaScript, so your mileage with a modern JavaScript-based website may vary. This can sometimes make them less useful for more complex user interfaces, such as modern JavaScript one-page applications.

If your application does rely heavily on AJAX and JavaScript, then PhantomJS provides significantly more reliable browser emulation than HtmlUnit, including the features that you’d expect of a real browser, such as good support for dynamic asynchronous behavior and the ability to capture screenshots. PhantomJS is used by a number of web-testing libraries, including WebDriver. Although it’s not a great deal faster than a real browser for individual tests, it tends to be more tolerant of parallel testing.

It’s important to note that headless browsers don’t have exactly the same behavior, or render in exactly the same way, as real browsers. HtmlUnit uses the Rhino JavaScript implementation, which isn’t used by a real browser. PhantomJS uses WebKit, which may have different behavior than Firefox or Internet Explorer. And if your application contains important business logic in the client, you may want to verify that it also works correctly across different browsers, including a range of real ones.

8.1.3. How much web testing do you really need?

Web tests clearly have their uses. But you rarely need to test every aspect of a system using web tests, and doing so is generally not a good idea. In fact, in a typical BDD project, a significant proportion of automated acceptance tests will be implemented as non-web tests (see figure 8.3). These non-UI tests can take many forms, as you’ll see in chapter 9, including what would traditionally be classed as integration or unit tests. Many automated acceptance criteria, particularly those related to business rules or calculations, are more effectively done directly using the application code rather than via the user interface, as non-web tests can test specific business rules more quickly and more precisely than an end-to-end web test.

Figure 8.3. A typical BDD project will have many more non-UI automated acceptance tests than UI ones.

Of course, it can be tricky to know whether to implement an acceptance test as a web test or a non-web test. You only need a web test for two things:

· Illustrating the user’s journey through the system

· Illustrating how a business rule is represented in the user interface

Web tests do an excellent job of illustrating how a user interacts with the system via the user interface to achieve a particular business goal. But they don’t need to show every possible path through the system—just the more significant ones. More exhaustive testing can be left to faster-running unit tests.

Web tests can also illustrate how business rules are reflected in the user interface. For example, suppose that when they book a flight, Frequent Flyer members should be given the option to choose their seat, a privilege not offered to other customers. This would be a good candidate for an automated web test.

A good rule of thumb is to ask yourself whether you’re illustrating how the user interacts with the application or underlying business logic that’s independent of the user interface. For example, suppose you were testing a user authentication feature. The acceptance criteria might include the following:

· The user should receive feedback indicating the strength of the password entered.

· Only strong passwords should be accepted.

The first acceptance criterion relates to the user’s interaction with the web page, and would need to illustrate how this feedback is provided on the login page. This would be a good candidate for an automated web test.

The second criterion, on the other hand, is about determining what makes a strong password, and what passwords users should be allowed to enter. While this could be done through the user interface by repeatedly submitting different passwords, this would be wasteful. What you’re really checking here is the password-strength algorithm, so an application-code-level test would be more appropriate.
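To make the contrast concrete, here is a minimal sketch of what an application-code-level check of the second criterion might look like. The chapter doesn't define the actual password rule, so the rule used here (at least eight characters, with an uppercase letter, a lowercase letter, and a digit) and the PasswordStrengthSketch class are assumptions made purely for illustration.

```java
// Hypothetical password-strength rule, for illustration only:
// the real Frequent Flyer rule is not specified in this chapter.
class PasswordStrengthSketch {

    /** Returns true if the password satisfies the assumed strength rule. */
    static boolean isStrong(String password) {
        if (password == null || password.length() < 8) {
            return false;
        }
        boolean hasUpper = false;
        boolean hasLower = false;
        boolean hasDigit = false;
        for (char c : password.toCharArray()) {
            if (Character.isUpperCase(c)) {
                hasUpper = true;
            } else if (Character.isLowerCase(c)) {
                hasLower = true;
            } else if (Character.isDigit(c)) {
                hasDigit = true;
            }
        }
        return hasUpper && hasLower && hasDigit;
    }

    /** Tiny check helper so the sketch needs no test framework. */
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Many variations can be checked in milliseconds, with no browser involved.
        check(isStrong("Fly1ngHigh"), "mixed case plus a digit should be strong");
        check(!isStrong("password"), "lowercase only should be rejected");
        check(!isStrong("Ab1"), "too short should be rejected");
        check(!isStrong("ALLUPPER123"), "no lowercase letter should be rejected");
        System.out.println("All password-strength checks passed");
    }
}
```

Checks like these run in a fraction of the time a browser needs just to start, which is why the web test can be reserved for showing how the feedback appears on the page.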

8.2. Automating web-based acceptance criteria using Selenium WebDriver

In this section, we’ll look at automating web tests using Selenium WebDriver. Selenium WebDriver is a popular open source web browser automation library that can be used to write effective, automated web tests. It also forms the basis for many higher-level web-testing tools. The examples will focus on working with WebDriver in Java, but the principles and techniques we’ll discuss will be generally applicable to any WebDriver-based testing.

WebDriver is a browser-automation tool. It lets you write tests that launch and interact with a real browser. This interaction can include simple clicks on buttons or links, or more sophisticated mouse operations such as hovering or dragging and dropping.

WebDriver lets you check the test outcomes by inspecting the state of the page in the browser. WebDriver also gives you the ability to take screenshots along the way—screenshots that can be used later as part of the test reports or living documentation.

Figure 8.4 shows a high-level view of WebDriver. WebDriver supports a large number of web browsers, including Firefox, Chrome, and Internet Explorer. This allows you to test your application in different environments and with different browsers. You can also run web tests in headless mode, using a special JavaScript-based browser called PhantomJS.

Figure 8.4. Overview of the WebDriver architecture

WebDriver tests can be written in a number of languages, including Java, Groovy, Ruby, C#, and Python. The WebDriver API varies little from one language to another, so you can generally use the language you’re most comfortable with, or the one that provides the most value for you in terms of ease of writing, maintainability, and living documentation.

The WebDriver API is powerful and flexible, but there are several open source libraries for different platforms that can help you build on WebDriver to write web tests more efficiently and more expressively, including Thucydides, Watir, WatiN, and Geb. We’ll look at some of these libraries in action in section 8.3.2. For most of this chapter, you’ll use the WebDriver API with Java.

8.2.1. Getting started with WebDriver in Java

Let’s start with a very simple example of web browser automation with WebDriver. You’ll illustrate WebDriver’s features using a simple version of the Flying High Frequent Flyer website that we’ve discussed in previous chapters.

If you want to follow along, you can download both the website and the sample code from either the GitHub repository or the Manning website. The sample code repository contains two directories:

· The flying-high directory contains the sample website (see sidebar).

· The flying-high-tests directory contains the sample WebDriver code we’ll discuss.

Running the sample website

The Frequent Flyer website you’ll use is a simple standalone website that will run on any web server. It’s a simple, single-page, JavaScript-based web application. You can either deploy it to your own web server or run it as a standalone website. One way to do this is to use Node.js, which is a lightweight JavaScript platform used to build and run JavaScript-based server-side applications. You don’t need to know anything about Node.js to run the sample site; just follow the instructions.

First, you’ll need to install Node.js, which you can download from the Node.js website. Once this is done, install the Node.js http-server tool from the command line as follows:

npm install -g http-server

Now go into the flying-high directory of the chapter 8 sample code, and start up the website with the following command:

http-server app

To view the running application, open up a web browser and go to http://localhost:8080 (unless you already have an application running on port 8080, in which case it will run on port 8081). You should see a page similar to the one in figure 8.5.

Figure 8.5. Flying High Frequent Flyer members identify themselves using their email address and a password.

Suppose you’re testing the sign-in feature of your Frequent Flyer website. Registered members need to enter their email address and password to access their account details. The login screen looks something like the one in figure 8.5.

Once a member has entered a matching email and password, they’ll be welcomed to the member’s area with a friendly message (see figure 8.6).

Figure 8.6. Authenticated members are welcomed with a friendly message.

A scenario describing this requirement might look like this:

Using the WebDriver Java API, your test might contain the following code:

The first thing you do is create a new WebDriver driver instance. This driver instance, which implements the WebDriver interface, is the starting point for all of your interactions with the application.

The WebDriver interface has a number of implementations, one for each supported browser, and you need to specify which implementation you want to use. Here, you’re creating a new instance of the FirefoxDriver class, which you’ll use to run the tests in a Firefox browser.[1] You can run the tests in a different browser by using a different implementation (see table 8.1).

1 Of course, you’ll need to have Firefox installed for this to work.

Table 8.1. The main WebDriver implementations

WebDriver implementation | Browser | Notes
FirefoxDriver | Firefox | Normally works out of the box if Firefox is installed.
ChromeDriver | Chrome | Download the chromedriver executable and put it on the system path.
InternetExplorerDriver | Internet Explorer | Download the standalone server and put it on the system path.
PhantomJSDriver | PhantomJS | A headless JavaScript browser that should work out of the box.
HtmlUnitDriver | HtmlUnit | Another headless JavaScript browser, faster than the others but less reliable with modern AJAX-based applications.
OperaDriver | Opera | A vendor-supported driver for the Opera browser written in Java. You just need to add a dependency on the operadriver library in your project.[2]
SafariDriver | Safari | Implemented as a Safari browser extension.

2 Refer to the operadriver project documentation for details about which versions of Opera are supported.

Creating the FirefoxDriver instance will open a new Firefox window. You can open a specific page by using the get() method, as shown here:

driver.get("http://localhost:8080");

This will open the page shown in figure 8.5. Once the page is open, you can start to see how to identify and manipulate elements on the page.

8.2.2. Identifying web elements

The next step in the test involves entering the user’s email address and password into the appropriate fields. In WebDriver, any object you’d like to inspect or manipulate in some way is represented by the WebElement class.

You can find a web element on a page using the findElement() method. This method uses a fluent API to identify objects in a very readable manner. For example, in the following code, you find the email field by looking for an HTML element with the name attribute set to email:

Once you have the element, you can query or manipulate it as required. In the preceding code line, you use the sendKeys() method to simulate a user typing something into the field. Later in the test, you click on the login button (identified by its id attribute) using the click() method:


Finally, at the end of the test, you check the text contents of the welcome message, conveniently identified by an id attribute:

WebElement welcomeMsg = driver.findElement(By.id("welcome-message"));

assertThat(welcomeMsg.getText()).isEqualTo("Welcome Jane");

One of the nice things about this API is that it’s not only very readable, but it’s very easy to use. In modern IDEs, the auto-complete feature can be used to list the available methods for the various objects and classes used in the WebDriver API (see figure 8.7). This makes the API both easy for new developers to learn and very productive for more experienced developers.

Figure 8.7. Modern IDE features, such as auto-completion, make the WebDriver API easy to work with.

Identifying fields or objects with a given name or id attribute is the easiest and most robust way to obtain a web element; these attributes are less likely to change when the structure or style of the page changes. In fact, it’s a good idea to make sure that all semantically significant elements in a page have a unique ID or name.

This example is relatively straightforward, with the fields and buttons being easy to find. In real-world applications, this isn’t always the case, and there are some situations where other strategies are more convenient. Fortunately, WebDriver provides a number of other ways to identify web elements.

Identifying elements by link text

When you write automated web tests, you often need to click on links, either to navigate to another page or to trigger some action. For example, on the Frequent Flyer site, a user can click on the Book link at the top of the page at any time to go to the booking page (see figure 8.8).

Figure 8.8. Identifying a hyperlink by its text

Links like this rarely have a name or id attribute that you can use to identify them. But you can use the next best thing: the text of the link itself. To click on the Book link, you could write the following:

driver.findElement(By.linkText("Book")).click();

You can also search the link texts for a partial match. To click on the Flying High Airlines link in the top-left corner of the page, the following call would work:

driver.findElement(By.partialLinkText("Flying High")).click();

Identifying links by their text content is simple, intuitive, and relatively robust, though the test will obviously break if the displayed text is modified.

Identifying elements using CSS

A more flexible way of identifying elements is to use CSS selectors. CSS selectors are patterns designed to identify different parts of a web page for formatting and styling, but they’re also a great general-purpose way to identify elements on the page.

Let’s see how CSS selectors can be used for automated web testing. Suppose marketing has asked you to display a list of featured destinations on the home page, as shown in figure 8.9.

Figure 8.9. CSS selectors can come in handy when you’re working with lists like this.

You can find web elements with CSS selectors by using the By.cssSelector() method. In CSS, the hash (#) symbol is used to find an object by its ID. To find the welcome message, you could do something like this:

driver.findElement(By.cssSelector("#welcome-message"));

Of course, this could be done more simply using By.id("welcome-message"). But CSS selectors become more valuable when you need to find web elements without clean id or name attributes. Some of the more useful CSS selectors are listed in table 8.2.

Table 8.2. Useful CSS selectors

Selector | Example | Description
.class | .navbar | Matches all elements with the class navbar
#id | #welcome-message | Matches the element with an id of welcome-message
element | img | Matches all the <img> elements
element element | .navbar a | Matches all the <a> elements inside an element with the class navbar
element > element | .navbar-header > a | Matches <a> elements directly under an element with the class navbar-header
[attribute=value] | a[href='#/book'] | Matches <a> elements with an href value of #/book
[attribute^=value] | a[href^='#'] | Matches <a> elements with an href value that starts with #
[attribute$=value] | a[href$='book'] | Matches <a> elements with an href value that ends in book
[attribute*=value] | a[href*='book'] | Matches <a> elements with an href value that contains book
:nth-child(n) | .navbar li:nth-child(3) | Matches the third <li> inside an element of class navbar

Let’s look at a more practical example. One of the requirements you’ve defined with the marketing folk goes along the following lines:

Scenario: Displaying featured destinations

Given Jane has logged on

When Jane views the home page

Then she should see 3 featured destinations

And the featured destinations should include Singapore

On the Frequent Flyer home page, each featured destination appears inside a <div> element with the featured-destination class. The destination title is nested inside a <span> element with the destination-title class. The rendered HTML code looks something like this:

In CSS, you can match elements with a given class by using the period (.) prefix. Using a CSS selector, you could find all of the <div> elements that represent the featured destinations like this:

Note that you’re using the findElements() method rather than the findElement() method you saw previously. As the name suggests, the findElements() method returns a list of matching web elements, rather than just a single one. You then check the size of the returned list, using the FEST-Assert library to make the test more readable.

This would be enough if you just wanted to count the number of featured destinations, but if you need to check the destination titles, you’ll need to drill further. Fortunately, CSS selectors are flexible. You could retrieve the titles directly by finding all the web elements with the destination-title class:

driver.findElements(By.cssSelector(".destination-title"));

This would work, but it may not be robust. If destination titles were used elsewhere on the page, you’d retrieve too many titles. A safer approach would be to limit your search to the elements nested within the <div> that contains all of the featured destinations:

driver.findElements(By.cssSelector("#featured .destination-title"));

Once you have a list of matching web elements, you need to convert it to a list of strings that you can verify. You could write something like this:

First, retrieve the list of matching web elements. To get the text content of a WebElement, you use the getText() method, so loop through the web elements and extract the text contents of each one. Finally, check that the destination titles do indeed contain “Singapore.”
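The steps just described can be sketched without a browser by using a small stand-in for WebElement. The HasText interface, the fake() helper, and the destination names other than Singapore are hypothetical and exist only so that the example runs on its own; in a real test the list would come from driver.findElements() and each item would be a WebElement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DestinationTitlesSketch {

    /** Hypothetical stand-in for WebElement: only getText() is modeled. */
    interface HasText {
        String getText();
    }

    /** Extracts the text content of each element, preserving order. */
    static List<String> textsOf(List<? extends HasText> elements) {
        List<String> texts = new ArrayList<String>();
        for (HasText element : elements) {
            texts.add(element.getText());
        }
        return texts;
    }

    /** Creates a fake element with fixed text, standing in for a real page element. */
    static HasText fake(final String text) {
        return new HasText() {
            public String getText() {
                return text;
            }
        };
    }

    public static void main(String[] args) {
        // In a real test this list would come from
        // driver.findElements(By.cssSelector("#featured .destination-title")).
        List<HasText> elements =
                Arrays.asList(fake("Singapore"), fake("Sydney"), fake("Hong Kong"));

        List<String> titles = textsOf(elements);

        // The final step: verify the titles include the expected destination.
        if (!titles.contains("Singapore")) {
            throw new AssertionError("Expected Singapore in " + titles);
        }
        System.out.println("Featured destinations: " + titles);
    }
}
```

Keeping the conversion in a small helper like textsOf() means the assertion works on plain strings, which tends to give clearer failure messages than comparing web elements directly.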

We’ve just scratched the surface of CSS selectors here, but they’re very useful for working with modern jQuery-based UI frameworks. You can find more details on the W3 web site. Most modern browsers have excellent native support for CSS selectors, which means that tests using CSS selectors will generally be very fast.

Identifying elements using XPath

CSS selectors are flexible and elegant, but they do run into limits from time to time. A more powerful alternative is to use XPath. XPath is a query language designed to select elements in an XML document.

XPath expressions are path-like structures that describe elements within a page based on their relative position, attribute values, and content. In the context of WebDriver, you can use XPath expressions to select arbitrary elements within the HTML page structure. A list of useful XPath expressions can be found in table 8.3.

Table 8.3. Useful XPath expressions

XPath expression | Description
//a | Matches all of the <a> elements
//button | Matches all of the <button> elements somewhere under the document root
//button/span | Matches <span> elements that are situated directly under a <button> element
//a[@class='navbar-brand'] | Matches <a> elements whose class attribute is exactly equal to navbar-brand
//div[contains(@class, 'navbar-header')] | Matches <div> elements whose class attribute contains the expression navbar-header
//div[@id='main-navbar']//li[3] | Matches the third <li> inside the <div> with an id of main-navbar
//h2[.='Flying High Frequent Flyers'] | Matches the <h2> element with text contents equal to Flying High Frequent Flyers

You can find an element via an XPath expression by using the By.xpath() method. You could find the welcome message heading by using the following expression:


XPath requires more knowledge of the document structure than CSS, and it doesn’t benefit from the intimate understanding of HTML that’s built into CSS selectors. This makes simple selectors more verbose than their equivalents in CSS. For example, in the previous section you saw how you could find the list of featured destination titles using the following CSS selector:

driver.findElements(By.cssSelector(".destination-title"));

The equivalent in XPath might be something like this:

driver.findElements(By.xpath("//span[@class='destination-title']"));

This will find all of the <span> elements anywhere on the web page that have an attribute named class that’s equal to destination-title.

You could make this more generic by using a wildcard (*) instead of span:

driver.findElements(By.xpath("//*[@class='destination-title']"));

This would find elements whose class was exactly equal to destination-title. Unfortunately, modern web applications will sometimes add extra classes to the class attribute, so you can’t rely on an exact match. A more reliable solution would be to use the XPath contains() function, matching elements that have a class attribute with a value that contains destination-title:

driver.findElements(By.xpath("//*[contains(@class,'destination-title')]"));
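The difference between an exact match and contains() is easy to try out without a browser. The following sketch uses the XPath engine that ships with the JDK (javax.xml.xpath) rather than WebDriver, and evaluates the expressions against a small, well-formed fragment. The fragment, including the extra highlight class and the second destination, is an assumption made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ClassMatchSketch {

    // Assumed markup: the second title carries an extra, hypothetical "highlight" class.
    static final String MARKUP =
          "<div id='featured'>"
        + "<span class='destination-title'>Singapore</span>"
        + "<span class='destination-title highlight'>Hong Kong</span>"
        + "<span class='destination-price'>$900</span>"
        + "</div>";

    /** Counts the nodes in MARKUP matched by the given XPath expression. */
    static int countMatches(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(MARKUP)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Exact match misses the element that has an additional class.
        System.out.println("exact:    "
                + countMatches("//*[@class='destination-title']"));
        // contains() matches both title elements.
        System.out.println("contains: "
                + countMatches("//*[contains(@class,'destination-title')]"));
    }
}
```

Bear in mind that contains() is a plain substring test, so it would also match a class such as destination-title-small if the page used one.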


The full power of XPath becomes more apparent when you need to find elements based on their content, something that’s not currently supported in CSS. For example, the featured destinations you saw earlier are rendered in HTML like this:

<div class="featured-destination"...>

<img src="img/singapore.png"></img>

<span class="destination-title">Singapore</span>

<span class="destination-price">$900</span>

</div>

<div class="featured-destination">...</div>

You could find the <span> element containing the text Singapore using the following XPath expression:

driver.findElement(By.xpath("//span[.='Singapore']"));

You could take this further. Suppose you need to find the price displayed for the Singapore featured destination. XPath supports relative paths using the “..” notation, so you could find the neighboring <span> element with a class of destination-price like this:

driver.findElement(By.xpath("//span[.='Singapore']/../span[@class='destination-price']"));

XPath isn’t without its disadvantages. XPath expressions are generally more verbose and less readable than CSS selectors. XPath expressions can also be fragile if they aren’t well-crafted. XPath has no native support in Internet Explorer, so tests that use XPath on Internet Explorer may run very slowly. But XPath is more powerful than CSS, and there are cases where XPath will be the only way to reliably identify the elements you’re looking for.
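Content-based and relative XPath expressions like the ones in this section can be checked outside of a browser too. The sketch below again relies on the JDK’s built-in javax.xml.xpath engine instead of WebDriver. The markup is modeled on the featured-destinations fragment shown earlier; the wrapping <div> and the second destination (Sydney, $300) are assumptions added so that the example is well-formed and has more than one candidate.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathContentSketch {

    // Modeled on the featured-destinations markup; the outer <div> and the
    // second destination's values are hypothetical.
    static final String MARKUP =
          "<div id='featured'>"
        + "<div class='featured-destination'>"
        + "<img src='img/singapore.png'></img>"
        + "<span class='destination-title'>Singapore</span>"
        + "<span class='destination-price'>$900</span>"
        + "</div>"
        + "<div class='featured-destination'>"
        + "<span class='destination-title'>Sydney</span>"
        + "<span class='destination-price'>$300</span>"
        + "</div>"
        + "</div>";

    /** Returns the text of the first node matched by the expression ("" if none). */
    static String firstText(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(MARKUP)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Find an element by its text content.
        System.out.println(firstText("//span[.='Singapore']"));
        // Step up to the parent with ".." and back down to the sibling price.
        System.out.println(firstText(
                "//span[.='Singapore']/../span[@class='destination-price']"));
    }
}
```

In a WebDriver test, the same expressions would be passed to By.xpath(), and the text would be read from the returned element with getText().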

Using nested lookups

When you write automated tests with WebDriver, it’s important to keep expressions as simple and readable as possible. Simpler expressions tend to be easier to understand and to maintain, and in many cases they’re more reliable. One useful strategy when it comes to writing simpler WebDriver code is to use nested lookups.

So far, you’ve found web elements using the WebDriver instance. But you can also call the findElement() and findElements() methods directly on WebElement instances. For example, suppose there are several Book links at different places on the page, and you need to click on the Book link in the main menu. You could do this by first finding the main menu using its ID, and then finding the menu entry within the main menu’s web element using the linkText selector:

This approach is clear and intuitive and tends to be less error-prone than using complex XPath expressions or CSS selectors.
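The idea behind a nested lookup, finding a container first and then searching only inside it, can be illustrated with the JDK’s own DOM and XPath classes. This sketch doesn’t use WebDriver; the page fragment, the main-menu and footer ids, and the href values are all hypothetical. With WebDriver, the two steps would correspond to calling findElement() on the driver and then findElement() again on the returned WebElement.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NestedLookupSketch {

    // Hypothetical page fragment with two "Book" links in different places.
    static final String MARKUP =
          "<body>"
        + "<ul id='main-menu'><li><a href='#/book'>Book</a></li></ul>"
        + "<div id='footer'><a href='#/footer-book'>Book</a></div>"
        + "</body>";

    /** Finds the container by id, then looks for the link only inside that container. */
    static String hrefOfLinkInside(String containerId, String linkText) {
        try {
            Document page = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MARKUP)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Step 1: the outer lookup locates the container element.
            Node container = (Node) xpath.evaluate(
                    "//*[@id='" + containerId + "']", page, XPathConstants.NODE);
            if (container == null) {
                throw new IllegalArgumentException("No element with id " + containerId);
            }

            // Step 2: the nested lookup is evaluated relative to the container.
            Element link = (Element) xpath.evaluate(
                    ".//a[.='" + linkText + "']", container, XPathConstants.NODE);
            if (link == null) {
                throw new IllegalArgumentException("No link '" + linkText + "' in " + containerId);
            }
            return link.getAttribute("href");
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate lookup", e);
        }
    }

    public static void main(String[] args) {
        // The same link text resolves to different links depending on the container.
        System.out.println(hrefOfLinkInside("main-menu", "Book"));
        System.out.println(hrefOfLinkInside("footer", "Book"));
    }
}
```

Each of the two expressions stays short and readable, which is the main appeal of nested lookups over a single long selector.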

8.2.3. Interacting with web elements

Interacting with web elements in WebDriver is usually fairly intuitive, and it involves a relatively small number of methods. You can use the click() method on any web element (not just buttons and links) to simulate a mouse click. The sendKeys() method can be used to simulate user input. And the getAttribute() and getText() methods let you retrieve attribute values and the text contents of a web element.

The Frequent Flyer booking page illustrated in figure 8.10 has many fields that can illustrate these ideas.

Figure 8.10. The Frequent Flyer booking page

Text input fields

To enter a value into a text field, you can use the sendKeys() method, as shown here:


The sendKeys() method doesn’t set the value of the field; rather, it simulates the user typing the text into the field. If the field contains an existing value, you’ll need to use the clear() method before entering the new value.

The current value of a text field is stored in the value attribute. To retrieve an attribute value, you can use the getAttribute() method, as shown here:

String fromValue = driver.findElement(By.id("from")).getAttribute("value");

This approach will also work for any other form field that uses the value attribute, such as check boxes or the newer HTML5 input field types like email and date. The exception is <textarea> fields, which don’t have a value attribute. You can retrieve the contents of a <textarea> field by using the getText() method.

Radio buttons and check boxes

The simplest way to select a radio button value is to find the radio button you want and to click on it. This can be a little tricky because the name attribute isn’t unique and the id attribute isn’t always defined or directly related to the value. You can do this by using a CSS selector that combines the name and value you want, like this:



This approach will also work for check boxes, which behave in exactly the same way.

Drop-down lists

WebDriver provides a convenient helper class for dealing with drop-down lists. The Select class is used to wrap a web element representing a drop-down list to add drop-down-specific methods, such as selectByVisibleText(), selectByValue(), and selectByIndex(). On the booking page, assuming the travel class drop-down has the ID travel-class, you could set the travel class to “Business” using the following code:

WebElement travelClassElt = getDriver().findElement(By.id("travel-class"));

new Select(travelClassElt).selectByVisibleText("Business");

The Select class also provides a number of methods that you can use to learn about the current state of the drop-down list, including getFirstSelectedOption() and getAllSelectedOptions().

8.2.4. Working with asynchronous pages and testing AJAX applications

Most modern web applications use AJAX in one way or another. AJAX-based JavaScript libraries allow developers to write applications with vastly improved usability and user experience. But the asynchronous nature of AJAX can present challenges when it comes to automated web testing.

In a conventional web application, when you click on a link or submit a form, an HTTP request is sent to the server and a new page is returned. In these cases, WebDriver will automatically wait for the new page to load before proceeding. But with an AJAX application, the web page will send queries to a server and update the page directly, without reloading. When this happens, WebDriver will not know if or when it needs to wait for updates, which may cause the test to fail unexpectedly.

Fortunately, it’s relatively easy to tell WebDriver when it needs to wait and what it should wait for. Your main tools for achieving this are the Wait interface and its implementations.

Figure 8.11 illustrates a common case where waiting may be useful. On the Frequent Flyer booking page, the From and To fields are configured with a type-ahead capability that displays matching cities in a drop-down list as the user types.

Figure 8.11. Type-ahead fields are a common example of asynchronous lookups.

The time it takes to display the drop-down list can be unpredictable, as it may depend on the speed of the network and the target server. Depending on the nature of the web page, there are several ways you can handle this sort of delay.

Using the implicit wait

By default, if WebDriver doesn’t find a web element, it will fail immediately. But this behavior is configurable. For example, if you wanted WebDriver to keep looking for up to five seconds before giving up, you could set the implicit wait time to five seconds, as shown here:

driver.manage().timeouts().implicitlyWait(5, TimeUnit.SECONDS);

This is a blunt tool, as it will apply every time WebDriver looks up an element, which can slow down the tests in other places. A more refined approach is to use explicit waits.

Using explicit waits

WebDriver also lets you wait for specific events. The easiest way to do this is to use the ExpectedConditions class. This class provides a large number of useful predefined conditions that make the API more convenient to use. These include waiting for elements to be present (or not present), visible (or invisible), clickable, and so forth. For example, the following code waits until the type-ahead list (identified here by its class name) is present on the screen:

These predefined conditions cover many common situations. But occasionally you’ll need to do something more specific. Again, WebDriver offers several options. The FluentWait class allows you to create arbitrary wait parameters on the fly using a readable fluent API:

You can use this wait object with one of the predefined conditions from the ExpectedConditions class, or you can write your own condition. A condition takes the form of a Function object (from the Google Guava library), and it typically returns either a WebElement (if you’re waiting for a web element to become available) or a Boolean (if you’re waiting for some more general condition). In the following example, you wait until the type-ahead list is present on the page and contains entries:

In this code sample, you create your own implementation of the Function interface to check the presence and size of the type-ahead list. The Function interface here represents a function that takes a WebDriver instance and returns a Boolean value. When this Boolean value is true, the test will be able to continue. The Function interface defines the apply() function, which you need to implement. In this case, the condition is relatively simple: the implementation retrieves and checks the size of the type-ahead list. This implementation could be more complicated if you need to check more involved conditions.
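To make the polling idea concrete, here is a plain-Java sketch of what such a wait does conceptually. It is not WebDriver's FluentWait: PollingWait, TypeaheadSource, and TypeaheadHasEntries are hypothetical names, the page is replaced by a stand-in interface so the code runs without a browser, and java.util.function.Function is used in place of the Guava interface.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the part of the page the condition inspects.
interface TypeaheadSource {
    List<String> currentEntries();
}

// A custom condition: true once the type-ahead list contains at least one entry.
class TypeaheadHasEntries implements Function<TypeaheadSource, Boolean> {
    @Override
    public Boolean apply(TypeaheadSource page) {
        return !page.currentEntries().isEmpty();
    }
}

// Simplified polling loop (an illustration, not Selenium's implementation).
class PollingWait<T> {
    private final T input;
    private final Duration timeout;
    private final Duration pollingInterval;

    PollingWait(T input, Duration timeout, Duration pollingInterval) {
        this.input = input;
        this.timeout = timeout;
        this.pollingInterval = pollingInterval;
    }

    // Re-evaluates the condition until it returns true or the timeout elapses.
    void until(Function<T, Boolean> condition) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!Boolean.TRUE.equals(condition.apply(input))) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(pollingInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

The condition object carries all of the page-specific logic, so the same wait loop can be reused with any other condition.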

8.2.5. Writing test-friendly web applications

The style and quality of your web application code has a significant influence on how easy or hard it will be to test. Applications with clean HTML code, identifiers, names, and CSS classes for all the significant elements on a page make testing easier and more reliable. When applications have messy or inconsistent HTML code, the elements can be hard to identify, which results in more complicated and more brittle selector logic.

The technology stack you choose can have a major effect on testability. Frameworks that limit the control you have over the rendered HTML are a major source of difficulty. In Java web application development, for example, some frameworks automatically generate element identifiers for their own use, making it difficult to use the simplest and fastest of the WebDriver selectors.[3] Applications that use plugin technologies such as Flash and Silverlight, which are opaque to testing tools like WebDriver, also make testing very difficult.

3 Many JSF-based frameworks fall into this category.

8.3. Using page objects to make your tests cleaner

Up until now we’ve explored the WebDriver API using simple code samples. These examples work well to illustrate the various WebDriver methods, but you wouldn’t want to write code like this for real-world automated tests. For example, to log on to the Frequent Flyer website, you used the following code:





This code wouldn’t scale well. You’d need to duplicate the same or similar lines for every scenario involving a user logging on, and any change to this logic would need to be updated at every place that it’s used.

Another problem is that you’re mixing selector logic with test data, which prevents you from reusing the locators in other tests.

A better approach would be to refactor the selector logic into one place so that it can be reused across multiple tests. You could write a class along the following lines to do this:

This class wraps the lines of code you saw earlier into a single method called signinWithCredentials(). It also provides an open() method to get to the right page. The code needs a WebDriver instance, which is provided in the constructor.

Now your test code can focus on the test data and the intent of the actions, rather than on how the individual web elements are found or manipulated:
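As a self-contained sketch of this separation, the following code keeps the locators inside the page class and leaves only data and intent to the caller. It assumes a hypothetical Browser interface in place of WebDriver so that it can run without a browser; the URL, the signin button name, and the RecordingBrowser class are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the WebDriver calls a page object would make.
interface Browser {
    void open(String url);
    void type(String fieldName, String text);
    void click(String elementName);
}

class LoginPage {
    // Locators and URL live in one place (the values are assumptions for this sketch).
    private static final String URL = "http://localhost:8080/login";
    private static final String EMAIL_FIELD = "email";
    private static final String PASSWORD_FIELD = "password";
    private static final String SIGNIN_BUTTON = "signin";

    private final Browser browser;

    LoginPage(Browser browser) {
        this.browser = browser;
    }

    void open() {
        browser.open(URL);
    }

    void signinWithCredentials(String email, String password) {
        browser.type(EMAIL_FIELD, email);
        browser.type(PASSWORD_FIELD, password);
        browser.click(SIGNIN_BUTTON);
    }
}

// Records each call so the sketch can be exercised without a real browser.
class RecordingBrowser implements Browser {
    final List<String> actions = new ArrayList<>();

    @Override
    public void open(String url) {
        actions.add("open " + url);
    }

    @Override
    public void type(String fieldName, String text) {
        actions.add("type " + fieldName + "=" + text);
    }

    @Override
    public void click(String elementName) {
        actions.add("click " + elementName);
    }
}
```

A test then reads as two calls, loginPage.open() followed by loginPage.signinWithCredentials(email, password), with no locator in sight.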

8.3.1. Introducing the Page Objects pattern

The class we just looked at follows the Page Objects pattern. A page object is a class that models a web page, or part of a web page, and that presents a set of business-focused methods for tests to use, sparing them from the implementation details of the actual HTML page. A page object has two main roles:

· It isolates the technical implementation of the page from the tests, making the test code simpler and easier to maintain.

· It centralizes the code that interacts with a page (or a page component), so that when the web page is modified, the test code only needs to be updated in one place.

In section 7.3, we discussed the typical layers that make up a well-structured automated acceptance test suite:

· The Business Rules layer describes the expected business outcomes.

· The Business Flow layer relates the user’s journey through the application.

· The Technical layer interacts directly with the system.

Page objects belong in the Technical layer (see figure 8.12). They provide business-friendly services to the Business Flow layer and implement the interactions with the web pages using the WebDriver API.

Figure 8.12. Page objects are an important part of the Technical layer.

Page objects don’t have to represent an actual page. In some cases, such as the login page discussed earlier, it makes sense to have a page object dedicated to a page. In other cases, such as a modern JavaScript single-page application, a single HTML page might be represented by page objects for each state or view of the application.

It also makes sense to use page objects to represent important parts of a screen, particularly if those parts are reused from screen to screen. For example, you might use a page object to represent the main menu bar that appears on every screen, or for the list of featured destinations if this appears in several places. These component objects can be nested within other page objects or used independently.

Page objects in WebDriver

Although you can write your own page objects from the ground up, it’s nice to have some tooling support. The WebDriver API provides excellent built-in support for page objects. In particular, it provides the @FindBy annotation to simplify web element lookup. Using this annotation, you could rewrite the LoginPage page object like this:[4]

4 The open() method has been excluded from the listing for simplicity.

The @FindBy annotation tells WebDriver how to look up a WebElement field. When you mark fields this way, you can use the PageFactory.initElements() method in the constructor to instantiate these fields for you. Each time you use these fields, WebDriver performs the equivalent of a driver.findElement(By...) call to bind them to the corresponding element on the web page. The @FindBy annotation supports all of the different selector methods available when you use driver.findElement(By...) (see table 8.4).

Table 8.4. Different ways to use the @FindBy annotation

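@FindBy expression | Description
@FindBy(id="…") | Find by ID
@FindBy(name="…") | Find by name
@FindBy(className="…") | Find by CSS class name
@FindBy(css=".typeahead li") | Find by CSS selector
@FindBy(linkText="…") | Find by link text
@FindBy(partialLinkText="…") | Find by partial link text
@FindBy(tagName="…") | Find by HTML tag
@FindBy(xpath="…") | Find by XPath expression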

If the name of the WebElement fields in your page object matches either the name or ID of the corresponding HTML element, you can skip the @FindBy annotation entirely:

In this case, WebDriver automatically instantiates the email and password fields. For the email field, for example, this is the equivalent of first trying @FindBy(id="email"), and if that fails, @FindBy(name="email").

The @FindBy annotation isn’t limited to individual fields; you can also use this notation to retrieve collections of web elements. All you need to do is define your field as a list of WebElements instead of a simple WebElement, as shown here:

WebDriver also provides the @FindBys annotation, which can be used to define nested @FindBy annotations.

@FindBys({@FindBy(id="main-navbar"), @FindBy(linkText = "Book")})

WebElement bookMenu;

This is the equivalent of the nested findElement() methods we saw in section 8.2.2:

WebElement bookMenu = driver.findElement(By.id("main-navbar"))

                            .findElement(By.linkText("Book"));

By default, if WebDriver can’t find an element on the page to match a WebElement field in your page object, it will fail immediately. This is often a good thing, because if WebDriver can’t find an element it was expecting to find, then either your test or the application is likely to be broken, and you should be notified about this as quickly as possible. But if you’re working with dynamic pages using AJAX-based web elements, you might need to give the application some time for a dynamic field to appear before failing the test.

The type-ahead feature we looked at earlier is a good example of this. You could use an @FindBy annotation like the following to retrieve these values. The corresponding page object might look something like this:

The first time you use this variable in your page object, WebDriver tries to find the matching HTML elements on the page. But due to the asynchronous nature of this feature, the list may not be populated yet, and your test will fail.

To get around this problem, you can configure WebDriver to not fail immediately when it can’t find an element, but rather to poll the page repeatedly for a predefined period. You do this by using the AjaxElementLocatorFactory class when you initialize the web elements, as illustrated here:

Now, if the type-ahead elements aren’t ready straightaway, WebDriver will keep trying for up to five seconds before failing.

In this particular application, this approach will work just fine, but it’s not foolproof. For example, suppose you have a checkout page with a total field that’s present on the screen but initially contains no data. When the user selects an option (such as opting for travel insurance), an AJAX call will update this field. The web element corresponding to the total field will always be present on the page, but you need to wait for it to be populated with the correct values. In this case, the polling approach won’t work, and using the Wait interface discussed in section 8.2.4 would be a better strategy.

8.3.2. Writing well-designed page objects

Page objects are highly reusable components, and if they’re designed well, they can make your automated tests significantly easier to understand and maintain. A few simple rules can help you go a long way in designing better page objects.

Page objects should only expose simple types and domain objects

The most important design rule for page objects is that a page object should never expose implementation details about the page or component it’s encapsulating. Page object methods should accept and return simple types such as strings, dates, Booleans, domain objects, or other page objects. They should never expose WebDriver classes.

For example, the Frequent Flyer flight booking page (see figure 8.13) has many fields that could be encapsulated behind page object calls.

Figure 8.13. The flight booking page has a number of fields that could be read and written to via a page object class.

For example, the from and to fields could be exposed as simple String values:

In a similar manner, the depart and return fields might be presented as dates:

Lists of values should be returned as lists of primitive types (strings, dates, numbers, and so on) or as lists of domain objects. You saw an example of this with the type-ahead values:

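public List<String> getTypeaheadEntries() {

    List<String> entries = new ArrayList<String>();

    // Expose plain strings to the test rather than WebElements

    for (WebElement typeaheadElement : typeaheadEntries) {

        entries.add(typeaheadElement.getText());

    }

    return entries;

}
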
Well-designed page objects also accept and return data in the form of domain objects when it makes sense to do so. In many cases, domain objects allow tests to be more expressive and readable than primitive types. For example, suppose you need to verify that the featured destination list contains a deal for Singapore costing $900 (see figure 8.14). You represent destination deals using a simple Java class called DestinationDeal, with a destination and a price.

Figure 8.14. The featured destinations in this screen could be represented as a list of domain objects.

If you design your page object to return a list of DestinationDeals, you could write test code like this:

The page object needs a getFeaturedDestinations() method that returns a list of DestinationDeals. Internally, this method needs to convert the data in the featured destinations list into a list of DestinationDeals.

Let’s walk through how you might implement this method. To start, you could use the @FindBy annotation to retrieve the list of featured destination <div> blocks:

@FindBy(css = ".featured .featured-destination")

private List<WebElement> featuredDestinations;

Next, you’d write the getFeaturedDestinations() method itself:

The code here loops through the web elements representing the featured destination <div> blocks and converts each to a DestinationDeal object. The details of the conversion process are left to the destinationDealFrom() method, shown here:

This method reads the title and price from the web element and uses them to build a new DestinationDeal.
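The conversion can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the page object itself: DestinationBlock is a hypothetical stand-in for the WebElement of one featured destination block, FeaturedDestinations is a hypothetical holder class, and the price is assumed to be displayed as text such as "$900".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simple domain object with a destination and a price.
class DestinationDeal {
    private final String destination;
    private final int price;

    DestinationDeal(String destination, int price) {
        this.destination = destination;
        this.price = price;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof DestinationDeal)) {
            return false;
        }
        DestinationDeal that = (DestinationDeal) other;
        return price == that.price && destination.equals(that.destination);
    }

    @Override
    public int hashCode() {
        return Objects.hash(destination, price);
    }

    @Override
    public String toString() {
        return destination + " for $" + price;
    }
}

// Hypothetical stand-in for one featured-destination block on the page.
interface DestinationBlock {
    String title();
    String priceText();
}

class FeaturedDestinations {
    private final List<DestinationBlock> blocks;

    FeaturedDestinations(List<DestinationBlock> blocks) {
        this.blocks = blocks;
    }

    // Returns domain objects only; no page-level types leak out to the test.
    List<DestinationDeal> getFeaturedDestinations() {
        List<DestinationDeal> deals = new ArrayList<>();
        for (DestinationBlock block : blocks) {
            deals.add(destinationDealFrom(block));
        }
        return deals;
    }

    // Assumes the price text looks like "$900".
    private DestinationDeal destinationDealFrom(DestinationBlock block) {
        int price = Integer.parseInt(block.priceText().replace("$", "").trim());
        return new DestinationDeal(block.title().trim(), price);
    }
}
```

Because DestinationDeal defines equals(), a test can check for a specific deal with a simple contains-style assertion on the returned list.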

Page objects should report on page state

Page objects are responsible for reporting information about the contents or state of the page (or page component) to the test. The test can then use this information to perform any required checks and assertions; page objects shouldn’t contain assertion logic.

For example, on the flight booking page in figure 8.13, there’s a Search button. Suppose that this button is disabled until all of the required fields are filled. To test this, you could add a method with an embedded assertion to the corresponding page object, as shown here:


The preceding snippet uses Hamcrest, the other major Java fluent assertion library.

This isn’t a very clean solution. The page object isn’t only reporting the state of the page; it’s also making assertions about it, which should be the responsibility of the test logic. A better solution would be to have the page object return the state of the Search button, and let the test do the asserting:

Now the test code will be able to decide what assertion is the most appropriate:

assertThat(bookingPage.searchButtonIsEnabled(), is(true));

This approach provides a better separation of concerns and avoids the risk of having bloated page object classes with assertion methods that are only used by a single test.
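A minimal sketch of a page object that only reports state might look like the following. Button is a hypothetical stand-in for the single WebElement method used here, so the example runs without a browser.

```java
// Hypothetical stand-in for the one WebElement method this sketch needs.
interface Button {
    boolean isEnabled();
}

class BookingPage {
    private final Button searchButton;

    BookingPage(Button searchButton) {
        this.searchButton = searchButton;
    }

    // Reports state only; deciding whether that state is correct is the test's job.
    boolean searchButtonIsEnabled() {
        return searchButton.isEnabled();
    }
}
```

The same method serves a test that expects the button to be disabled on an empty form and another that expects it to be enabled once the form is complete.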

There’s still one area where you could improve this example. Any test that simply asserts that something is true (or false) runs the risk of providing poor diagnostic information. For example, both of the assertions in the preceding examples will report the following error message if they fail:

Expected: is <true>

but: was <false>

When a test fails, it’s important for diagnostic messages to be informative, and most assertion libraries let you add customized messages. For example, you could rewrite the earlier test to provide a more informative message, as shown here:

assertThat("Search button should be enabled",

           bookingPage.searchButtonIsEnabled(), is(true));

There’s still a little duplication in this code. Some WebDriver-based libraries also provide extensions that can produce more useful assert messages with even less effort. For example, in Thucydides, you can use the WebElementState class to return the state of a web element to the test. You could use a Thucydides page object for this page like this:

The WebElementState class provides a large number of methods that tests can use to query and make assertions about the state of a web element. For example, using this class, you could rewrite the test logic like this:


If this should fail, the error message will be relatively descriptive:

Field '<button id='search'>' should be enabled

This way, you can have the best of both worlds: flexible, reusable assertions about the state of your pages that you can place in the test logic, and useful error messages when a test fails.

Navigating with page objects

If an action on a page object causes the application to switch to another page, it’s sometimes useful to have the page object return a page object representing the new page. For example, you might write a page object to represent the main menu bar that appears at the top of the page:

public class MainMenu {


public PlanningPage selectPlanFlight() {...}

public BookingPage selectBookFlight() {...}

public FlyPage selectFly() {...}

}

You could now write fluent and readable test code like this:
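As a self-contained sketch of the chained style, the following code uses stand-in page classes with no real browser behind them. Only MainMenu, BookingPage, and selectBookFlight() come from the example above; the from(), to(), and route() methods are hypothetical names added for illustration.

```java
// Stand-in page objects; a real version would drive the browser through WebDriver.
class MainMenu {
    BookingPage selectBookFlight() {
        // A real implementation would click the Book link before returning.
        return new BookingPage();
    }
}

class BookingPage {
    private String from = "";
    private String to = "";

    // Each setter returns the page object so that calls can be chained.
    BookingPage from(String city) {
        this.from = city;
        return this;
    }

    BookingPage to(String city) {
        this.to = city;
        return this;
    }

    String route() {
        return from + " -> " + to;
    }
}
```

A test can then chain the navigation and the data entry in one expression, such as new MainMenu().selectBookFlight().from("Sydney").to("London").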


This approach is often a matter of style. It can make the tests quicker and easier to write, but it may mask the navigation logic through the application and make maintenance harder if the navigation logic changes. It’s also debatable whether it’s the responsibility of the page object to understand the application’s navigation logic, or whether it should be left to the test logic to declare what page it expects to see at any point in the test. Many practitioners prefer to leave the test logic to implicitly describe the expected navigation by using the page object that corresponds to the screen they expect to be on.

If there’s a possibility that the action won’t always go to another page, or that it may go to multiple pages depending on application-logic considerations, then it’s unwise to code this navigation logic into your page objects. For example, a login page will go to a welcome page if the login succeeds or stay on the login page if an authentication error occurs. In this case, the page object will have to decide which page it should navigate to, which is effectively embedding business logic into the page object, and if this business logic changes, your page object will break.

WebDriver is an excellent foundation for automated web tests, and many other open source libraries have been built around WebDriver for more advanced or specific usages. In the next section, we’ll look at a few of these.

8.3.3. Using libraries that extend WebDriver

As we mentioned in section 8.2, there are many libraries for different platforms that build on and extend WebDriver, including these:

· Thucydides: Provides powerful WebDriver support for Java and Groovy tests

· Watir: A powerful Ruby DSL for WebDriver testing

· WatiN: Provides support for automated web testing in C#

· Geb: A Groovy DSL for WebDriver

Rather than trying to cover each library in detail, I’ll highlight two of the more compelling features from some of these libraries that make them so popular among test-automation practitioners.

Page objects

All of these libraries provide support for page objects in various forms, making page objects easier to define and use.

Thucydides takes care of instantiating page objects and managing the WebDriver instance, and it provides a number of useful base methods. A simple Thucydides page object looks similar to the standard WebDriver code we’ve discussed so far:

Thucydides will also instantiate any page object instances in the test code, which simplifies the test code further:

WatiN also provides support for page objects in C#. A page object written using WatiN looks like this:

The corresponding test code might look something like this:


5 For conciseness, we haven’t included the implementation of the HomePage page object here.

As in the Thucydides code from the previous example, the WatiN page objects neatly hide the web page implementation details from the test logic, making the test code cleaner, more concise, and more readable.

Fluent selectors

Dynamic languages like Ruby and Groovy make it easy to write fluent and readable APIs, and these can be used to write concise, readable expressions for identifying web elements on a page. Geb provides a powerful expression language based on a jQuery-like notation that allows you to write expressions using CSS selectors. The following code shows a few examples of this sort of expression:

In the first example, you use a simple CSS selector to find and click on the search button. In the second example, you check whether this search button is disabled. The next example will find any <a> elements nested inside a featured destination block. The fourth example uses a more complex CSS expression to identify and click on the link in the second featured destination block. The final example fetches all of the <li> elements in the main menu and converts them into a list of Strings.

In Java, Thucydides provides its own support for fluent selectors. Though not as extensive as the Geb API, Thucydides still offers quite a few to choose from. The Thucydides equivalent of the Geb expressions shown above would look like this:

Thucydides also provides a jQuery-like $() method, so many of the expressions look similar to the Geb equivalents. Java doesn’t have closures, so in the last line you can use the LambdaJ library to do something similar.

When used well, fluent APIs like these are a boon to readability and expressiveness.

8.4. Summary

In this chapter you learned to automate UI tests for web-based applications:

· Automated web tests are a powerful testing tool, but one that should be used sparingly in order to avoid slowing down your test suite too much.

· The WebDriver API provides a rich and robust foundation for automated web testing.

· The Page Objects pattern can make your web tests cleaner and easier to maintain.

· Libraries, such as Thucydides, Watir, WatiN, and Geb, build on and extend WebDriver with additional features.

In the next chapter we’ll focus on techniques for testing the non-UI layers of the application.