Selenium Design Patterns and Best Practices (2014)
Chapter 5. Stabilizing the Tests
"And the rain descended, and the floods came, and the winds blew, and beat upon that house; and it fell not: for it was founded upon a rock."
--Matthew 7:25, King James Version
When the test suite becomes large enough, our job becomes less about fixing every individual flaky test and more about engineering a solution that prevents similar flaky behavior from happening in the first place.
In this chapter, we will give our tests a solid foundation that will prevent a lot of instability in the long run. We waited until this chapter to start fixing the behavior that drives anyone who writes web tests insane, because we first had to build up a foundation of good data management and coding skills. These skills are crucial in the long term; without them, all of the stability fixes discussed in this chapter would be useless. Now we're ready to talk about the following topics:
· Culture of stability
· Waiting for AJAX requests to finish
· Waiting for jQuery animations to finish
· The Action Wrapper pattern
· The Black Hole Proxy pattern
· Screenshot on failure practice
Engineering the culture of stability
I'd like to start this chapter with a personal tale of a past experience. The majority of projects I have worked on had a situation similar to what you are probably used to. Typically, the Selenium build is treated as a second-class citizen, not having a single passing build for days or weeks at a time. Eventually, the tests become so embarrassingly riddled with failures and instabilities that any further development is stopped, and the Selenium build is completely ignored.
On my last project, I inherited 300 Selenium tests, which were red 90 percent of the time. So I started to fix them, but that was not enough; no sooner would I fix a broken test than somebody would make a commit that broke another test somewhere else. I did not have a technical problem, I had a cultural problem; nobody but me seemed to care about the Selenium tests.
The team that I was a part of was given the task of maintaining builds; with a lot of trial and error, we came up with several key goals that would lead all of our builds to be passing 99 percent of the time (excluding genuine failures caused by bad code). Here are the key goals, as I see them, for any CI system:
· Running fast and failing fast
· Running as often as possible
· Keeping a clean and consistent environment
· Discarding bad code changes
· Maintaining a stable test suite
Running fast and failing fast
A developer's time is very expensive. We cannot afford to let developers sit around for 40 minutes to see whether all of the tests are passing after every minor code change. The goal is to run the whole test suite in under 10 minutes, or the developers will have no incentive to run the tests at all. Doing the simple math of the man hours each developer spent on a daily basis waiting for the build, compared to doing actual work, gave us a very convincing argument to purchase a lot more test nodes for CI. With these new computers, we were able to run the test suite in parallel across multiple machines, bringing the whole build down to 12 minutes. Furthermore, we added some code to send an e-mail to the developer as soon as a test failed, allowing developers to start fixing a broken test even before the build is complete.
Running as often as possible
Creating a cascading build chain, starting with unit tests and finishing with Selenium, is a common practice. However, this practice turned out to be an anti-pattern, a term discussed in Chapter 2, The Spaghetti Pattern. A typical Selenium build is the slowest in the series; thus, it occupies the last place where everyone can easily ignore it. Often, a failure early in the chain will prevent the Selenium build from ever being executed. By the time the long forgotten Selenium build is finally executed, a dozen code commits have occurred. Making sure that the Selenium build is triggered on every single commit seems excessive, but the whole idea of CI is to catch a test failure as soon as it occurs, not 20 changes down the road. Taking this idea to its logical conclusion, a code change should always be considered bad if even a single test fails.
Having the whole code base deployed and tested with every code change also has the advantage of continuously exercising the deploy scripts.
Keeping a clean and consistent environment
Unlike instability caused by test implementation, instability caused by inconsistent test nodes can be more frustrating and harder to track down. Having different versions of Firefox or Internet Explorer on every test node might not seem like a big deal, but when a test fails because of such minor differences and the failure cannot be easily replicated, a lot of frustration ensues.
We discussed test fixtures in Chapter 4, Data-driven Testing; reloading the test database for every build is a great way to keep a clean and consistent test environment. Also, using a configuration management tool to keep all of the dependencies, such as Java versions, consistent on all of the test nodes will save you a lot of headaches. Finally, make sure that the test environment that serves your website is as close a clone of production as you can make it. All of your tests can be completely invalidated if production hosts the website on Linux servers but your test environment runs on a Windows computer.
There are several open source, free tools for the configuration management of computers. Two of the more popular ones are Chef (http://www.getchef.com/) and Puppet (http://puppetlabs.com/).
Discarding bad code changes
We set up a simple system that prevented anybody from committing changes to the master/trunk unless all of the tests, including Selenium, were passing. Needless to say, this was not a popular approach because tests from unrelated parts of the application were sometimes preventing new features from going into Version Control System (VCS). However, as the test suite stabilized, this became a great way to prevent unintended defects from going into production, and making sure that the whole test suite, including Selenium, was always passing!
There are multiple ways to implement this, since most VCS tools allow users to define precommit or postcommit hooks. The other approach is to prevent direct commits to the trunk/master branch, instead deferring to a build that automatically merges the changes after all tests pass. The latter approach works best with the Git and Mercurial VCS tools.
Maintaining a stable test suite
Cultural changes will never last if your tests fail at random due to technical problems, such as not dealing with AJAX properly or not accounting for external influences that make the test environment run slow. In this chapter, we will concentrate on solutions to some of the most common technical problems that make tests unstable. Let's get going!
Waiting for AJAX
Test automation was simpler in the good old days, before asynchronous page loading became mainstream. Previously, the test would click on a button causing the whole page to reload; after the new page loaded, we could check whether any errors were displayed. The act of waiting for the page to load guaranteed that all of the items on the page were already there, and our test could fail with confidence if an expected element was missing. Now, an element might be missing for several seconds and magically show up after an unspecified delay. The only thing for a test to do is become smarter!
Filling out credit card information is a common test for any online store. Similarly, we set up a simple credit card purchase form that looks like this:
The preceding form is very simple and straightforward; anyone who has made an online purchase has seen some variation of it. Writing a quick test to fill out the form and make sure the purchase is complete should be a breeze!
Testing without AJAX delays
Let's get started then. We have to add two new methods to the TestData class. We need one method to generate realistic credit card numbers and another method that generates expiration dates. These two new methods will look like this in the test_data.rb file:
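The two helpers can be sketched along these lines; the method names, the fixed VISA-style prefix, and the MM/YY date format are assumptions for illustration rather than the book's exact code:

```ruby
# A sketch of the two new TestData helpers; names, the VISA-style prefix,
# and the MM/YY format are illustrative assumptions.
class TestData
  # Returns a random 16-digit card number that passes the Luhn checksum
  def self.credit_card_number
    digits = [4] + Array.new(14) { rand(10) } # VISA-style test prefix
    digits << luhn_check_digit(digits)
    digits.join
  end

  # Returns an expiration date safely in the future, formatted MM/YY
  def self.credit_card_expiration
    future = Time.now + (60 * 60 * 24 * 365) # roughly one year ahead
    future.strftime('%m/%y')
  end

  # Computes the Luhn check digit for the given payload digits
  def self.luhn_check_digit(digits)
    sum = digits.reverse.each_with_index.sum do |digit, index|
      if index.even? # every second digit, counting from the rightmost
        doubled = digit * 2
        doubled > 9 ? doubled - 9 : doubled
      else
        digit
      end
    end
    (10 - sum % 10) % 10
  end
end
```

Generating the numbers instead of hardcoding them means every test run exercises a fresh, valid-looking card.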
The Luhn algorithm is a simple checksum formula created by Hans Peter Luhn. It is used by the majority of credit card companies when generating account numbers. Here are examples of Luhn-valid test credit card numbers for VISA: 4444 3333 2222 1111 and 4111 1111 1111 1111. Similarly, test numbers for MasterCard are 5555555555554444 and 5454545454545454.
Now let's create a new test file called purchase_form_test.rb. Let's take a look at our very simple PurchaseFormTests class; we will start with the same boilerplate code that we have seen many times in previous chapters:
Looking at the actual test, we should see a lot of similarities to the code we wrote in Chapter 4, Data-driven Testing. Let's take a quick look:
We close the test file with the helper methods in the private section:
If we compare the code from this test with product_review_test.rb from the previous chapter, we will notice that the helper methods are pretty much identical. This is typically a good sign that a code refactor is in order. However, before we start refactoring, we should first concentrate on making the tests work.
Remember, premature optimization is the root of all evil in software programming.
So, without any further delays, let's run our tests. Our output should look like this:
We have a passing test for the purchase form; in a perfect world, our work would be complete. In the next section, let's take a look at a scenario that is a little more realistic.
Using explicit delays to test AJAX forms
We now have a test that will work perfectly well when testing the website against a fast test environment like localhost. These environments tend to stub the purchase form responses to create an environment that is easily testable in CI. However, our staging and production environments communicate with a third-party service to validate the credit card information.
For more information about stubbing third-party services, visit Chapter 4, Data-driven Testing.
Let's see how well our tests do in such an environment. In the previous chapter, we implemented the concept of an environment in the TestData class. It's time to put it to use by pointing our tests toward the staging environment with the help of an environment variable set from the command line. On a Windows-based computer, type the following command in the terminal:
If you are using a Linux-based computer, including OS X, we will use the export command:
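Both commands boil down to setting a single variable before the run; the variable name ENVIRONMENT is an assumption, so use whatever key your TestData class actually reads:

```shell
# The ENVIRONMENT variable name is an assumption; use whatever key
# your TestData class reads to pick the environment.
# On Windows:
#   set ENVIRONMENT=staging
# On Linux or OS X:
export ENVIRONMENT=staging
```

The variable only lives for the current terminal session, so it needs to be set again in every new shell.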
Now let's run our test the same way we just did. The terminal should now display this:
What went wrong? If we were watching the test run on the monitor, we would notice that the Purchase complete! message did not appear instantly. Instead, we saw an AJAX request indicator, colloquially known as a spinner, as shown in the following screenshot:
Since the success DIV only shows the Purchase complete! text after the asynchronous request is completed, our test only saw an empty string; thus it failed. The most obvious and fastest way to fix our test is to add a sleep command to allow the AJAX request to complete. The code will look like this:
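The quick fix might be sketched like this; the helper name and the 'success' element ID are illustrative, not the book's exact code:

```ruby
# A sketch of the quick (anti-pattern) fix; the helper name and the
# 'success' element ID are illustrative assumptions.
def check_purchase_success
  sleep 25 # pause so the purchase form's AJAX request can complete
  assert_equal('Purchase complete!', @selenium.find_element(:id, 'success').text)
end
```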
Just like every other anti-pattern, this quick fix makes our tests pass right away but carries long-term unintended consequences. In this particular case, the purchase form's AJAX request can take up to 30 seconds to complete. Telling the test to pause for 25 seconds raises these issues:
· Wasted time: The majority of the requests made by the purchase form will finish in less than 15 seconds. This means that our tests will be doing nothing even though the page is in a ready state.
Avoiding unnecessary delays becomes very important as the test suite grows. Remember, we want the whole test suite to finish in 10 minutes or less.
· Environment unaware: Only the staging environment has such a delay in the AJAX request; the CI environment responds instantly. As mentioned in the previous point, this is wasted time.
· The wait can be too short: Once in a while, the staging environment or the third-party service will be under heavy load and the request might take longer than 30 seconds. A hardcoded sleep value is not adequate for dealing with real-world scenarios.
What we need is to make our tests smart enough to know when the AJAX request has completed.
Implementing intelligent delays
If your current project does not use jQuery to make AJAX requests, check the documentation of your framework for something analogous to jQuery.active. If all else fails, you can take Dave Haeffner's approach of injecting jQuery into a web page that does not have it included. You can find his blog post at http://elementalselenium.com/tips/53-growl.
Let's take a look at the wait_for_ajax method implementation:
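A minimal sketch of the method, assuming the application makes its AJAX requests through jQuery and that @selenium holds our WebDriver session:

```ruby
# Sketch of wait_for_ajax; assumes the application uses jQuery for its
# AJAX requests and that @selenium is the WebDriver instance.
def wait_for_ajax(timeout = 60)
  wait = Selenium::WebDriver::Wait.new(timeout: timeout)
  wait.until do
    # jQuery.active is the number of AJAX requests still in flight
    @selenium.execute_script('return jQuery.active').zero?
  end
end
```

The block is re-evaluated until jQuery.active drops to zero or the timeout is reached.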
There is a lot going on here, so let's break it down a little. First, we create a new instance of the Wait class provided by Selenium WebDriver, explicitly setting the timeout to 60 seconds; when the timeout is reached, the test gets back control and moves on to the next step. The Wait class has an until method that accepts a block of code, which it evaluates repeatedly until the block returns true.
Now, we just add the wait_for_ajax invocation anywhere we need our tests to wait. We will be replacing the hardcoded sleep method from earlier, as shown here:
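The replacement could look like this, with the helper and element names kept illustrative:

```ruby
# The hardcoded sleep gives way to the intelligent wait; the helper name
# and the 'success' element ID are illustrative assumptions.
def check_purchase_success
  wait_for_ajax # returns as soon as jQuery reports no active requests
  assert_equal('Purchase complete!', @selenium.find_element(:id, 'success').text)
end
```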
As a good habit, after we refactor any code, we run our tests to make sure everything is passing. Let's take a look at the test results with the wait_for_ajax method included. In the following screenshot, we can see that the total execution time of the test went up to accommodate the background AJAX request:
Waiting for jQuery animations
When websites started to use AJAX, developers and designers faced a new challenge. Previously, any major interaction with a website, such as clicking the purchase button, gave the user a clear indication that something was changing after each action. With asynchronous requests, parts of the web page can change without the user noticing that something important has happened. So, designers came up with ways to draw the user's attention to the section of the page that has changed. It started with fading in the changed content in a yellow box, slowly incorporated a spinning wheel, and now we have whole-page swipes and many other animations to accomplish this.
Animation is an act of changing the web page; it ranges from adding or subtracting images to removing everything on the page and starting over.
There are several situations in which a Selenium test will fail with ElementNotVisibleError even though the element we are looking for is technically on the page. If our test is attempting to click on a button, the following conditions will prevent the click:
· Not currently visible: Some websites place the button somewhere on the page but make it invisible until it is ready to be clicked. Often, they will use an animated transition effect to slowly fade in the button to make the experience feel pleasant. Attempting to click on the element while it is still transparent will not be successful.
· Under other elements: Let's say a defect is introduced in the page layout where some element such as a text input is out of place and ends up covering up the button we wish to click on. The button is present on the screen and technically functional. However, since the human user is not able to point the mouse at it and click it, WebDriver will not allow the test to click on it either.
The page we will test next contains a purchase form similar to the ones we have been dealing with, with one minor difference: the Purchase button is invisible until enough text fields are filled out; after a threshold of completeness is reached, the Purchase button slowly fades in. The following screenshot shows the purchase form before and after the animation is complete:
If we run our test without any modifications, we will get the following test failure in our output:
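The fix follows the same shape as wait_for_ajax: keep polling until jQuery's :animated selector matches nothing. A sketch, assuming the page animates with jQuery:

```ruby
# Sketch of wait_for_animation; assumes jQuery drives the page's animations
# and that @selenium is the WebDriver instance.
def wait_for_animation(timeout = 60)
  wait = Selenium::WebDriver::Wait.new(timeout: timeout)
  wait.until do
    # ":animated" matches every element with an animation still running
    @selenium.execute_script('return jQuery(":animated").length').zero?
  end
end
```

Calling wait_for_animation before clicking the Purchase button lets the fade-in finish first.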
Before we start to refactor all of the code duplication into the Action Wrapper pattern, let's make sure our test is now passing. The test output should look like this:
The Action Wrapper pattern
The idea behind the Action Wrapper pattern is to collect all of the most common pain points, such as AJAX, and account for them automatically every time an action is performed. It helps to future-proof the tests by automatically handling things that commonly go wrong and destabilize them.
Advantages of the Action Wrapper pattern
The Action Wrapper pattern has a lot more advantages than disadvantages; let's take a look at them:
· Single location for actions: All of the actions such as clicking, typing, and dealing with AJAX requests and animations are in a single class. This makes them easy to find and modify and very DRY.
The DRY principle and the DRY pattern are discussed in Chapter 3, Refactoring Tests.
· Increased overall build stability: Overall, the test suite becomes a lot more stable since forgetting to add a wait no longer breaks random tests at random times.
· Capture and append exceptions: If an action (such as clicking on a button) cannot be performed, we can capture the stack trace and add more information for better debugging.
· Helps to implement screenshot pattern: This pattern makes it easier to add functionality that will capture screenshots of the whole web page on test failures.
Disadvantages of the Action Wrapper pattern
The biggest disadvantage of the Action Wrapper pattern is increased build time. We are trading a faster build for a more stable one, which is typically a good trade.
The build time increase is not that dramatic. If intelligent delays are implemented properly, we will be adding a 10 to 20 percent time increase while reducing test flakiness by up to 80 percent.
Implementing the Action Wrapper pattern
By using the Wrapper pattern on the Selenium class, we are able to add some additional functionality to our test actions. A click on the Purchase button does not have to be just a click; it can become so much more. Wrapping an action gives us the ability to automatically wait for AJAX requests and animations to finish after we click on any button. Furthermore, we are able to catch any exception in our test and take a screenshot of the whole page at that moment to help us debug the failure!
The Wrapper pattern, also called the Decorator pattern or the Adapter pattern, is a design pattern used to encapsulate certain objects to give them more functionality than they were initially designed with. For example, in Selenium, the click method and the save_screenshot method are separate entities. By wrapping the click method, we are able to attempt a click and take an instant screenshot of the web page if the click fails for any reason whatsoever.
To save some time, I did some refactoring for us, so please download the new project from http://awful-valentine.com/code/chapter-5. To make the project files more manageable, I created several new folders and grouped the files inside them. Let's look at the new places for everything, starting with all of the files that deal with test data. They now live in the fixtures directory, as shown here:
All of the tests we have written so far now live in the tests directory, as shown here:
The images directory is where we will store screenshots of the web page on test failures, but right now it is empty. Finally, the helpers directory, shown in the following screenshot, is where we will store the selenium_wrapper.rb. We will implement the Action Wrapper pattern in this file:
The SeleniumWrapper class will become a single point of contact between the tests and the web page being tested. Let's take a look at this class in detail; we will start with methods responsible for the creation and destruction of browser sessions:
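These two methods might be sketched as follows; this is an outline of the idea, not the book's exact listing:

```ruby
# Sketch of the SeleniumWrapper session management; assumes selenium-webdriver
# is required elsewhere in the project.
class SeleniumWrapper
  def initialize(browser = :firefox)
    @selenium = Selenium::WebDriver.for(browser) # keep the session for later
  end

  def quit
    @selenium.quit # destroy the browser session
  end
end
```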
The initialize method creates a new instance of WebDriver with a chosen browser, defaulting to Firefox. It stores this new session in the @selenium instance variable for future use, such as when the quit command is invoked.
Since waiting for AJAX and animations to complete is a common task in every test, we moved those methods into SeleniumWrapper class, as shown here:
Since we will be using these methods a lot, let's make a small method called wait_for_ajax_and_animation that calls both the AJAX and animation waits, as shown here:
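The convenience method is tiny; inside SeleniumWrapper it could look like this, assuming both wait methods live in the same class:

```ruby
# Sketch of the combined wait, reopening the SeleniumWrapper class;
# wait_for_ajax and wait_for_animation are assumed to be defined there.
class SeleniumWrapper
  def wait_for_ajax_and_animation
    wait_for_ajax
    wait_for_animation
  end
end
```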
Next, we moved all of the little helper methods such as type_text or click into the SeleniumWrapper class. This allows us to have these methods implemented only once and shared by all of the tests. However, we have modified these methods to become a lot more powerful. Let's take a look at the type_text method that is shown here:
This may seem confusing at first, but the send_keys method we have used so many times before is still present. Let's discuss the new code that surrounds it.
Before we start typing any text into a text field, we use the clear method to delete any text that might already be in the text box. By explicitly clearing the text box, we avoid the situation where the new input is appended to existing text in the field. This is especially useful for text fields that come with default values that need to be overwritten.
After the test finishes typing text into the text field, we call the wait_for_ajax_and_animation method to allow any animation or AJAX requests to finish. This is extremely useful when testing input fields that use AJAX to autocomplete text as the user types it.
The most important part of this action wrapper is the exception handling built around each action. Typically, if the WebDriver click or send_keys method encounters any difficulty, such as an element not being visible, an exception is raised and the test exits. By wrapping these methods in begin/rescue statements, as shown in the following screenshot, we are able to print out more information about the failure and take a screenshot of the web page:
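Putting the pieces of the preceding discussion together, the wrapped action might be sketched like this; the locator strategy, the rescue class, and the save_screenshot helper are assumptions:

```ruby
# Sketch of the wrapped type_text action; locator strategy, error class,
# and the save_screenshot helper are assumptions, not the book's exact code.
class SeleniumWrapper
  def type_text(id, text)
    element = @selenium.find_element(:id, id)
    element.clear                # wipe any default text so input is not appended
    element.send_keys(text)
    wait_for_ajax_and_animation  # let autocomplete AJAX and animations settle
  rescue Selenium::WebDriver::Error::WebDriverError => error
    save_screenshot              # capture the page state for debugging
    raise error.class, "Could not type '#{text}' into '#{id}': #{error.message}"
  end
end
```

Re-raising with an enriched message keeps the original failure class while telling us exactly which action was attempted.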
The test will still fail when it encounters a problem but will print out information about what it was trying to do. Furthermore, a screenshot is incredibly helpful when debugging a test in CI. We will not go into the details of every method implemented in the SeleniumWrapper class since all of the code in that class should be familiar. Let's take a look at the refactored purchase form test we have been working on this chapter. As you can see in the following screenshot, the overall size of the test file has shrunk as a lot of boilerplate and duplicate code has been moved out to a central location:
The final change is the addition of the run_tests.rb file. We moved all of the boilerplate require statements that used to be in every test into it. We no longer need to run each test file individually; instead, we can run the full test suite by simply running this command:
The result of this command should be all of the tests executing with the help of the SeleniumWrapper class. The result of running all of the tests should look like this:
The Black Hole Proxy pattern
The Black Hole Proxy pattern tries to reduce test instability by getting rid of as many third-party uncertainties as possible. Modern websites have a lot of third-party content loaded on every page. There are social networking buttons, images coming from CDNs, tracking pixels, and much more. All of these items can destabilize our tests at any point. Black Hole Proxy takes all HTTP requests headed to third-party websites and blocks them, as if the requests were sucked into a black hole.
Advantages of the Black Hole Proxy pattern
Black Hole Proxy brings many advantages to our tests:
· Improved speed: Since the web applications we test tend to be on the local network, the web page loads are much faster if there is no wait for third-party content to load.
· Improved stability: Modern web applications have a lot of third-party dependencies that are not critical to the core functionality of the application, such as tracking pixels and social media buttons for Facebook or Twitter. Sometimes, these dependencies will make our tests fail because they take longer than usual to load. Blocking these noncritical third-party dependencies allows our Selenium tests to verify the functionality of our application without breaking due to unpredictable dependencies.
· Hermetically sealed tests: The test has higher control over the environment. By blocking third-party content, we reduce external dependencies that cause test failures.
Disadvantages of the Black Hole Proxy pattern
There are two major disadvantages to the Black Hole Proxy pattern:
· Broken layout: If a lot of third-party content is removed from the page, the page will still function, but the locations of buttons and images might shift to fill the newly created gaps.
· Third-party content tests are broken: Any test that tries to check the third-party integration, such as logging in with social network credentials, will not work. We have to implement a way to give the tests control over the Black Hole Proxy pattern.
Implementing the Black Hole Proxy pattern
Our website integrates with a couple of third-party social networks. The A book is a social network for people whose names start with the letter A. The Walker network is for sending 142-character status updates to your walking buddies. Both networks are integrated at various spots in our application. Furthermore, our website has two banners on every page. Overall, our purchase form page looks something like this:
Our social network partners have slow network connections. To simulate this, let's modify the PurchaseFormTests test once more. We change the first line of the test to navigate to a new page that has a lot of slow-loading third-party dependencies, as shown in the preceding image. Let's modify our test's target URL like this:
This new URL takes us to a page that is designed to simulate extremely slow loading third-party assets such as social network sites and tracking pixels. If we run our test suite now, we will get Timeout::Error, as shown in the following screenshot, because the tests timed out while waiting for the page to finish loading. An uncontrollable delay is caused by third-party dependencies:
We will be taking advantage of the HTTP proxy settings that all browsers use. Our tests will send all of the HTTP traffic, except for requests to our testing environment, to a fake proxy that will swallow up all of the requests. Let's add a couple of lines to the class initializer:
The preceding code performs the following actions:
· It creates an instance of the Firefox::Profile class
· It configures the HTTP proxy to point to a nonexistent proxy at 127.0.0.1, port 9999
You do not have to use a fake proxy at all. As a matter of fact, you can create a simple proxy server that logs all external URLs to a logfile. This way you know all the external dependencies in your application. Just make sure that no matter what the request is, your proxy server returns a 200 response with an empty body. BrowserMob Proxy accomplishes just that, and it can be found at http://bmp.lightbody.net/.
· It tells the profile to not use the proxy for any connections going to localhost, 127.0.0.1, and all instances of our website
· Finally, it tells Selenium to get us a new instance of Firefox with the profile we just made
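The steps above could be sketched as follows; the website domain in the no-proxy list and the exact profile option are assumptions based on the selenium-webdriver API of the time:

```ruby
# Sketch of the Black Hole Proxy setup inside the SeleniumWrapper initializer;
# the domain in the no-proxy list is an assumption.
class SeleniumWrapper
  def initialize(browser = :firefox)
    profile = Selenium::WebDriver::Firefox::Profile.new
    # send all HTTP traffic to a proxy that does not exist...
    profile.proxy = Selenium::WebDriver::Proxy.new(http: '127.0.0.1:9999')
    # ...except requests headed for our own test environment
    profile['network.proxy.no_proxies_on'] = 'localhost, 127.0.0.1, awful-valentine.com'
    @selenium = Selenium::WebDriver.for(browser, profile: profile)
  end
end
```

Because the proxy does not exist, every third-party request fails instantly instead of hanging the page load.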
Let's run the test suite again. All the tests should be passing, like this:
When loading the web page, here is what the tests see:
The tests are now at the highest point of stability they have ever been. Some flakiness will still occur from time to time; this cannot be avoided. However, the more work we put into stabilizing our Selenium tests, the fewer failures we will see. At some point in the future, when the Selenium build fails, we will have the confidence to say that a real bug caused the failure, not test flakiness.
Test your tests!
A last thought before we close this chapter: not enough time and thought is given to the idea of testing the tests themselves. One should not hurry to add a new test to the suite without running it at least a dozen times. Personally, I tend to run each new test about 20 times before I consider it stable. Just put the tests in a loop and let them run for 20 minutes while you get a cup of coffee. You will be surprised how often a test will fail if you just let it run enough times.
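A simple shell loop does the job; the test file name here is illustrative:

```shell
# Run a new test repeatedly before trusting it; the file name is illustrative.
runs=20
failures=0
for i in $(seq 1 "$runs"); do
  ruby purchase_form_test.rb > /dev/null 2>&1 || failures=$((failures + 1))
done
echo "$failures of $runs runs failed"
```

Anything above zero failures means the test is not yet stable enough to commit.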
Finally, don't forget to test your tests on multiple browsers. As a rule of thumb, any test you write will be a lot more stable in the Firefox and Chrome browsers than in Internet Explorer and Safari. Just because the test suite is stable in the first two browsers does not mean it is stable in the latter two.
In this chapter, we covered the topic of test stability. We discussed some of the things that make an individual test stable, starting with cultural changes on the team and ending with changes in personal behavior, such as testing our own tests before committing them to source control.
Now that we have some measure of stability in our tests, we can start spending more time thinking about the test declaration versus test implementation. In the next chapter, we will be testing the behavior of our application.