Testing - Object-Oriented Analysis and Design for Information Systems: Modeling with UML, OCL, and IFML (2014)



This chapter explains how object-oriented software developed with the techniques shown in previous chapters can be tested. Specifically, three levels of testing are presented: unit tests for methods and classes, system operations tests, and use case tests. The chapter focuses on the functional approach to tests based on the operation contracts. For each parameter of a system or basic operation, the designer should define whether it has acceptable and/or unacceptable values. A success test must be created for each success scenario and a failure test for each set of unacceptable values. With this approach, test cases may be systematically designed. Equivalence partitioning and limit value analysis are also presented as ways to determine the best choices for the test cases.


Unit test; system operations test; system test; test-driven development; functional test

Key Topics in this Chapter

• Functional testing

• Stubs and drivers

• Test-driven development

• Unit testing

• System operations testing

• Use case testing

11.1 Introduction to testing

No matter how sophisticated the modeling and specification techniques used to develop software are, no matter how disciplined and competent the team is, there is a factor that makes software testing always necessary: human error. It is a myth to think that good developers working with state-of-the-art tools are capable of developing error-free software (Beizer, 1990).

Murphy’s Law (Bloch, 1980) in many of its corollaries seems to speak directly to the software industry. For example:

• Anything that can go wrong will go wrong.

• If everything seems all right, you have not checked appropriately.

• Nature always sides with the hidden flaw.

For many years, software-testing activities were considered a punishment for programmers. Testing was considered a waste of time because software was supposed to be correct from the beginning.

However, things have changed. The test discipline now is considered extremely important. Today it is an integral part of the software development process and one of the disciplines of the Unified Process.

Furthermore, leading software development companies have started to outsource software testing by hiring test factories. That means that testing is conducted not only by developers but also by teams that are specially trained for it.

This chapter presents test activities that are strongly adapted for use with the techniques shown in previous chapters.

Unit tests are used basically to check the behavior of classes, including their basic and delegate methods. If automatic code generation is used, such tests may be suppressed, because it is assumed that automatic code generators do not make human errors.

System operation tests are top-level functional tests that may be based on the examination of the system operation contract: its parameters, preconditions, and exceptions determine sets of valid and invalid parameters that must be systematically checked, and the postconditions determine the results that must be accomplished when valid entries are tested.

Finally, system tests are performed as systematic use case tests. Each use case is a script that establishes normal and alternative flows for accomplishing business goals. Use case tests may be executed manually or may be automated. If the client executes use case tests, they are called acceptance tests.

There are two major categories of test techniques:

• Structural tests, which evaluate the internal structure of the code.

• Functional tests, which evaluate operations based only on their inputs and outputs.

In this book, only functional techniques applied specifically to the three levels of testing are introduced:

• Functional unit test for basic and delegate operations.

• Functional system operation tests based on contracts.

• System and acceptance tests based on system use cases.

Readers interested in understanding automated testing in more detail should take a look at the book by Meszaros (2007), which presents a significant set of xUnit-style test patterns. xUnit is a family of frameworks for numerous languages based on the design originally created by Kent Beck (1989).

11.2 Functional testing

Functional testing consists of a sequence of tests that define entry values for an operation and observe if the result is what was expected. Functional tests may be run without any knowledge of the programming code that implements the operation; only its behavior is observed. The quantity of tests to be conducted in order to assure that an operation is correct may be virtually infinite. Functional testing may use techniques to reduce the number of necessary tests without losing coverage. The most useful techniques for accomplishing that goal are equivalence partitioning and limit value analysis, which are explained in the following subsections.

11.2.1 Equivalence partitioning

One of the principles of functional testing is the identification of equivalent situations. For example, if an operation accepts a set of data (normality) and rejects another set (exception) then it may be said that there are two equivalence sets¹ of input data for that operation: accepted and rejected values. It is usually impossible to test every value included in those sets, because they may be virtually infinite. Thus, the technique of equivalence partitioning indicates that at least one element in each equivalence set must be tested (Burnstein, 2003).

Classically, the equivalence partitioning technique considers the division of the inputs through the following criteria (Myers, Sandler, Badgett, & Thomas, 2004):

• If the valid values are specified as an interval (for example, from 10 to 20), then we define one valid set (10 to 20) and two invalid sets (less than 10, and greater than 20).

• If the valid values are specified as a quantity of values (for example, a list with five elements) then we define a valid set (list with five elements) and two invalid sets (lists with less than five elements, and lists with more than five elements).

• If the valid entries are specified as a set of acceptable values that may be processed in different forms (for example, the strings of an enumeration such as “male” and “female”), then we define a valid set for each of the valid options and an invalid set for any other value.

• If the valid values are specified by a logical condition (for example “final date must be greater than initial date”) then we must define a valid set (when the condition is true) and an invalid set (when the condition is false).

The sets of valid inputs may be defined not only in terms of restrictions on the input data but also in terms of the results that may be produced. If the operation that is being tested has different behavior depending on the value of the input, then different valid sets must be defined for each behavior. For example, consider a simple operation half(x:Integer):Integer. This operation accepts that x may be any positive number. Consider also that the operation is defined so that it cannot accept zero or negative numbers. Finally, consider that the operation produces (x−1)/2 for odd numbers and x/2 for even numbers. Thus, we must consider that half has three equivalence sets for the parameter x:

• A valid set composed of odd positive integers: 1, 3, 5, ….

• A valid set composed of even positive integers: 2, 4, 6, ….

• An invalid set composed of nonpositive integers: 0, −1, −2, −3, ….
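The three equivalence sets for half can be exercised with one representative each. The following is a minimal sketch in Python, not the book's notation; the implementation of half, the chosen representatives (7, 8, and −3), and the use of ValueError to signal rejection are all assumptions made for illustration.

```python
def half(x: int) -> int:
    """Return x/2 for even x and (x-1)/2 for odd x; reject nonpositive x."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    return x // 2  # floor division covers both the even and the odd case


def run_partition_tests() -> dict:
    """Test one arbitrarily chosen representative of each equivalence set."""
    results = {
        "valid_odd": half(7) == 3,    # (7 - 1) / 2
        "valid_even": half(8) == 4,   # 8 / 2
    }
    try:
        half(-3)                      # representative of the invalid set
        results["invalid_nonpositive"] = False  # no rejection: test fails
    except ValueError:
        results["invalid_nonpositive"] = True
    return results
```

Calling run_partition_tests() yields one boolean per equivalence set, so a missing or failing set is visible at a glance.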

The equivalence set values must be always restricted to the domain defined by the type of the parameter. For example, if the type of the parameter were something other than Integer, then different equivalence sets would be defined. If the operation was half(x:Natural):Integer, then the equivalence sets should be redefined so that they remain inside the Natural domain, which does not include zero and negative numbers:

• A valid set composed of odd positive integers: 1, 3, 5, ….

• A valid set composed of even positive integers: 2, 4, 6, ….

• No invalid sets.

On the other hand, assume the operation is defined with a broader parameter type such as half(x:Real):Integer; then the equivalence sets would be:

• A valid set composed of odd positive integers: 1, 3, 5, ….

• A valid set composed of even positive integers: 2, 4, 6, ….

• An invalid set composed of nonnatural numbers, that is, every real number that is not a positive integer (for example 0, −1, or 2.5).

We can see from the examples above that if the designer restricts the type of the parameter as much as possible, then fewer or simpler invalid equivalence sets are defined. That is why examples in previous chapters use the Natural type instead of Integer whenever zero or negative numbers could not be accepted. And that is why enumerations should be used rather than unrestricted strings whenever possible.

11.2.2 Limit value analysis

People who work in software testing often say that (software) bugs hide in the cracks. For this reason, the equivalence partitioning technique is usually used in conjunction with another technique known as limit value analysis.

Limit value analysis consists of not choosing any random value from an equivalence set, but choosing two or more limit values if they can be determined.

In ordered domains, such as numbers, this criterion may be applied. For example, if an operation requires an integer parameter that is valid only if included in the interval [10..20], then there are three equivalence sets:

• Invalid for any x<10.

• Valid for x≥10 and x≤20.

• Invalid for any x>20.

Limit value analysis suggests that possible errors in the logic of a program are not at arbitrary points inside those intervals but at the boundary points where two intervals meet. Thus if the domain of the parameter is Integer:

• For the first invalid class, the value 9 must be tested.

• For the valid class, 10 and 20 must be tested.

• For the second invalid class, the value 21 must be tested.

Thus, if there is a logic error in the operation for some of these inputs it is much more probable that it will be found in this way rather than if a random value is chosen inside each interval that defines an equivalence set.
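A short Python sketch of this idea follows. The function names (limit_values, accepts, buggy_accepts, passes_boundary_tests) are hypothetical; buggy_accepts deliberately contains an off-by-one error at the lower boundary to show why boundary values are more revealing than a value picked from the middle of the interval.

```python
def limit_values(low: int, high: int) -> dict:
    """Boundary test values for an integer parameter valid in [low..high]."""
    return {
        "invalid_below": [low - 1],
        "valid": [low, high],
        "invalid_above": [high + 1],
    }


def accepts(x: int) -> bool:
    """Hypothetical operation under test: valid only for 10 <= x <= 20."""
    return 10 <= x <= 20


def buggy_accepts(x: int) -> bool:
    """Same operation with an off-by-one error at the lower boundary."""
    return 10 < x <= 20


def passes_boundary_tests(operation) -> bool:
    """Check the operation against the values 9, 10, 20, and 21."""
    values = limit_values(10, 20)
    invalid = values["invalid_below"] + values["invalid_above"]
    return (all(operation(x) for x in values["valid"])
            and not any(operation(x) for x in invalid))
```

Here buggy_accepts(15) returns True, so a test with a value from the middle of the valid interval would not notice the defect, while passes_boundary_tests(buggy_accepts) returns False because the value 10 is wrongly rejected.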

11.3 Stubs and drivers

Frequently, parts of the software must be tested separately from the main body of code, but at the same time they must communicate with other parts.

When a component A that is going to be tested calls operations from a component B that has not yet been implemented, component B may be replaced by a simplified version of it that implements only the behavior that is absolutely necessary to perform the test of component A. This simplified implementation that is used in the place of a component that has not yet been implemented is called a stub.

For example, suppose that a class A needs a prime number generator B that has not yet been implemented. The nth prime number would be generated by a function prime(n:Natural):Natural. A simplified version of B may be implemented just for allowing class A to be tested. Suppose that the first five prime numbers are enough to adequately test class A. Thus, a stub for B could be implemented like this:


STUB METHOD prime(n:Natural):Natural
  CASE n OF
    1: RETURN 2
    2: RETURN 3
    3: RETURN 5
    4: RETURN 7
    5: RETURN 11
    OTHERWISE Exception.throw('Stub implementation requires argument less than or equal to 5')
  END CASE




In this way, class A may be tested without investing time in implementing a more complex function in class B.
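The same arrangement can be sketched in Python. The class names PrimeGeneratorStub and A, and the method sum_of_first_primes, are hypothetical and serve only to show how a component under test can receive a stub in place of the real dependency.

```python
class PrimeGeneratorStub:
    """Stand-in for component B; it knows only the first five primes."""

    _PRIMES = {1: 2, 2: 3, 3: 5, 4: 7, 5: 11}

    def prime(self, n: int) -> int:
        if n not in self._PRIMES:
            raise ValueError("Stub implementation requires 1 <= n <= 5")
        return self._PRIMES[n]


class A:
    """Hypothetical component under test that depends on a prime generator."""

    def __init__(self, generator):
        self._generator = generator  # the real B or the stub

    def sum_of_first_primes(self, count: int) -> int:
        return sum(self._generator.prime(i) for i in range(1, count + 1))
```

Because A receives its generator through the constructor, the stub can later be swapped for the real component without changing A or its tests.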

On the other hand, operations implemented in class A will be called by other components of the system. But when class A is being tested, these components may not yet exist. Furthermore, when testing class A we usually want the test to occur in a systematic and reproducible way. To assure this, a test driver for class A may be implemented. The driver is a piece of code (possibly a new class) whose sole purpose is to systematically test a component.

For class A mentioned above, a driver could be implemented as a new class named Driver4A, for example. An example of a driver is shown in Section 11.5.

Thus, stubs and drivers are simple implementations that simulate the behavior of other components. The stub is used in the place of a component that should be called but is not yet implemented. The driver is used in the place of a component that should call the component to be tested in a systematic and reproducible way.

Stubs are usually throwaway code. When the real component is implemented the stub is not necessary anymore and may be discarded.

On the other hand, drivers are important pieces of code that must be kept forever. Every time a component must be tested – and it happens frequently during software development and evolution – the driver would be available for automatically performing the test.

11.4 Test-driven development

Test-driven development or TDD (Beck, 2003) is a technique and a programming philosophy that incorporates automatic testing into the process of producing code. It works like this:

1. First, the programmer that receives the specification for a new functionality that must be implemented should create a set of automatic tests for the code that does not yet exist.

2. This set of tests must be executed and observed to fail. This is done to show that the tests won’t succeed unless a new feature is implemented in the code. If the test passes at this time there are two explanations: either the test is badly written or the feature that is being tested is already available in the code, supposing that it is an update on existing code that is being produced and not new code from scratch.

3. Then, the code must be developed with the sole objective of passing the tests. If the need for any other feature is detected, then the tests must be updated before it is introduced in the code. In that case, the process restarts from step 1.

4. After the code passes all tests it must be cleaned and improved if necessary in order to meet internal quality standards. After passing final tests it is considered stable and may be integrated into the main body of code of the system.

The contracts that were developed in Chapter 8 and the dynamic models of Chapter 9 are exceptional sources of information for developing complete and consistent test cases before producing any code, as explained in the following sections.

The motivation for TDD is to incentivize the programmer to keep the code simple and to produce a test asset that allows any part of the system to be automatically tested.
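The first three steps of the cycle can be illustrated with a very small Python sketch. The function names are hypothetical, and the placeholder that raises NotImplementedError merely stands in for code that does not exist yet.

```python
def run_tests(increment) -> bool:
    """Step 1: automated tests written before the code exists."""
    try:
        return increment(3, 2) == 5 and increment(1, 1) == 2
    except NotImplementedError:
        return False


def increment_not_yet_written(current: int, amount: int) -> int:
    """Placeholder for the feature before any code is produced."""
    raise NotImplementedError


def increment(current: int, amount: int) -> int:
    """Step 3: the simplest code whose only objective is passing the tests."""
    return current + amount


red = run_tests(increment_not_yet_written)  # step 2: the tests are observed to fail
green = run_tests(increment)                # step 3: the tests now pass
```

In this sketch red is False and green is True; step 4 (cleaning the code and rerunning the tests) would reuse run_tests unchanged.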

11.5 Unit testing

Unit tests are the most elementary tests and consist of verifying if an individual component (unit) of the software was implemented correctly. That component may be a complex method, or an entire class. Usually, this unit is still disconnected from the system it will be integrated with later.

Unit tests are usually performed by the programmer and not by the test team. TDD requires that the programmer develop the test code before writing the code to be tested. This test code is a driver that must systematically generate the set of data necessary to test all valid and invalid equivalence sets, applying limit value analysis when possible.

An example of unit testing consists of checking if a method was correctly implemented in a class. Take a look again at Figure 10.13 and delegate method 3.2, incrementItem(aBook:Book; aQuantity:Natural), which must be implemented in the Cart class.

Now let us consider what the equivalence sets are for these parameters. First let us consider the parameter aBook:Book. As the type of the parameter is Book, it can be assumed that no other kind of object could be passed as an argument. For example, this operation would never receive an instance of Customer or Delivery instead of a Book² because the language typing mechanisms would prevent that. The team must still be concerned about whether the objects that can be received here are valid or not.

If the language mechanisms cannot prevent a method from receiving a null value as a parameter, then the null case would always be considered a (usually invalid) equivalence set for any parameter that is typed with a class name.

The incrementItem method was developed with the following goal: to increment the quantity of an item that is already in the cart. We may start by considering that there are three equivalence sets for the aBook parameter:

• A valid equivalence set that contains instances of Book that are linked to an item in aCart.

• An invalid equivalence set that contains instances of Book that are not linked to any item in aCart.

• An invalid equivalence set that contains only the null value.

The second parameter of incrementItem is a natural number identified as aQuantity. The operation assumes that the existing quantity of the item already in the order will be incremented by aQuantity. The Natural type does not include zero and thus that potentially invalid quantity is not allowed as an argument. However, ordering a number of copies of a book that exceeds its quantity in stock could be considered an error condition unless the system allows ordering books that are not yet available. Assuming that the system operation add2Cart that calls incrementItem has not defined as a precondition that the order quantity is available in stock (see its contract in the next section), then the incrementItem operation may receive invalid quantities regarding that business rule. Therefore, the following equivalence sets may be considered for the aQuantity:Natural parameter:

• A valid equivalence set that includes natural values that, added to the quantity of the respective item, produce a result that is less than or equal to the quantityInStock attribute of aBook.

• An invalid equivalence set that includes all natural values that added to the quantity of the respective item are greater than the quantityInStock attribute of aBook.

The potential test cases for incrementItem are combinations of the equivalence sets for the two parameters, as seen in Table 11.1.

Table 11.1

Combinations of Equivalence Sets for IncrementItem


There is only one success condition, which is obtained when valid values are obtained for both parameters. It happens when aBook is linked to an item in the cart and aQuantity plus the current quantity is less than or equal to the quantity in stock. In OCL that condition can be expressed in the context of aCart as

self.item.book->includes(aBook) and
self.item->select(book=aBook).quantity+aQuantity <= aBook.quantityInStock

Although there are five combinations in Table 11.1 that produce failure, only three have to be tested because if the book is not linked to any item or if the book is null, the value of aQuantity cannot be determined to be valid or invalid because there is no current quantity to be compared to it. Usually the rules to obtain combinations for testing are:

• Get all the success combinations and define a test case for each of them.

• Get one invalid set at a time. Even if the operation has n parameters, consider only one invalid set for just one parameter at a time. The other parameters should be valid or it should be impossible to determine their validity.

• Only take a combination with more than one invalid set if that combination is possible and useful.
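These selection rules can be sketched in Python for the incrementItem example. The labels and the data layout are assumptions made for illustration. When aBook is invalid, the validity of aQuantity cannot be determined, so the "within stock" label paired with it in the selected combinations is only a placeholder.

```python
from itertools import product

# Equivalence sets per parameter: (label, is_valid).
equivalence_sets = {
    "aBook": [("in cart", True), ("not in cart", False), ("null", False)],
    "aQuantity": [("within stock", True), ("above stock", False)],
}


def all_combinations(sets: dict) -> list:
    """Every combination of one equivalence set per parameter."""
    names = list(sets)
    return [dict(zip(names, combo)) for combo in product(*sets.values())]


def invalid_count(combo: dict) -> int:
    """Number of parameters whose chosen equivalence set is invalid."""
    return sum(1 for _, valid in combo.values() if not valid)


def selected_combinations(sets: dict) -> list:
    """Keep the success combinations and those with exactly one invalid set."""
    return [c for c in all_combinations(sets) if invalid_count(c) <= 1]
```

For these sets, all_combinations produces six combinations and selected_combinations keeps four: the single success case plus three failure cases, one for each invalid set.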

Therefore, there are three failure tests that must be performed to determine how incrementItem behaves with invalid parameters. The first happens when aBook is not in aCart. The second happens when aBook is null. The third happens when the desired quantity surpasses the quantity in stock:

(1) not self.item.book->includes(aBook)

(2) aBook.isNull()

(3) self.item->select(book=aBook).quantity+aQuantity > aBook.quantityInStock

In order to avoid tests that leave side effects on the system, they are usually performed in three steps:

1. Prepare the stage for the test. Usually special objects called fixtures are created with the sole purpose of allowing the test to be performed.

2. Execute the test.

3. Clean the environment. The system must return to the state it was in prior to the test. Fixtures must be disposed of at this point.

Let’s consider first the success test scenario for incrementItem. Any fixture created for this test should evaluate true for the success condition mentioned above. An example of code for generating such fixtures is the following:

CLASS Driver4Cart
  METHOD testIncrementItemSuccess()
    …
    aBook:=Book.Create(…, 'Bring me the head of Willy the mailboy!', 'Scott Adams', US$12.30, 128, 5) -- quantity in stock is 5.
    aCart:=Cart.Create() -- cart id is assigned internally
    …

As we are going to test the Cart class it is possible that some of its methods or methods from other classes such as Book and Item are not yet implemented. In that case, stubs could be used here to allow the test on incrementItem to be performed.

After the initialization code, the test itself may be performed:

aCart.incrementItem(aBook,2) -- using limit value analysis
IF anItem.getQuantity()=5 THEN
  Write('incrementItem success test: successful')
ELSE
  Write('incrementItem success test: failed')
END IF

Finally, we must dispose of the fixtures:





Now, the failure conditions should be tested as well. Here, they are implemented as three independent methods in the Driver4Cart class:

METHOD testIncrementItemFailure1()
  -- not self.item.book->includes(aBook)
  aBook:=Book.Create(…, 'Bring me the head of Willy the mailboy!', 'Scott Adams', US$12.30, 128, 5)
  aCart:=Cart.Create() -- the book is not in the cart
  TRY
    …
    Write('incrementItem failure test 1: failed')
  CAPTURE Exception.itemAbsent
    Write('incrementItem failure test 1: successful')
  END TRY




METHOD testIncrementItemFailure2()
  -- aBook.isNull()
  aCart:=Cart.Create() -- the book is null
  TRY
    …
    Write('incrementItem failure test 2: failed')
  CAPTURE Exception.bookIsNull
    Write('incrementItem failure test 2: successful')
  END TRY



METHOD testIncrementItemFailure3()
  -- item->select(book=aBook).quantity+aQuantity > aBook.quantityInStock
  aBook:=Book.Create(…, 'Bring me the head of Willy the mailboy!', 'Scott Adams', US$12.30, 128, 5) -- quantity in stock is 5.
  aCart:=Cart.Create() -- cart id is filled internally
  …
  TRY
    aCart.incrementItem(aBook,3) -- limit analysis
    Write('incrementItem failure test 3: failed')
  CAPTURE Exception.quantityAboveStock
    Write('incrementItem failure test 3: successful')
  END TRY






The style of the code of the test case depends on the language and framework used. Here a generic pseudocode implementation is illustrated to allow the logic of the test to be understood by readers with knowledge about programming languages but not necessarily experience with test frameworks.

As we can see in the code above, fixtures and test procedures are usually similar, and thus uniform and parameterized methods could be used to create such objects, reducing the quantity of code that is being produced. This is usually provided by frameworks such as xUnit.
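As an illustration of how an xUnit-style framework factors out the repeated fixture code, the same four tests can be sketched with Python's unittest module. The Book, Item, and Cart classes below are simplified, hypothetical stand-ins for the classes of the example (the Book constructor takes only a title and a stock quantity), and the exception class names are assumptions that mirror the pseudocode.

```python
import unittest


class ItemAbsent(Exception): pass
class BookIsNull(Exception): pass
class QuantityAboveStock(Exception): pass


class Book:
    def __init__(self, title: str, quantity_in_stock: int):
        self.title = title
        self.quantity_in_stock = quantity_in_stock


class Item:
    def __init__(self, book: Book, quantity: int):
        self.book = book
        self.quantity = quantity


class Cart:
    def __init__(self):
        self.items = []

    def increment_item(self, book, quantity: int) -> None:
        if book is None:
            raise BookIsNull()
        item = next((i for i in self.items if i.book is book), None)
        if item is None:
            raise ItemAbsent()
        if item.quantity + quantity > book.quantity_in_stock:
            raise QuantityAboveStock()
        item.quantity += quantity


class IncrementItemTest(unittest.TestCase):
    def setUp(self):
        # Fixtures shared by all tests: 5 copies in stock, 3 already in the cart.
        self.book = Book("Bring me the head of Willy the mailboy!", 5)
        self.item = Item(self.book, 3)
        self.cart = Cart()
        self.cart.items.append(self.item)

    def tearDown(self):
        # Dispose of the fixtures after each test.
        self.cart = self.item = self.book = None

    def test_success(self):
        self.cart.increment_item(self.book, 2)  # limit value: 3 + 2 == 5
        self.assertEqual(self.item.quantity, 5)

    def test_book_not_in_cart(self):
        with self.assertRaises(ItemAbsent):
            self.cart.increment_item(Book("Another title", 5), 1)

    def test_book_is_null(self):
        with self.assertRaises(BookIsNull):
            self.cart.increment_item(None, 1)

    def test_quantity_above_stock(self):
        with self.assertRaises(QuantityAboveStock):
            self.cart.increment_item(self.book, 3)  # limit value: 3 + 3 == 6
        self.assertEqual(self.item.quantity, 3)     # quantity left unchanged
```

The framework calls setUp before and tearDown after every test method, which corresponds to the three steps described above. If the block is saved in a file, the tests can be run with python -m unittest followed by the file name.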

Unit tests may be greatly reduced if an automatic code generator is used to generate the basic operations and delegate methods. In that case, it may be assumed that the code is correct because the generator does not make human mistakes. The tests could then begin at the level of system operations, which is explained in the next section.

11.6 System operations testing

The top level of functional testing occurs when the team verifies if the system operations really perform according to the contract specifications.

System operations may call methods from many classes, delegate responsibilities, and coordinate the execution of basic and delegate methods. Thus, at the level of integration of the classes that perform a consistent set of responsibilities, the system operation test verifies if each system operation correctly implements its contract.

The system operations test, then, consists of verifying if in normal conditions (preconditions are true and exceptions false), the desired postconditions are really obtained, and if in abnormal conditions, exceptions are effectively raised. System operation tests are very similar to unit tests, except that now the operation has a contract, and this eases the identification of the necessary test cases.

Let us now examine the system operation add2Cart in Figure 10.13. Its contract was defined in Chapter 8 and is the following:

Context Livir::add2Cart(aCartId:CartId, anIsbn:Isbn, aQuantity:Natural)
  …
  pre:
    aCart->notEmpty() and
    aBook->notEmpty()
  post:
    if anItem->isEmpty() then
      newItem.isNewInstanceOf(Item) and
      aCart^addItem(newItem) and
      newItem^setQuantity(aQuantity) and
      newItem^addBook(aBook) and
      …
    else
      anItem^setQuantity(anItem.quantity@pre + aQuantity)
    endif
  exception:
    anItem.quantity+aQuantity > aBook.quantityInStock implies
      Exception.throw('quantity not available')

The precondition aCart->notEmpty() implies that there is an invalid set for the parameter aCartId. The precondition aBook->notEmpty() implies that there is an invalid set for the parameter anIsbn. Finally, the exception in that contract establishes that there are invalid values for aQuantity as well.

Regarding valid values, each parameter must have at least one valid equivalence set, otherwise the method would always fail. However, as the postconditions are conditional (they include an if-then-else-endif structure), there are at least two possible behaviors for this operation, and this defines two valid equivalence sets: one when anItem is empty and another when anItem is not empty. The test cases for this system operation are presented in Table 11.2.

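Table 11.2

Functional Test Cases for a System Operation

| Condition to Be Tested | Expected Result |
| --- | --- |
| aCart->notEmpty() and aBook->notEmpty() and anItem->isEmpty() and aQuantity <= aBook.quantityInStock | A new item is created in the cart, linked to aBook, with quantity aQuantity |
| aCart->notEmpty() and aBook->notEmpty() and anItem->notEmpty() and anItem.quantity+aQuantity <= aBook.quantityInStock | The quantity of the existing item is incremented by aQuantity |
| aCart->isEmpty() | 'There is no such cart' |
| aBook->isEmpty() | 'There is no such book' |
| anItem->notEmpty() and anItem.quantity+aQuantity > aBook.quantityInStock | 'Quantity not available' |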

Combinations of exceptional conditions could be tested as well. For example, one situation that does not appear in Table 11.2 is when both the ISBN and cart ID are invalid. If that is the case, just one of the exceptions would be raised in any case. Furthermore, as mentioned before, sometimes invalid conditions are not compatible. For example, if the ISBN is invalid, there is no quantityInStock to be compared to the ordered quantity, and thus the corresponding exceptions could never occur together.

This is usually why the only combinations that are tested are the ones that include just success conditions or just one failure condition combined with success conditions. However, if the team finds it easy and safer to add variations that combine several failure conditions, those tests may be implemented as well.

Thus, although not every combination is tested, it is necessary to test at least:

• All possible combinations of valid equivalence sets.

• All combinations of one invalid equivalence set with valid sets.

This means that after testing the valid combinations, the tester must include one invalid set at a time for the exception test.

The code for testing system operations is very similar to the one presented for unit tests: fixtures must be created before each test, success and failure tests are run, and the environment is cleaned after each test.
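A minimal Python sketch of such a test for add2Cart follows. The Livir class below is a hypothetical, dictionary-based stand-in for the real controller, the identifiers "cart-1" and "isbn-1" are invented fixtures, and the messages follow Table 11.2. Treating the violated preconditions as raised errors is an assumption of this sketch.

```python
class Livir:
    """Hypothetical controller that keeps carts and stock in dictionaries."""

    def __init__(self):
        self.stock = {}   # isbn -> quantity in stock
        self.carts = {}   # cart id -> {isbn: ordered quantity}

    def add2cart(self, cart_id, isbn, quantity: int) -> None:
        if cart_id not in self.carts:
            raise LookupError("There is no such cart")
        if isbn not in self.stock:
            raise LookupError("There is no such book")
        cart = self.carts[cart_id]
        if cart.get(isbn, 0) + quantity > self.stock[isbn]:
            raise ValueError("Quantity not available")
        cart[isbn] = cart.get(isbn, 0) + quantity  # create or increment the item


def make_fixture() -> Livir:
    """Fresh system for each test: one empty cart and one book with 5 in stock."""
    system = Livir()
    system.stock["isbn-1"] = 5
    system.carts["cart-1"] = {}
    return system


def outcome(system, cart_id, isbn, quantity) -> str:
    try:
        system.add2cart(cart_id, isbn, quantity)
        return "ok"
    except (LookupError, ValueError) as error:
        return str(error)


def run_table_cases() -> dict:
    results = {}
    system = make_fixture()                   # item absent from the cart
    results["new item"] = (outcome(system, "cart-1", "isbn-1", 5) == "ok"
                           and system.carts["cart-1"] == {"isbn-1": 5})
    system = make_fixture()
    system.carts["cart-1"]["isbn-1"] = 3      # item already in the cart
    results["existing item"] = (outcome(system, "cart-1", "isbn-1", 2) == "ok"
                                and system.carts["cart-1"] == {"isbn-1": 5})
    system = make_fixture()
    results["no cart"] = outcome(system, "cart-9", "isbn-1", 1) == "There is no such cart"
    results["no book"] = outcome(system, "cart-1", "isbn-9", 1) == "There is no such book"
    system = make_fixture()
    system.carts["cart-1"]["isbn-1"] = 3
    results["above stock"] = (outcome(system, "cart-1", "isbn-1", 3) == "Quantity not available"
                              and system.carts["cart-1"] == {"isbn-1": 3})
    return results
```

Each case builds its own fixture, so no test depends on the state left by another; the quantities 5, 2, and 3 are limit values relative to the stock of 5.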

11.7 Use case testing (System, acceptance, and business cycle tests)

Use case tests consist of manually or automatically following all the possible flows of a use case and checking if the desired results (usually stated as the use case postconditions) are obtained.

If the test is run over a single use case by the development team, then it is a system test. If the test is run by the client or under her supervision, then it is an acceptance test. If a whole set of system use cases related to a business use case are performed in a logical sequence such as the one defined by the business activity diagram, then it is a business cycle test.

The use case test aims to verify if the current version of the system correctly performs the use case processes from the point of view of a user that performs a sequence of system operations in an interface (not necessarily a graphic interface) and is able to obtain the expected results.

The use case test may be understood as a systematic execution of the use case flows. If each of the system operations and queries has already passed its own tests, then it must be verified that the main flow and alternate flows of the use case can be performed correctly and that they produce the desired results.

It is possible to automate these tests so that they can be performed by a test robot that makes calls directly to the controller or simulates the actions of the user on the graphic interface.

Consider, for example, a system test that must be conducted to evaluate the use case of Figure 5.7, reproduced here as Figure 11.1.


FIGURE 11.1 Reference use case.

A use case test would deal with scenarios that are successful but run through different paths. First of all, the main flow of the use case must be considered as a base scenario for test purposes. Then, each variant or exception may be included one at a time for testing alternate scenarios. This is the least that is expected of a use case test suite. Combinations of exceptions and variants may be tested as well if the team considers them useful and viable.

The technique for elaborating use case tests then consists of selecting a set of paths that passes at least once by each variant and exception flow. The test cases for the example of Figure 11.1 may be defined as the following:

• 1,2,3,4,5.

• 1,2,3a(1,2,3),4,5.

• 1,2,3,4,5a(1).

• 1,2,3,4,5b(1).

• 1,2,3,4,5c(1),1,2,3,4,5.

• 1,2,3,4,5d,1,2,3,4,5.
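Whether a set of paths passes at least once through each variant and exception flow can be checked mechanically. The following Python sketch makes that check for the paths listed above; the set of alternate flows (3a, 5a, 5b, 5c, 5d) is an assumption read off those paths, and the function names are hypothetical.

```python
import re

test_paths = [
    "1,2,3,4,5",
    "1,2,3a(1,2,3),4,5",
    "1,2,3,4,5a(1)",
    "1,2,3,4,5b(1)",
    "1,2,3,4,5c(1),1,2,3,4,5",
    "1,2,3,4,5d,1,2,3,4,5",
]

# Variant and exception flows assumed for the reference use case.
alternate_flows = {"3a", "5a", "5b", "5c", "5d"}


def flows_visited(path: str) -> set:
    """Alternate-flow labels (step number plus letter) that a path passes through."""
    return set(re.findall(r"\d+[a-z]", path))


def uncovered(paths: list, required: set) -> set:
    """Alternate flows that no path in the list passes through."""
    covered = set().union(*(flows_visited(p) for p in paths))
    return required - covered


def has_main_flow(paths: list) -> bool:
    """True if some path follows the main flow with no variant or exception."""
    return any(not flows_visited(p) for p in paths)
```

For the six paths above, uncovered(test_paths, alternate_flows) is empty and has_main_flow(test_paths) is True, which is the minimum expected of a use case test suite.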

The system test plan for the use case of Figure 11.1 is shown in Table 11.3.

Table 11.3

Example of a Test Plan for a Use Case


The use case test is only performed on a system version in which all system operations were already tested. If many errors are still detected in system operations during the system test, the usual approach is to abort the use case test and redo the system operation tests until a sufficiently stable version of the system is produced and made available for use case tests.

Although automatic use case testing³ may check if the functionality is correct or not, human-powered use case tests are also recommended because there are aspects of the test that are hard for a machine to assess. For example, are the error messages clear? Are the buttons and text fields well positioned and sized?

Furthermore, a whole set of nonfunctional tests may be necessary depending on the supplementary specifications and nonfunctional requirements of the system (Chapter 3). Examples of nonfunctional tests are compatibility, compliance, endurance, load, localization, performance, recovery, resilience, security, scalability, stress, and usability. Explaining all these kinds of tests is beyond the scope of this book. Nonfunctional testing demands heterogeneous techniques and is covered by a large number of books, such as Molyneaux (2009) on performance testing, Nelson (2004) on stress testing, and Nielsen (1994) on usability testing.

11.8 The process so far




11.9 Questions

1. Consider the methods defined in Section 10.2 for the class in Figure 10.1, and write a sequence of unit tests for each of the basic operations defined for that class.

2. Look at the four contracts for the deleteBook operation in Section 8.7.3. Write a system test in your favorite language or in pseudocode to test each of the versions of that operation.

3. Prepare a use case test table for the use case in Figure 5.4.

¹ Although the literature usually uses the term equivalence classes, which is a mathematical concept, in this book the term equivalence sets is used instead to avoid confusion with object-oriented classes.

² Unless they were subclasses of Book, which is not the case.

³ An example of a robot framework for interface testing is: http://robotframework.googlecode.com/hg/doc/userguide/RobotFrameworkUserGuide.html?r=2.7.4. Accessed September 1st, 2013.