Introduction to Collections - Wrox Press Java Programming 24-Hour Trainer 2nd (2015)

Wrox Press Java Programming 24-Hour Trainer 2nd (2015)

Lesson 11. Introduction to Collections

So far you’ve been introduced to only one way of storing a collection of objects—with Java arrays, which are good for storage but fall short when you need to dynamically add, remove, or traverse the data. There are a number of classes and interfaces in the package java.util that are quite handy when multiple instances of some objects (collections) have to be co-located in memory. This lesson introduces you to several of them.

You can find more collections in the java.util.concurrent package, but you review those in digging_deeper_into_concurrent_execution after you become familiar with the concept of multithreading. Together, the collection classes and interfaces located in java.util and java.util.concurrent are often called Java Collection Framework.

Collection classes implement different interfaces, and several are covered in this lesson. The image in Figure 11-1 is taken from Oracle Java documentation. It depicts the top-level core collection interfaces.

image

Figure 11-1: Figure 11-1. Core Collection Interfaces

Java 8 introduced substantial improvements in collection data manipulation, and I highlight these changes in this lesson as well as in Lesson 20.

Arrays Revisited

Java collection classes enable the storing of primitives or object references in one place in memory. You were introduced to arrays in Chapter 5: Arrays let you store and access a group of variables of the same type. Let’s go over the steps you follow to declare and populate an array.

First, declare a variable of the type that matches the types of the objects (or primitives) that will be stored in the array, and reserve enough memory to accommodate all objects. For example, to reserve memory enough for storing 10 instances of class Customer you can declare a variablecustomers, as follows:

Customer customers[] = new Customers[10];

At this point you’ve allocated enough space for the storage of 10 memory references, not for the actual objects. Next, create instances of the objects and store their references in the array.

Customer[] customers = new Customer[10];

customers[0] = new Customer("David","Lee");

customers[1] = new Customer("Ringo","Starr");

...

customers[9] = new Customer("Lucy","Mann");

Now give a 15 percent discount to all customers who spent more than $500 in the online store:

for (Customer c: customers){

if (c.getTotalCharges() > 500){

c.setDiscount(15);

}

}

Note the use of the for-each loop here. It safely iterates through this array without trying to access elements beyond its boundaries. If a programmer forgot to check the size of the array (remember, if an array has n elements then the last element’s index is n - 1) and tried to access, say, the eleventh element like customers[10].setDiscount(15), Java would throw a runtime ArrayIndexOutOfBoundsException.

Collection Interfaces From java.util

A typical collection class implements several interfaces, which represent a well-designed hierarchy. For example, ArrayList implements the List interface, which extends Collection. For allowing a program to iterate over the collection without worrying about specific implementation of a particular collection. The interface Collection extends Iterable, so the application code can request a reference to the Iterator object and simply ask for the next element in the collection. You’ll see an example of using Iterator later in this chapter.

You can use a for-each loop with classes that implement Iterable, and this would be external (a.k.a. imperative) iteration of a collection. But Java 8 has introduced a new and preferable way of iterating collections with Iterable.forEach() method, which would be an internal iteration. I’ll explain why internal iteration is better in Lesson 13.

The List interface is used by the ordered collections like ArrayList and LinkedList. It allows duplicate elements. For example, the following code snippet will create two elements in ArrayList:

myArrayList.add("Mary");

myArrayList.add("Mary");

The Set interface is implemented by collections that don’t allow duplicate elements—for example, HashSet and SortedSet. For example, the following code snippet creates one element in HashSet. The second line finds out that Mary already exists, doesn’t change it, and returns false:

myHashSet.add("Mary");

myHashSet.add("Mary"); // Returns false

The Map interface is for storing key/value pairs. A map can’t contain duplicate keys, and each key can be mapped to only one value (object). You see some relevant code examples later in this lesson.

The Queue interface is mainly for collections that require first-in-first-out (FIFO) operation (so-called priority queues are the exception). Every new element is added to the tail of the queue and the elements are retrieved from the head of the queue. You can restrict the size of the queue if need be. LinkedList is one of the classes that implement the Queue interface.

Now let’s look at some of the classes that implement these interfaces.

Dynamic Arrays with ArrayList

Arrays offer the fastest access to the collection of data, but you have to know in advance the number of elements to be stored there. Luckily Java has classes that don’t have this restriction, and you can add more elements to a collection during the run time if needed. This lesson shows you several collection classes starting from ArrayList .

Internally this collection uses an array for storage, but when you keep adding elements to ArrayList, it increases (by small increments) the size of the underlying array. Correspondingly, as elements are deleted, this collection shrinks in size. You can store duplicate objects in ArrayList.

ArrayList implements the List interface and can store only objects; primitives are not allowed. Having said this, keep in mind that Java supports autoboxing (see Chapter 3), and if you try to add a primitive to a collection, it is automatically converted into the corresponding wrapper object. You have to pay a price for this convenience; ArrayList is a little slower than the array as it needs to do internal copying from one array to another to change the collection’s size.

To create and populate an ArrayList object you should first instantiate it, and then you add instances of other objects to the ArrayList by calling the method add(), as in Listing 11-1.

Listing 11-1: Populating ArrayList

ArrayList customers = new ArrayList();

Customer customer1 = new Customer("David","Lee");

customers.add(customer1);

Customer customer2 = new Customer("Ringo","Starr");

customers.add(customer2);

The method add() doesn’t copy the instance of the object into the customers collection, it just stores the memory address of the object being added. The element numbering in ArrayList starts with 0. If you know that an ArrayList will store, say, 20 objects, instantiate it with the constructor that allocates the right amount of memory on creation:

ArrayList customers = new ArrayList(20);

You can still add more than 20 elements, but JVM allocates additional memory as needed. The method get() is used to extract a particular element from ArrayList. Because ArrayList is generic storage for any type of object, the method get() returns elements as Object data types. It’s the responsibility of the programmer to provide proper casting, such as the following:

Customer theBestCustomer = (Customer) customers.get(1);

To illustrate a possible runtime error that will occur if the casting was not properly done, add an object of another type to your customers collection from Listing 11-1:

Order order = new Order(123, 500, "IBM");

customers.add(order);

The Java compiler does not complain because ArrayList can store any objects. At this point you have the elements in customers—two customers and one order. The following code throws the IllegalCastException on the third iteration of the loop:

int totalElem = customers.size(); // number of elements

for (int i = 0; i < totalElem; i++){

Customer currentCustomer = (Customer) customers.get(i);

currentCustomer.doSomething();

}

Listing 11-2 shows how the operator instanceof helps you avoid this exception. But before using instanceof, see if you can come up with a more elegant solution, as you learned to do in the section “Polymorphism” in Chapter 7. In Lesson 12 you’ll learn how to use generics, which allow to remove the need of using instanceof and in particular control during the compilation time which objects can be added to a collection.

Listing 11-2: ArrayList and instanceof

ArrayList customers = new ArrayList(3);

// The code to populate customers with instances of

// Customer and Order objects is omitted for brevity

int totalElem = customers.size();

// Iterate through the list customers and do something with each

// element of this collection

for (int i=0; i<totalElem;i++){

Object currentElement = customers.get(i);

if (currentElement instanceof Customer){

Customer currentCustomer= (Customer)customers.get(i);

currentCustomer.doSomething();

}

else if (currentElement instanceof Order){

Order currentOrder = (Order) customers.get(i);

currentOrder.doSomething();

}

}

In the section Programming to Interfaces I’ll show you what has to be changed in the declaration of the variable customers.

ArrayList and Concurrent Access

In Lesson 17 you learn about concurrent access to the data from multiple threads. If the data is stored in an ArrayList, you may run into concurrency issues (race condition). To prevent this from happening, you can turn on synchronization by invoking the methodCollections.synchronizedList() on an ArrayList object.

Programming to Interfaces

In this section code samples start with declaring a variable of type ArrayList; for example:

ArrayList customers = new ArrayList(3);

While this code is correct, there a better way of declaring the variable customers:

List customers = new ArrayList(3);

You can read the first example as follows: “I want to declare a variable customers that will have all access to all API offered by the class ArrayList.” The second version means the following: “I want to declare a variable customers that has a behavior declared in the Listinterface”.

The first example declares a variable of a specific implementation— ArrayList—of the List interface.

ArrayList implements several interfaces besides List, which means that it has more methods that the List defines. But if you read the documentation on the List interface, you’ll see that among others it includes the methods as add(), get(), and size(), which are the only ones used with our collection customers. If this is all we need, declaring a variable customers of type List gives us more flexibility. If later we decide to switch to a different implementation of the List (e.g., LinkedList instead of ArrayList ) we won’t need to change the type of the variablecustomers.

You may say that changing a variable declaration from ArrayList to LinkedList it’s not a big deal—it’s still the same line of code. But it may be a bigger deal if, say, your program needs to pass the object referred by customers to another object’s method that also was declared with the argument of type ArrayList. Now we need to make changes in two places. In large projects such a refactoring may become a time-consuming process.

If you just need a behavior defined in a particular interface, declare the variable of this interface type rather than of a concrete implementation of this interface.

Classes Hashtable and Hashmap

The classes Hashtable and HashMap implement the Map interface and stores key/value pairs. These classes offer a convenient way of storing and accessing the elements of a collection by key. You can assign a key to an instance of some Java object and use it as a reference. Let’s say you need to store instances of the classes Customer, Order, and Portfolio in the same collection. The code snippet from Listing 11-3 creates these instances first, and then puts them in the collection under some identifiers (keys).

Listing 11-3: Hashtable for key/value pairs

Customer customer = new Customer("David", "Lee");

Order order = new Order(123, 500, "IBM");

Portfolio portfolio = new Portfolio(123);

Map data = new Hashtable(); // programming to interfaces

data.put("Customer", customer);

data.put("Order",order);

data.put("Portfolio", portfolio);

The values in double quotes represent keys by which the objects could be retrieved. In this example the keys are represented by the Java class String, but you can use any objects as keys. The keys are selected based on the application needs; for example, the code in Listing 11-3 could be written in some order management application.

If you have an idea of how many elements you are planning to store in a Hashtable, use the constructor with the capacity argument:

Hashtable data = new Hashtable(10); // 10-element capacity

The method get() provides access to these objects via the key. You need to either perform the proper casting as shown in the following code or use generics (explained in Chapter 12):

Order myOrder = (Order) data.get("Order");

The method size() returns the number of elements in the Hashtable:

int totalElem = data.size();

Methods containsKey() and containsValue() help you to find out if the collection contains a specific key or value.

The class HashMap is similar to Hashtable, but it allows null as a key or value and is not synchronized (explained in Lesson 19). If you are writing code that doesn’t need to access the same element concurrently without using multithreading, use HashMap because it performs faster thanHashtable. If you do need concurrent access, the other alternative to Hashtable is ConcurrentHashMap.

To speed up the table lookup, both HashMap and Hashtable index the data by applying a hash function that (based on the contents of the object) generates a hash code, one number that represents a large object. There’s a slight chance that two different objects will generate the same hash code, but the same object always produces the same hash code. You can read more about hash functions in Wikipedia at http://en.wikipedia.org/wiki/Hash_function.

The java.util.Collections class

The class java.util.Collections consists of useful static methods that work with collection classes; for example, sort(), reverse(), swap(), and more.

Class Properties

Pretty often a desktop application offers you a way to specify and store user preferences such as fonts and colors. This is a use case in which storage of key/value pairs is exactly what’s needed. You can store such preferences locally or on remote servers. In Chapter 14 you find out how to work with files and other I/O streams, but from a data structure perspective you’ll be dealing with a collection of key/value pairs, such as color=red, font=verdana.

Windows-based applications often store some configurable parameters in the .ini files. In general, Java applications store their properties in plain text files, XML files, database tables, and others.

In this section you see some code fragments illustrating how the Java class Properties, which extends Hashtable, can be used to manipulate with properties using key/value pairs. The class Properties has one restriction that Hashtable does not. For example, if you’d need to write a program that sends e-mails, you can store the URL of the mail server and from/to addresses from the mailman.properties file, which has the following contents:

SmtpServer=mail.xyz.com

to=abc@xyz.com

cc=mary@xyz.com

from=yakov@xyz.com

To load this file into the Properties object, just define an input I/O stream on this file (see Listing 11-4. After the file has been loaded into the Properties object, each individual property can be obtained with the method getProperty().

Listing 11-4: Reading file mailman.properties into the Properties object

Properties properties=new Properties();

FileInputStream in =null;

try{

in = new FileInputStream ("mailman.properties");

properties.load(in);

}catch(Exception e){...}

finally{... in.close();...}

String from = properties.getPropery("from");

String mailServer=properties.getProperty("SmtpServer");

...

Java does not have global variables, but as a workaround you can make these properties available to any object in your application by turning them into system properties available from any class in your application:

System.setProperties(properties);

Keep in mind that the preceding line also replaces the existing system properties, which you may or may not want to do. Now you can get these values from any other class in your application; for example:

String mailServer = System.getProperty("SmtpServer");

If you decide to store properties in XML files, the class Properties offers you the method loadFromXML() to read properties and the method storeToXML() to store them in a simple XML format.

You see a practical example of using the Properties class in Lesson 29 while learning about Java Naming and Directory Interface (JNDI).

Classes Enumeration and Iterator

In general, enumerations are sets of items that are related to each other. For example, shipment options or ice cream flavors—such enumerations are supported by the Java keyword enum (see Chapter 17). But because we are talking about collections, the meaning of the term enumeration is somewhat different. If a collection object implements the interface Enumeration, you can traverse its elements sequentially, without even knowing the total number. You just need to obtain the enumeration of all elements and use the methods hasMoreElements() and nextElement(). For example, to process all elements of ArrayListcustomers you can do the following:

Enumeration enumCustomers = Collections.enumeration(customers);

while(enumCustomer.hasMoreElements()){

Customer currentCustomer = (Customer)enumCustomer.nextElement());

currentCustomer.doSomething();

}

You can also obtain the enumeration of a Hashtable’s keys or elements. For example:

Hashtable customerData = new Hashtable();

// Get the keys

Enumeration enumKeys = customerData.keys();

while(enumKeys.hasMoreElements()){

// do some keys processing

}

// Get the elements

Enumeration enumElements = customerData.elements();

// do some customer objects processing

The Iterator interface is Enumeration on steroids. It also offers a standard way to process elements of a collection sequentially. The main difference between the two is that Enumeration is a read-only means of traversing a collection, whereas Iterator has a method called remove() that enables you to delete unwanted elements of the collection. Enumeration is considered a legacy interface, and you should use Iterator. For example, you can iterate through the ArrayList customers as follows:

Iterator iCust = customers.iterator();

while (iCust.hasNext()){

System.out.println(iCust.next())

}

Class LinkedList

Java collection classes differ in how you can retrieve and insert objects. If you need to work with a sequential list of objects and often insert (remove) the object into (from) the list, the data structure called linked list can fit the bill.

Linked lists store elements so that each contains a reference to the next one. Doubly linked lists also contain a reference to both the next and the previous elements. Java includes the the doubly linked class LinkedList, which enable you to create the data structures known as queues (first-in-first-out or FIFO) and stacks (last-in-first-out or LIFO).

Insertion of a new object inside the list comes down to a simple update of two references: The previous element of the list has to be pointed to the newly inserted object, which has to include a reference to the next element, if any. Compare this to the complexity of lots of memory allocations and objects moving in memory to increase the size of an ArrayList, and you’ll appreciate the value that linked lists bring to the table. On the other hand, collections that use arrays for the underlying data storage offer random access to the data elements, whereas linked lists can be processed only sequentially.

You can navigate through the list using the class ListIterator, which supports going through the list in both directions via its methods next() and previous(). Listing 11-5 shows you an example, in which a standby passenger list is created at the boarding gate of some airline company.

Listing 11-5: LinkedList example

import java.util.LinkedList;

import java.util.ListIterator;

public class TestLinkedList {

public static void main(String[] args) {

LinkedList passengerList = new LinkedList();

passengerList.add("Alex Smith");

passengerList.add("Mary Lou");

passengerList.add("Sim Monk");

ListIterator iterator = passengerList.listIterator();

System.out.println(iterator.next());

System.out.println(iterator.next());

System.out.println(iterator.next());

}

}

The code in Listing 11-5 iterates and prints all the objects from the list using ListIterator interface, which allows a program to traverse the list in both directions.

You might be wondering how the println() method knows how to print an object returned by the iterator. It tries to find the method toString() defined on the object and call it. In our example the object is a string itself, but in a real-world situation you might need to print objects, and defining the toString() method is the right way to do so.

If you use add() or remove() while iterating through the list, the new element is either inserted or removed at the iterator’s current position.

Class BitSet

The class BitSet stores a sequence of bits. It’s a pretty efficient class when you need to pass to a program a number of flags that indicate certain conditions. Think of a financial trading application that must be extremely fast. One way to improve the performance is to represent the maximum amount of information in a minimal number of bytes.

Another use case for BitSet are programs that send signals with information about the state of a certain device or as sensor. For example, some vending machines have smart chips that can automatically dial their owner’s phone number and send a signal containing status information. Sending a set of flags (bits that are set to 1 or 0) instead of text or numbers is the most economical way to do this.

The BitSet class does not have a size limit and can grow as needed. Depending on which bit is set (for example, has the value of 1) the class could indicate the following:

· Bit 0: The coin box is empty.

· Bit 1: The coin box is half full.

· Bit 2: The coin box is full.

· Bit 3: The coin box has been removed.

· Bit 4: The Coca-Cola row is empty.

One instance of a BitSet object carries multiple parameters describing its status. The program that receives this signal could print a nice report, and the owner of this remote machine could decide if he or she needs to send a technician to look at the machine.

The Java class BitSet is nothing more than a collection of bits. The following code prepares a signal indicating that the coin box is full and there are no Coca-Cola bottles left:

import java.util.BitSet;

class VendingMachineSender {

public static void main(String args[]){

BitSet report = new BitSet();

report.set(2); // box is full

report.set(4); // no Coca-Cola

}

}

When the phone call comes in, the callback method phoneRinging() is invoked and the signal can be decoded like this:

import java.util.BitSet;

class VendingMachineListener {

public void phoneRinging(BitSet signal)

int size = signal.size();

for (int i=0;i<size;i++){

if (signal.get(i)){

switch (i){

case 0:

System.out.println("Box is empty");

break;

case 1:

System.out.println("Box is half full");

break;

case 2:

System.out.println("Box is full");

break;

// more cases come here

}

}

}

}

}

Internet of Things

The Internet Of Things (IoT) is a buzzword that’s used to describe Internet applications that deal with small, sensor-like devices, which have a limited amount of memory and processing power. Consider using the class BitSet if you need to program sensors.

Choosing the Right Collection

Java has dozens of collection classes and interfaces. In this lesson I’ve shown just a few of them. But which collection is the right one for your needs? Below are some of the considerations that may help you to choose one.

· If you need to access data by index, consider using ArrayList.

· If you need to often insert or remove data in/from a collection, a LinkedList should be a good choice

· If you need a collection that doesn’t allow duplicate elements, use one of the collections that implements Set interface. For fast access use HashSet. For sorted set use TreeSet.

· For storing key/value pairs use a collection that implements the Map interface; e.g. HashMap or HashTable.

· If you need a collection for a fast search that remains fast regardless of the size of the data set use HashSet.

These recommendations are applicable for cases when there is no need to access data concurrently by multiple threads (see Lesson 17). Java has many concurrent collections located in the package java.util.concurrent, and these collections have to be used for concurrent data access. Oracle’sCollections tutorial and Java API on collection classes is a good resource for finding the right collection for your application.

The Big O Notation

In computing there is something called Big O notation, which describes how the time to do a certain thing grows when the size of the input data grows. The Big O notation is represented as O(n). The higher value of n represents greater dependency of the task from the data size. Hence O(1) means that the speed of a task doesn’t depend on the collection size. The article on Performance of Java Collections includes Big O notations for various Java collections and operations.

Try It

Modify the LinkedList example from Listing 11-5 to add an arbitrary object, say, the VIP customer after the very first element of the list. You must do this while iterating through the list. When the program is ready it should print the following:

Alex Smith

VIP Customer

Mary Lou

Sim Monk

Lesson Requirements

You should have Java installed.

NOTE You can download the code and resources for this “Try It” from the book’s web page at www.wrox.com/go/javaprog24hr2e. You can find them in the Lesson11.zip.

Step-by-Step

1. Create a new Eclipse project called Lesson11.

2. After the first call to iterator.next() add the following line: iterator.add(“VIP Customer”);

3. Run the program and observe that it doesn’t print “VIP Customer.” This happens because the iterator is already positioned after the newly inserted object.

4. Add the line iterator.previous() right after the “VIP Customer” to move the iterator one element back.

5. Add one more print statement (otherwise the program won’t reach Sim Monk). Compile and run the program. It prints all four elements as requested.

6. Now break the code by changing the line that you added in Step 2 to passengerList.add(“VIP Customer”);.

7. Run the program. It prints the first element of the linked list and then produces a runtime exception:

8. Alex Smith

9. Exception in thread "main"

10. java.util.ConcurrentModificationException

11. at java.util.LinkedList$ListItr.checkForComodification(

12. LinkedList.java:761)

13. at java.util.LinkedList$ListItr.next(LinkedList.java:696)

at TestLinkedList.main(TestLinkedList.java:20)

The reason for this concurrent modification exception is that one thread of execution was iterating through a collection, and at the same time another thread was trying to modify the underlying collection in an unsafe way. The concept of threads is introduced in Chapter 17.

TIP Please select the videos for Lesson 11 online at www.wrox.com/go/javaprog24hr2e. You will also be able to download the code and resources for this lesson from the website.