Core Java for the Impatient (2013)
Chapter 7. Collections
Topics in This Chapter
7.1 An Overview of the Collections Framework
7.2 Iterators
7.3 Sets
7.4 Maps
7.5 Other Collections
7.6 Views
Exercises
Many data structures have been developed so programmers can store and retrieve values efficiently. The Java API provides implementations of common data structures and algorithms, as well as a framework to organize them. In this chapter, you will learn how to work with lists, sets, maps, and other collections.
The key points of this chapter are:
1. The Collection interface provides common methods for all collections, except for maps which are described by the Map interface.
2. A list is a sequential collection in which each element has an integer index.
3. A set is optimized for efficient containment testing. Java provides HashSet and TreeSet implementations.
4. For maps, you have the choice between HashMap and TreeMap implementations. A LinkedHashMap retains insertion order.
5. The Collection interface and Collections class provide many useful algorithms: set operations, searching, sorting, shuffling, and more.
6. Views provide access to data stored elsewhere using the standard collection interfaces.
7.1 An Overview of the Collections Framework
The Java collections framework provides implementations of common data structures. To make it easy to write code that is independent of the choice of data structures, the collections framework provides a number of common interfaces, shown in Figure 7–1. The fundamental interface is Collection whose methods are shown in Table 7–1.
Figure 7–1 Interfaces in the Java collections framework
Table 7–1 The Methods of the Collection<E> Interface
A List is a sequential collection: Elements have position 0, 1, 2, and so on. Table 7–2 shows the methods of that interface.
Table 7–2 The List Interface
The List interface is implemented both by the ArrayList class, which you have seen throughout this book, and the LinkedList class. If you took a course on data structures, you probably remember a linked list—a sequence of linked nodes, each carrying an element. Insertion in the middle of a linked list is speedy—you just splice in a node. But to get to the middle, you have to follow all the links from the beginning, which is slow. There are applications for linked lists, but most application programmers will probably stick with array lists when they need a sequential collection. Still, the List interface is useful. For example, the method Collections.nCopies(n, o) returns a List object with n copies of the object o. That object “cheats” in that it doesn’t actually store n copies but, when you ask about any one of them, returns o.
Caution
The List interface provides methods to access the nth element of a list, even though such an access may not be efficient. To indicate that it is, a collection class should implement the RandomAccess interface. This is a tagging interface without methods. For example, ArrayList implements List and RandomAccess, but LinkedList implements only the List interface.
In a Set, elements are not inserted at a particular position, and duplicate elements are not allowed. A SortedSet allows iteration in sort order, and a NavigableSet has methods for finding neighbors of elements. You will learn more about sets in Section 7.3, “Sets,” on p. 233.
A Queue retains insertion order, but you can only insert elements at the tail and remove them from the head (just like a queue of people). A Deque is a double-ended queue with insertion and removal at both ends.
All collection interfaces are generic, with a type parameter for the element type (Collection<E>, List<E>, and so on). The Map<K, V> interface has a type parameter K for the key type and V for the value type.
You are encouraged to use the interfaces as much as possible in your code. For example, after constructing an ArrayList, store the reference in a variable of type List:
Click here to view code image
List<String> words = new ArrayList<>();
Whenever you implement a method that processes a collection, use the least restrictive interface as parameter type. Usually, a Collection, List, or Map will suffice.
One advantage of a collections framework is that you don’t have to reinvent the wheel when it comes to common algorithms. Some basic algorithms (such as addAll and removeIf) are methods of the Collection interface. The Collections utility class contains many additional algorithms that operate on various kinds of collections. You can sort, shuffle, rotate, and reverse lists, find the maximum or minimum, or the position of an arbitrary element in a collection, and generate collections with no elements, one element, or n copies of the same element. Table 7–3 provides a summary.
Table 7–3 Useful Methods of the Collections Class
7.2 Iterators
Each collection provides a way to iterate through its elements in some order. The Iterable<T> superinterface of Collection defines a method
Iterator<T> iterator()
It yields an iterator that you can use to visit all elements.
Click here to view code image
Collection<String> coll = ...;
Iterator<String> iter = coll.iterator();
while (iter.hasNext()) {
String element = iter.next();
Process element
}
In this case, you can simply use the enhanced for loop:
for (String element : coll) {
Process element
}
Note
For any object c of a class that implements the Iterable<E> interface, the enhanced for loop is translated to the preceding form.
The Iterator interface also has a remove method which removes the previously visited element. This loop removes all elements that fulfill a condition:
Click here to view code image
while (iter.hasNext()) {
String element = iter.next();
if (element fulfills the condition)
iter.remove();
}
However, it is easier to use the removeIf method:
Click here to view code image
coll.removeIf(e -> e fulfills the condition);
Caution
The remove method removes the last element that the iterator has returned, not the element to which the iterator points. You can’t call remove twice without an intervening call to next or previous.
The ListIterator interface is a subinterface of Iterator with methods for adding an element before the iterator, setting the visited element to a different value, and for navigating backwards. It is mainly useful for working with linked lists.
Click here to view code image
List<String> friends = new LinkedList<>();
ListIterator<String> iter = friends.listIterator();
iter.add("Fred"); // Fred |
iter.add("Wilma"); // Fred Wilma |
iter.previous(); // Fred | Wilma
iter.set("Barney"); // Fred | Barney
Caution
If you have multiple iterators visiting a data structure and one of them mutates it, the other ones can become invalid. An invalid iterator may throw a ConcurrentModificationException if you continue using it.
7.3 Sets
A set can efficiently test whether a value is an element, but it gives up something in return: It doesn’t remember in which order elements were added. Sets are useful whenever the order doesn’t matter. For example, if you want to disallow a set of bad words as usernames, their order doesn’t matter. You just want to know whether a proposed username is in the set or not.
The Set interface is implemented by the HashSet and TreeSet classes. Internally, these classes use very different implementations. If you have taken a course in data structures, you may know how to implement hash tables and binary trees—but you can use these classes without knowing their internals.
Generally, hash sets are a bit more efficient, provided you have a good hash function for your elements. Library classes such as String or Path have good hash functions. You learned how to write hash function for your own classes in Chapter 4.
For example, that set of bad words can be implemented simply as
Click here to view code image
Set<String> badWords = new HashSet<>();
badWords.add("sex");
badWords.add("drugs");
badWords.add("c++");
if (badWords.contains(username.toLowerCase()))
System.out.println("Please choose a different user name");
You use a TreeSet if you want to traverse the set in sorted order. One reason you might want to do this is to present users a sorted list of choices.
The element type of the set must implement the Comparable interface, or you need to supply a Comparator in the constructor.
Click here to view code image
TreeSet<String> countries = new TreeSet<>(); // Visits added countries in sorted order
countries = new TreeSet<>((u, v) ->
u.equals(v) ? 0
: u.equals("USA") ? -1
: v.equals("USA") ? 1
: u.compareTo(v));
// USA always comes first
The TreeSet class implements the SortedSet and NavigableSet interfaces, whose methods are shown in Tables 7–4 and 7–5.
Table 7–4 SortedSet<E> Methods
Table 7–5 NavigableSet<E> Methods
7.4 Maps
Maps store associations between keys and values. Call put to add a new association, or change the value of an existing key:
Click here to view code image
Map<String, Integer> counts = new HashMap<>();
counts.put("Alice", 1); // Adds the key/value pair to the map
counts.put("Alice", 2); // Updates the value for the key
This example uses a hash map which, as for sets, is usually the better choice if you don’t need to visit the keys in sorted order. If you do, use a TreeMap instead.
Here is how you can get the value associated with a key:
Click here to view code image
int count = counts.get("Alice");
If the key isn’t present, the get method returns null. In this example, that would cause a NullPointerException when the value is unboxed. A better alternative is
Click here to view code image
int count = counts.getOrDefault("Alice", 0);
Then a count of 0 is returned if the key isn’t present.
When you update a counter in a map, you first need to check whether the counter is present, and if so, add 1 to the existing value. The merge method simplifies that common operation. The call
Click here to view code image
counts.merge(word, 1, Integer::sum);
associates word with 1 if the key wasn’t previously present, and otherwise combines the previous value and 1, using the Integer::sum function.
Table 7–6 summarizes the map operations.
You can get views of the keys, values, and entries of a map by calling these methods:
Click here to view code image
Set<K> keySet()
Set<Map.Entry<K, V>> entrySet()
Collection<K> values()
The collections that are returned are not copies of the map data, but they are connected to the map. If you remove a key or entry from the view, then the entry is also removed from the underlying map.
To iterate through all keys and values of a map, you can iterate over the set returned by the entrySet method:
Click here to view code image
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
String k = entry.getKey();
Integer v = entry.getValue();
Process k, v
}
Or simply use the forEach method:
counts.forEach((k, v) -> {
Process k, v
});
Caution
Some map implementations (for example, ConcurrentHashMap) disallow null for keys or values. And with those that allow it (such as HashMap), you need to be very careful if you do use null values. A number of map methods interpret a null value as an indication that an entry is absent, or should be removed.
Tip
Sometimes, you need to present map keys in an order that is different from the sort order. For example, in the JavaServer Faces framework, you specify labels and values of a selection box with a map. Users would be surprised if the choices were sorted alphabetically (Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday) or in the hash code order. In that case, use a LinkedHashMap that remembers the order in which entries were added and iterates through them in that order.
7.5 Other Collections
In the following sections, I briefly discuss some collection classes that you may find useful in practice.
7.5.1 Properties
The Properties class implements a map that can be easily saved and loaded using a plain text format. Such maps are commonly used for storing configuration options for programs. For example:
Click here to view code image
Properties settings = new Properties();
settings.put("width", "200");
settings.put("title", "Hello, World!");
try (OutputStream out = Files.newOutputStream(path)) {
settings.store(out, "Program Properties");
}
The result is the following file:
#Program Properties
#Mon Nov 03 20:52:33 CET 2014
width=200
title=Hello, World\!
Caution
Property files are encoded in ASCII, not UTF-8. Comments start with # or !. Characters less than '\u0021' or greater than '\u007e' are written as Unicode escapes \unnnn. A newline in a key or value is written as \n. The characters \ # ! are escaped as \\ \# \!.
To load properties from a file, call
Click here to view code image
try (InputStream in = Files.newInputStream(path)) {
settings.load(in);
}
Then use the getProperty method to get a value for a key. You can specify a default value used when the key isn’t present:
Click here to view code image
String title = settings.getProperty("title", "New Document");
Note
For historical reasons, the Properties class implements Map<Object, Object> even though the values are always strings. Therefore, don’t use the get method—it returns the value as an Object.
The System.getProperties method yields a Properties object with system properties. Table 7–7 describes the most useful ones.
Table 7–7 Useful System Properties
7.5.2 Bit Sets
The BitSet class stores a sequence of bits. A bit set packs bits into an array of long values, so it is more efficient to use a bit set than an array of boolean values. Bit sets are useful for sequences of flag bits or to represent sets of non-negative integers, where the ith bit is 1 to indicate that i is contained in the set.
The BitSet class gives you convenient methods for getting and setting individual bits. This is much simpler than the bit-fiddling necessary to store bits in int or long variables. There are also methods that operate on all bits together for set operations, such as union and intersection. See Table 7–8 for a complete list. Note that the BitSet class is not a collection class—it does not implement Collection<Integer>.
Table 7–8 Methods of the BitSet Class
7.5.3 Enumeration Sets and Maps
If you collect sets of enumerated values, use the EnumSet class instead of BitSet. The EnumSet class has no public constructors. Use a static factory method to construct the set:
Click here to view code image
enum Weekday { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };
Set<Weekday> always = EnumSet.allOf(Weekday.class);
Set<Weekday> never = EnumSet.noneOf(Weekday.class);
Set<Weekday> workday = EnumSet.range(Weekday.MONDAY, Weekday.FRIDAY);
Set<Weekday> mwf = EnumSet.of(Weekday.MONDAY, Weekday.WEDNESDAY, Weekday.FRIDAY);
You can use the methods of the Set interface to work with an EnumSet.
An EnumMap is a map with keys that belong to an enumerated type. It is implemented as an array of values. You specify the key type in the constructor:
Click here to view code image
EnumMap<Weekday, String> personInCharge = new EnumMap<>(Weekday.class);
personInCharge.put(Weekday.MONDAY, "Fred");
7.5.4 Stacks, Queues, Deques, and Priority Queues
A stack is a data structure for adding and removing elements at one end (the “top” of the stack). A queue lets you efficiently add elements at one end (the “tail”) and remove them from the other end (the “head”). A double-ended queue, or deque, supports insertion and removal at both ends. With all these data structures, adding elements in the middle is not supported.
The Queue and Deque interfaces define the methods for these data structures. There is no Stack interface in the Java collections framework, just a legacy Stack class from the earliest days of Java that you should avoid. If you need a stack, queue, or deque and are not concerned about thread safety, use an ArrayDeque.
With a stack, use the push and pop methods.
Click here to view code image
ArrayDeque<String> stack = new ArrayDeque<>();
stack.push("Peter");
stack.push("Paul");
stack.push("Mary");
while (!stack.isEmpty())
System.out.println(stack.pop());
With a queue, use add and remove.
Click here to view code image
Queue<String> queue = new ArrayDeque<>();
queue.add("Peter");
queue.add("Paul");
queue.add("Mary");
while (!queue.isEmpty())
System.out.println(queue.remove());
Thread-safe queues are commonly used in concurrent programs. You will find more information about them in Chapter 10.
A priority queue retrieves elements in sorted order after they were inserted in arbitrary order. That is, whenever you call the remove method, you get the smallest element currently in the priority queue.
A typical use for a priority queue is job scheduling. Each job has a priority. Jobs are added in random order. Whenever a new job can be started, the highest priority job is removed from the queue. (Since it is traditional for priority 1 to be the “highest” priority, the remove operation yields the minimum element.)
Click here to view code image
public class Job implements Comparable<Job> { ... }
...
PriorityQueue<Job> jobs = new PriorityQueue<>();
jobs.add(new Job(4, "Collect garbage"));
jobs.add(new Job(9, "Match braces"));
jobs.add(new Job(1, "Fix memory leak"));
...
while (jobs.size() > 0) {
Job job = jobs.remove(); // The most urgent jobs are removed first
execute(job);
}
Just like a TreeSet, a priority queue can hold elements of a class that implements the Comparable interface, or you can supply a Comparator in the constructor. However, unlike a TreeSet, iterating over the elements does not necessarily yield them in sorted order. The priority queue uses algorithms for adding and removing elements that cause the smallest element to gravitate to the root, without wasting time on sorting all elements.
7.5.5 Weak Hash Maps
The WeakHashMap class was designed to solve an interesting problem. What happens with a value whose key is no longer used anywhere in your program? If the last reference to a key has gone away, there is no longer any way to refer to the value object so it should be removed by the garbage collector.
It isn’t quite so simple. The garbage collector traces live objects. As long as the map object is live, all entries in it are live and won’t be reclaimed—and neither will be the values that are referenced by the entries.
This is the problem that the WeakHashMap class solves. This data structure cooperates with the garbage collector to remove key/value pairs when the only reference to the key is the one from the hash table entry.
Technically, the WeakHashMap uses weak references to hold keys. A WeakReference object holds a reference to another object—in our case, a hash table key. Objects of this type are treated in a special way by the garbage collector. If an object is reachable only by a weak reference, the garbage collector reclaims the object and places the weak reference into a queue associated with the WeakReference object. Whenever a method is invoked on it, a WeakHashMap checks its queue of weak references for new arrivals and removes the associated entries.
7.6 Views
A collection view is a lightweight object that implements a collection interface, but doesn’t store elements. For example, the keySet and values methods of a map yield views into the map.
Another example is the Arrays.asList method. If a is an array of type E[], then Arrays.asList(a) returns a List<T> that is backed by the array elements.
Usually, a view does not support all operations of its interface. For example, it makes no sense to call add on a key set of a map or the list returned by Arrays.asList.
In the following sections, you will see some views that are provided by the Java collections framework.
7.6.1 Ranges
You can form a sublist view of a list. For example,
Click here to view code image
List<String> sentence = ...;
List<String> nextFive = sentence.subList(5, 10);
This view accesses the elements with index 5 through 9. Any mutations of the sublist (such as setting, adding, or removing elements) affect the original.
For sorted sets and maps, you specify a range by the lower and upper bound:
Click here to view code image
TreeSet<String> words = ...;
SortedSet<String> asOnly = words.subSet("a", "b");
As with subList, the first bound is inclusive, and the second exclusive.
The headSet and tailSet methods yield a subrange with no lower or upper bound.
Click here to view code image
NavigableSet<String> nAndBeyond = words.tailSet("n");
With the NavigableSet interface, you can choose for each bound whether it should be inclusive or exclusive—see Table 7–5.
For a sorted map, there are equivalent methods subMap, headMap, and tailMap.
7.6.2 Empty and Singleton Views
The Collections class has static methods yielding an immutable empty list, set, sorted set, navigable set, map, sorted map, navigable map, iterator, list iterator, or enumeration (an iterator-like legacy interface from Java 1.0).
Similarly, there are static methods yielding a set or list with a single element, and a map with a single key/value pair. These methods are shown in Table 7–3.
For example, if a method requires a map of attributes, and you have none or only one to supply, you can call
Click here to view code image
doWork(Collections.emptyMap());
or
Click here to view code image
doWork(Collections.singletonMap("id", id));
instead of creating a heavyweight HashMap or TreeMap object.
7.6.3 Unmodifiable Views
Sometimes, you want to share the contents of a collection but you don’t want it to be modified. Of course, you could copy the values into a new collection, but that is potentially expensive. An unmodifiable view is a better choice. Here is a typical situation. A Person object maintains a list of friends. If the getFriends gave out a reference to that list, a caller could mutate it. But it is safe to provide an unmodifiable list view:
Click here to view code image
public class Person {
private ArrayList<Person> friends;
public List<Person> getFriends() {
return Collections.unmodifiableList(friends);
}
...
}
All mutator methods throw an exception when they are invoked on an unmodifiable view.
As you can see from Table 7–3, you can get unmodifiable views as collections, lists, sets, sorted sets, navigable sets, maps, sorted maps, and navigable maps.
Note
In Chapter 6, you saw how it is possible to smuggle the wrong kind of elements into a generic collection (a phenomenon called “heap pollution”), and that a runtime error is reported when the inappropriate element is retrieved, not when it is inserted. If you need to debug such a problem, use a checked view. Where you constructed, say, an ArrayList<String>, instead use
Click here to view code image
List<String> strings
= Collections.checkedList(new ArrayList<>(), String.class);
The view monitors all insertions into the list and throws an exception when an object of the wrong type is added.
Note
The Collections class produces synchronized views that ensure safe concurrent access to data structures. In practice, these views are not as useful as the data structures in the java.util.concurrent package that were explicitly designed for concurrent access. I suggest you use those classes and stay away from synchronized views.
Exercises
1. Implement the “Sieve of Erathostenes” algorithm to determine all prime numbers ≤ n. Add all numbers from 2 to n to a set. Then repeatedly find the smallest element s in the set, and remove s2, s · (s + 1), s · (s + 2), and so on. You are done when s2 > n. Do this with both a HashSet<Integer> and a BitSet.
2. In an array list of strings, make each string uppercase. Do this with (a) an iterator, (b) a loop over the index values, and (c) the replaceAll method.
3. How do you compute the union, intersection, and difference of two sets, using just the methods of the Set interface and without using loops?
4. Produce a situation that yields a ConcurrentModificationException. What can you do to avoid it?
5. Implement a method public static void swap(List<?> list, int i, int j) that swaps elements in the usual way when the type of list implements the RandomAccess interface, and that minimizes the cost of visiting the positions at index i and j if it is not.
6. I encouraged you to use interfaces instead of concrete data structures, for example, a Map instead of a TreeMap. Unfortunately, that advice goes only so far. Why can’t you use a Map<String, Set<Integer>> to represent a table of contents? (Hint: How would you initialize it?) What type can you use instead?
7. Write a program that reads all words in a file and prints out how often each word occurred. Use a TreeMap<String, Integer>.
8. Write a program that reads all words in a file and prints out on which line(s) each of them occurred. Use a map from strings to sets.
9. You can update the counter in a map of counters as
Click here to view code image
counts.merge(word, 1, Integer::sum);
Do the same without the merge method, (a) by using contains, (b) by using get and a null check, (c) by using getOrDefault, (d) by using putIfAbsent.
10. Implement Dijkstra’s algorithm to find the shortest paths in a network of cities, some of which are connected by roads. (For a description, check out your favorite book on algorithms or the Wikipedia article.) Use a helper class Neighbor that stores the name of a neighboring city and the distance. Represent the graph as a map from cities to sets of neighbors. Use a PriorityQueue<Neighbor> in the algorithm.
11. Write a program that reads a sentence into an array list. Then, using Collections.shuffle, shuffle all but the first and last word, without copying the words into another collection.
12. Using Collections.shuffle, write a program that reads a sentence, shuffles the words, and prints the result. Fix the capitalization of the initial word and the punctuation of the last word (before and after the shuffle). Hint: Don’t shuffle the words.
13. The LinkedHashMap calls the method removeEldestEntry whenever a new element is inserted. Implement a subclass Cache that limits the map to a given size provided in the constructor.
14. Write a method that produces an immutable list view of the numbers from 0 to n, without actually storing the numbers.
15. Generalize the preceding exercise to an arbitrary IntFunction. Note that the result is an infinite collection, so certain methods (such as size and toArray) should throw an UnsupportedOperationException.
16. Improve the implementation of the preceding exercise by caching the last 100 computed function values.
17. Demonstrate how a checked view can give an accurate error report for a cause of heap pollution.
18. The Collections class has static variables EMPTY_LIST, EMPTY_MAP, and EMPTY_SET. Why are they not as useful as the emptyList, emptyMap, and emptySet methods?