Collections

Goals

Concepts

Library

Lesson

At this point you're very familiar with a few basic data structures; including arrays; linked lists; and trees. You realize that on a higher level, an abstract data type (ADT) is the type of interface that determines how we use data; we can choose among data structures to use as the implementation for a particular ADT. For example, one ADT is list of items; one could store a list of items in an array or a linked list (or even a tree structure, although that isn't common). The list interface allows us to add and remove things at list indexes; the underlying data structure determines how best to store that data in memory. You've even written an ListADT type and put a LinkedListADT implementation behind it.

Java Collections Framework interfaces class diagram.
Java Collections Framework interfaces.

Java has an entire set of ADTs and implementations, referred to as the Java Collections Framework (JCF), as part of the standard library. These collections are used day-in and day-out with Java, and third-party libraries such as Google Guava have added extensions and utilities for working with Java collections.

The core Java collections library consists of a set of interfaces, mostly organized as either a Collection or a Map. These interfaces represent Java's conception of abstract data types, and the collections library includes many data structures that implement these ADT interfaces. In this lesson and subsequent lessons, you will learn to use the core Java collection classes. You will discontinue use of your custom ADTs and implementations, and switch to using Java collections.

Collections

In the Java Collections Framework, a collection refers specifically to those ADTs representing groups of individual elements. Technically therefore not all of Java's ADTs are collections per-se. Other ADTs that store item relationships will be discussed in a future lesson.

Collection interfaces class diagram.
Collection interfaces class diagram.
List
A list is a sequence of elements that can be accessed via indexes.
Set
A set is a collection of elements with no duplicates and no inherent order.
Queue
A queue is a first-in, first-out (FIFO) collection of elements, similar to a line of people waiting in a bank.
Deque
A deque (pronounced like deck) is a double-ended queue, allowing insertion and deletion from either end. In addition to functioning as a queue, a deque can also function as a last-in, first-out (LIFO) data type called a stack.

Lists Versus Sets

Main differences between List and Set.
List Set
Duplicates yes no
Ordered yes no Some set implementations can be sorted.

The two most-often used Java collection ADTs are lists and sets. Each has different characteristics, so it is important to choose the appropriate one for your purpose. The primary distinction between lists and sets is that lists allow duplicates and provide positional access, while sets prevent duplicates and may not guarantee order.

Collection<E>

All collections ultimately implement the base java.util.Collection<E> interface, which brings a wealth of functionality shared among all collection types. All Java collections automatically implement java.lang.Iterable<T>, for example. This means that you can get an iterator to the elements in any collection, or use it in an enhanced for(…) loop! Here are some of the most important methods Collection<E>:

Iterable<T>
All Java collections automatically implement java.lang.Iterable<T>. This means that you can get an iterator to the elements in any collection, or use it in an enhanced for(…) loop!
size()
You can easily ask any collection for the number of elements in it.
isEmpty()
This is a convenience method logically equivalent to checking size() == 0. Calling isEmpty() is preferred over checking the size, because the underlying implementation may have a more efficient way than retrieving the number of elements.
contains(Object object)
You can ask a collection if a particular object is contained in the collection. Normally this uses Object.equals(…) for comparison, so most of the time you aren't checking to see if an actual instance is literally in the collection. Some Collection<E> implementations can check for an object more quickly and efficiently than others, as you will learn below.
add(E element)
All collections provide a common method for adding items to the collection.
addAll(Collection<? extends E> collection)
Convenience method for adding elements in bulk from another collection. This may be efficiently than adding the elements individually.
remove(Object object)
All collections provide a common method for removing items from the collection.
clear()
Removes all items from a collection, leaving the collection empty. Depending on the specific ADT and implementation, this method may be more efficient than removing items individually.

List<E>

The java.util.List<E> interface is perhaps the most-used of the Java collection types, and is equivalent to the ListADT interface you created. The List.add(…) method (inherited from Collection<E>) will add the item to the end of the list. The List<E> interface also adds several methods that bring functionality specific to lists:

get(int index)
Retrieves an item at a specific index in the list.
set(int index, E element)
Changes the item at a specific index in the list.
add(int index, E element)
Inserts an item into a specific index in the list. This operation inherently increases the indexes of all following items in the list, if any.
remove(int index)
Removes an item from a specific index in the list. This operation inherently decreases the indexes of all following items in the list, if any.
listIterator()
Returns a java.util.ListIterator<E>, an iterator with special capabilities for traversing lists (such as the ability to iterate both forwards and backwards).
subList(int fromIndex, int toIndex)
Returns a List<E> implementation that is a view of the original list for some particular range. The new list is a live view in that changes made to it are reflected in the original list. Note that for the expressed range the fromIndex parameter is inclusive, indicating the first index to include; but the toIndex parameter is exclusive, indicating one index past the last element to include.
sort(Comparator<? super E> comparator)
Sorts the list, using the provided strategy for comparing individual items in the list. This is the collection equivalent of the java.util.Arrays.sort(T[], Comparator<? super T>) utility method for working with arrays you learned about in the lesson on the strategy pattern.
Implementations
java.util.ArrayList<E>
A List<E> backed by an array. This is the analogue of ArrayListADT<E> in these lessons.
java.util.LinkedList<E>
A List<E> implemented using doubly-linked nodes. This is the analogue of LinkedListADT<E> in these lessons.
java.util.Vector<E>
A legacy data structure from Java 1.0, and later retrofitted to implement List<E>. This class is thread-safe, but its concurrency implementation is so overbearing that overwhelming that it is inefficient compared with other approaches.

Set<E>

A java.util.Set<E> is a collection of elements; in this way it is similar to a list. But that's where the similarities end. A Set<E> does not allow duplicates; if you try to add an object that equals(…) another item in the set, the new item will be ignored. Furthermore a Set<E> has no inherent order; when you iterator over the items in a Set<E>, you have no guarantees which items will be returned first. A Set<E> provides no additional methods over those included in Collection<E>, but its implementation will likely perform many of them more efficiently than other collection types.

Subinterfaces
java.util.SortedSet<E>
If a Set<E> implements this interface, it guarantees some ordering of its elements. This is usually the natural ordering or the ordering of some provided Comparator<T> strategy.
java.util.NavigableSet<E>
The NavigableSet<E> interface provides additional methods for navigating to items in order. For example higher(E element) returns the first element in the set that is higher than the given element. The headSet(E toElement) method returns a view to the set that only includes elements lower than the given element. The fact that a NavigableSet<E> has some order might have tipped you off that a NavigableSet<E> is a SortedSet<E>.
Implementations
java.util.HashSet<E>
This Set<E> implementation efficiently keeps track of objects and detects duplicates by using hash codes, similar to how you implemented your HashTableImpl<K, V>. Keys must implement Object.hashCode() correctly. HashSet<E> should be the default Set<E> implementation you choose unless you have reason to choose another implementation.
java.util.LinkedHashSet<E>
Equivalent to a HashSet<E>, except that it also maintains a doubly-linked list to maintain iteration order. Iteration occurs in insertion-order, the order in which you add items to the set.
java.util.TreeSet<E>
A Set<E> that automatically sorts its elements by storing items in a tree data structure. The elements are either sorted by their natural ordering, or by a java.util.Comparator<T> provided in the constructor. A TreeSet<E> is a NavigableSet<E>, which implies it is also a SortedSet<E>.

Queue<E>

A java.util.Queue<E> represents a queue such a line at a bank, allowing first-in, first-out (FIFO) processing. A queue is an ordered collection of elements, like a list—and in fact a linked list is one of the most common data structure implementations for a queue. The main difference between a the Queue<E> and List<E> interfaces is that Queue<E> provides special methods for adding adding items to the tail of the queue (e.g. the end of a linked list) and removing items from the head of the queue (e.g. the beginning of a linked list). As such it is usually used for holding elements for later processing.

The Queue<E> interface provides pairs of method for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted queues.

Operation Description Exception on Error Special Value on Error
Add Adds element to tail of queue. add(E element) offer(E element)
Remove Removes element from head of queue. remove() poll()
Examine Returns element from head of queue. element() peek()
Principal methods of the java.util.Queue<E> interface.

Deque<E>

The java.util.Deque<E> interface extends Queue<E> and represents a double-ended queue. Most importantly a deque allows insertion and removal at both the head and tail ends. The Deque<E> interface also provides pairs of method for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted deques.

Operation Description Head Tail
Exception on Error Special Value on Error Exception on Error Special Value on Error
Add Adds element to tail of queue. addFirst(E element) offerFirst(E element) addLast(E element) offerLast(E element)
Remove Removes element from head of queue. removeFirst() pollFirst() removeLast() pollLast()
Examine Returns element from head of queue. getFirst() peekFirst() getLast() peekLast()
Principal methods of the java.util.Deque<E> interface.
Stack

Besides functioning as a first-in, first-out (FIFO) queue, a deque can be used as a stack of items, allowing last-in, first-out (LIFO) processing. The Deque<E> interface even provides special stack-related methods, although they duplicate the more general deque methods.

Operation Description Stack Method Equivalent Deque Method
Push Adds element to top of stack. push(E element) addFirst(E element)
Pop Removes element from top of stack. pop() removeFirst()
Examine Returns element from top of stack. peek() peekFirst()
Stack-related methods of the java.util.Deque<E> interface.

Utilities

In addition to the collection interfaces and their implementations, Java comes with a Collections class that comprises a set of utilities for working with collections—any implementation of the collection interfaces, not just those that ship with Java. It would be a good idea to take a look at the java.util.Collections documentation to get in idea of the full set of utilities. The most useful ones are explained below.

Empty Collections

If you know that your method will return an empty collection, instead of creating a collection implementation such as ArrayList<E> you can use one of Java's pre-made, immutable, empty collections. The Collections class provides methods such as emptyList() and emptySet() that even return the correct generic type of collection you require.

Example method using Collections.emptySet().
/**
 * Returns a set of all characters in the string, with no duplicates.
 * @param string The string of characters.
 * @return The set of characters encountered in the string.
 */
public static Set<Character> getCharacters(@Nonnull final String string) {
  if(string.isEmpty()) {
    return java.util.Collections.emptySet();  //provides the correct generic type
  }
  //TODO process string
}

Immutable Collections

Sometimes you need to have a collection that you do not allow anyone to modify. Rather implementing this functionality into each collection type, Java provides a set of immutability utilities such as unmodifiableCollection(…), unmodifiableList(…), and unmodifiableSet(…) which turn any collection instance into an immutable one. The original collection is still mutable; Java instead returns a new collection instance that is provides immutable access to your existing collection. Calling any modifying method will result in an UnsupportedOperationException.

Creating an immutable set using Collections.unmodifiableSet(…).
public static final Set<String> NICKNAMES;

static {
  final Set<String> nicknames = new HashSet<String>();
  nickNames.add("Will");
  nickNames.add("Willy");
  nickNames.add("Bill");
  NICKNAMES = java.util.Collections.unmodifiableSet(nicknames);
}

Hamcrest

The Hamcrest library you have been using with JUnit for unit testing comes with several matchers that assist in testing collections. They make it much easier to verify the result of tests that return things such as lists and sets. These matchers are all retrieved via static methods of org.hamcrest.Matchers; here are just a few of the most useful of them:

contains(E... items)
Checks whether an Iterable<T> contains exactly the given items, in the order they are given. All items must be present. Useful for verifying the contents of a list of items.
containsInAnyOrder(T... items)
Checks whether an Iterable<T> contains exactly the given items, in any order. All items must be present. Useful for verifying the contents of a set of items.
empty()
Checks whether a Collection<E> is empty.
hasItem(T item)
Checks whether an Iterable<T> contains at least the given item.
hasItems(T... items)
Checks whether an Iterable<T> contains at least the given items, in any order. All items must be present.
hasSize(int size)
Checks whether a Collection<E> is of the given size.

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

Convert your booker project to use equivalent Java collections rather than your linked list implementations from your datastruct project.

Put your main list of publications into a static immutable List<Publication>. Your main list of publications should have already been converted from using an array to use a list.

See Also

References

Resources

Acknowledgments