Graphs

Goals

Concepts

Library

Lesson

When you learned about algorithms, you discovered that certain of them require you to first organize data in a certain way. A binary search, for example, requires that all input data first be sorted. Not only the order but sometimes even the general organization of data in the computer memory can be important or even necessary for an algorithm to work.

Data Structures

Until this point, whenever you need groups of objects, you have been storing them in arrays. In computer science many other approaches have been invented for holding groups of objects in addition to arrays. These approaches are called data structures, and they serve as general-purpose containers for data. Many times an algorithm will require that data be stored in a particular data structure in order for the algorithm to work. Data structures generally the same way across all programming languages, and they each have their own benefits and trade-offs.

Array

The Java “array objects” you have been using are objects like any non-primitive value. You may have given no thought about how actual values are stored inside the array object, but conceptually the values are stored in a row, in contiguous memory positions. In data structure terms, this is the actual “array”. You could say that the Java array object stores its values in an array data structure.

Index 0 1 2 3 n-1
Value element_1 element_2 element_3 element_4 element_n

An array data structure is rather rigid. It has a fixed length, and to add new elements we would somehow have to allocate more memory and/or move the array somewhere else in memory. Inserting an element into the array requires shifting all the other elements down to make room. But perhaps one of the biggest drawbacks is that finding an object in an array means that we have to look through all the elements of the array, one at a time. We could get lucky and find an object toward the front of the array. If we were not so lucky we might not find the object until we reached the end of the array.

Graph

A graph data structure.
A graph data structure.

A solution to many of the drawbacks of arrays is to link the objects together in memory, much like a TinkerToy® set. Linking objects together creates a data structure called a graph.

In graph terminology, each object above is called a vertex of the graph, and the lines between the objects are called edges. Graphs can come in all shapes an sizes. A graph could consist of several vertices strung out in a single line. A graph could consist of several vertices each connected to all the other vertices. A graph could even consist of a single vertex not connected to anything. The key concept of a graph is that a vertex has a way to link to one or more other vertices.

Node

Node class diagram.
Node class diagram.

But how can we link our objects in real life? If we have three Animal instances, for example, how could we link them together? One solution would be to add a nextAnimal variable to the Animal class. If we wanted to link together several animals, we could use animal1.setNextAnimal(animal2), and use animal2.setNextAnimal(animal3), and so on—sort of like a row of dogs on leashes, each leashed to the other. But do we really want to pollute our Animal class with special linking variables just because we want a way to group our animals together? This approach would also come with some restrictions: it would be difficult for an Animal to appear in more than one group, for example.

A more flexible approach is to create a separate “wrapper class” for each of the vertexes. Normally this data structure called a node, and its sole purpose is to hold the object and point to one or more other nodes.

Here is conceptually how a Node class might be implemented. In reality the number of nodes would depend on the specific type of graph.

Example Node implementation.
public class Node {

  private final Object value;

  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }

  private Node[] links = new Node[0];

  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public Node(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }

  /**
   * Sets the links to other nodes.
   * @param links The other nodes this node links to.
   */
  public void setLinks(@Nonnull final Node... links)
  {
    this.links = checkNotNull(links);
  }

}

We could then construct the graph pictured above:

Creating a graph data structure with Node instances.
final Object object1 = new Object();  //sample objects; could be any objects
final Object object2 = new Object();
final Object object3 = new Object();
final Object object4 = new Object();
final Object object5 = new Object();
final Node node1 = new Node(object1);  //create the nodes
final Node node2 = new Node(object2);
final Node node3 = new Node(object3);
final Node node4 = new Node(object4);
final Node node5 = new Node(object5);
node1.setLinks(node3, node2, node5);  //link the nodes
node2.setLinks(node3);
node3.setLinks(node4, node5);
node5.setLinks(node3);

Graphs in general can become quite complex and can help solve complicated problems. But for merely storing objects there are a few main graph types that are relatively simple.

Linked List

A linked list data structure is a graph consisting of a single line of nodes strung together.

A linked list data structure.
A linked list data structure.

Because each node can only connect to one node in front of it (or none at all, if it is the last node), a LinkedListNode for a linked list can be simpler than the more general Node source code shown in the earlier section above.

Example LinkedListNode implementation.
public class LinkedListNode {

  private final Object value;  //stores a single value

  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }

  private LinkedListNode next = null; //links at most to one other node

  /** @return The next node in the list, or {@code null} if this node is the end of the list. */
  public @Nullable LinkedListNode getNext() {
    return next;
  }

  /**
   * Sets the link to next node in the list.
   * @param next The next node in the list, or {@code null} if this node is the end of the list.
   */
  public void setNext(@Nullable final LinkedListNode next) {
    this.next = next;
  }

  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public LinkedListNode(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }

}

A linked list can thus be a substitution for an array! Here is an abbreviated implementation of a linked list, allowing elements to be added and a length to be calculated.

Example LinkedList implementation.
public class LinkedList {

  private LinkedListNode first = null;

  /**
   * Adds an object to the end of the list.
   * @param object The object to add to the list.
   */
  public void add(final Object object) {
    final LinkedListNode newNode = new LinkedListNode(object);
    if(first == null) {  //if the list is empty
      first = newNode;
      return;
    }
    //look for the end of the list
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
    }
    assert node.getNext() == null;
    //add the node to the end of the list
    node.setNext(newNode);
  }

  /** @return The length of the list. */
  public int getLength() {
    if(first == null) {  //if the list is empty
      return 0;
    }
    //look for the end of the list
    int length=1;
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
      length++;
    }
    return length;
  }
}

Review

Gotchas

In the Real World

Think About It

If you have worked with graph-based structures before, the needed implementation may become obvious to you and you will be able to sit down and code it from the top. But if you are just starting to work with graphs and node traversal, it may all seem overwhelming! That's when you should start with what you know and take it one step at a time. Find a small piece that is so simple that you can't fail to understand it. Implement it, and then make a unit test for it. Then add another step, and another unit test, and keep going until it all works. Let's walk through an example.

Linked List Length

Let's start with finding the length of a linked list. The solution for this can already be found in the main part of this lesson; that solution is iterative. But let's take another approach at how we might think about the problem.

LinkedListNode class diagram.
LinkedListNode class diagram.

First what is a linked list made up of? Nodes. Before trying to implement the whole “kit and caboodle” as it were, let's just think about nodes. In fact don't even think of nodes: think of a single node. You can look at the the implementation of LinkedListNode in the lesson to follow along.

What if we have a single node containing the string "foo" as its value, and containing a next node of null? This is more or less the simplest node we can make. Can it still be considered a linked list? Yes, it can! What is its length? It's length is 1. How did you know that? Because the next node is null. You already knew that, right? So write a method for it.

LinkedList class for initial implementation of static methods.
public class LinkedList {

  /**
   * Determines the length of one or more nodes for a linked list.
   * @param node The first node in the chain.
   * @return The number of nodes strung together.
   */
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      throw new UnsupportedOperationException("I don't know how to implement length > 1 yet.");
    }
  }

}

So if LinkedListNode.getNext() == null then there is only one node. That's not hard to do at all! Now you need to test it. Create a unit test LinkedListTest in the same package.

LinkedListTest class for testing the static methods in the LinkedList class.
public class LinkedListTest {

  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }

}

At this point you should be on very firm footing. You know that if a node has a next node of null, there are no other nodes so the length is 1. You've written a test for it, and it passed. You might even think that writing a test for such a simple case is ridiculous—but that would be wrong. In fact you could go ahead and write a test for lists longer than 1. There is an entire movement that advocates writing tests before you do the implementation, just so you'll know what you're trying to do! Let's try it for a linked list of two nodes.

Test for length of two LinkedListNodes.
public class LinkedListTest {

  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }

  @Test
  public void testTwoNodesGetLength() {
    final LinkedListNode node1 = new LinkedListNode("foo");
    final LinkedListNode node2 = new LinkedListNode("bar");
    node1.setNext(node2);
    assertThat(LinkedList.getLength(node1), is(2));
  }

}

Run the test; it will fail. That's good—you now know what you're trying to implement.

So let's go back to our LinkedList.getLength(LinkedListNode) method. We already know that if LinkedListNode.getNext() is null, then there is only one node in the list. But what if LinkedListNode().getNext() is not null? What do we know? We know that there is one or more nodes in linked to the first node. You can think of these remaining nodes (however many they are) as a linked list themselves. In other words, if you take away the first node, we know that there exists one or more nodes linked together, and the first node in that list is whatever LinkedListNode().getNext() references. If we only knew how long that list was, we could simply add 1 to it, because we've counted the first node:

Planning implementation of length calculation for more than one LinkedListNode.
public class LinkedList {

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      /* TODO implement for length > 1
       * The length must be 1 (our current node) plus
       * the length of the linked list starting with "next".
       * Something like  1 + somehowGetLengthOf(next).
       * But how do we calculate the length of "next"?
       */
    }
  }

}

So all you need to do know is find out how to find the length of the linked list starting with the next node. But don't you already have a method for calculating the length of a linked list of nodes starting with some node?

Take a step back and look at your code; you are writing a getLength(LinkedListNode) already! Why can't you call that to get the length of the next node?

“But wait,” you may protest, “that method is not finished yet!” But is it not? Try it:

Recursive implementation of length calculation for more than one LinkedListNode.
public class LinkedList {

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }

}

Think about what this method can do now:

And isn't that all it needs to know? Instead of the iterative approach in the lesson, this is in fact the recursive approach. If you have a string of 5 beads, you know that the length of the beads is 1 plus the length of the remaining 4 beads. And the length of 4 beads is 1 plus the length of the remaining 3 beads, and so on down to the last bead, which of course has length of 1. (We always need a way to stop when we are using recursion.)

Now run the unit test LinkedListTest.testTwoNodesGetLength(); it will pass. Write one for three nodes, for five nodes, and for 500 nodes; they will all pass.

We now can calculate the length of a list of one node and anything greater than one node. What about no nodes—an empty list? We can't represent an empty list with a node; we'll have to go to the LinkedList level for that, but we don't have to throw away all our hard work in getLength(LinkedListNode). If the list is not empty, we can delegate to the method we already wrote, like this:

Delegating to static LinkedList.getLength(LinkedListNode) method from LinkedList.getLength().
public class LinkedList {

  private LinkedListNode first;

  …

  public int getLength() {
    if(first == null) {  //if there is not a node... there are no nodes!
      return 0;
    } else {  //if we have at least one node, we already know how to calculate the length!
      return getLength(first);
    }
  }

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }

}

So now if you feel lost, you have an approach to attack any of these graph traversal problems, and indeed most problems in software development.

  1. Take the simplest case you can think of.
  2. Start working directly with the nodes; you can work with the higher-level data structure later.
  3. Create some static method to work directly with the simple case.
  4. Create a unit test to test that simple case.
  5. Write another unit test to test something just a little more difficult.
  6. Go back and finish writing the method to handle the more difficult case.
  7. Repeat until finished.

Self Evaluation

Task

Create a new Maven project in the same Git repository to hold the data structures you implement in the two sections below.

  1. Move your existing Booker project files to a separate booker subdirectory. Because you are mostly just moving files, you can commit these changes directly to the repository without creating a merge request. The remaining steps will be part of your merge request for this lesson.
    • You will need to move your .gitignore file as well, if you used the absolute path form for any files or subdirectories.
  2. Create a new subdirectory named datastruct for your new data structure project.
  3. In the datastruct directory create a new Maven Java project using the package my.package.datastruct (substituting your base package name).
    • You will need to create an additional .gitignore file for the new project directory.
  4. Provide appropriate Maven coordinates, using the artifact ID datastruct.
  5. Add the datastruct project coordinates as a dependency to the booker project.

Create your own implementation of a linked list, implementing the following interface, complete with unit tests. Override the default implementation of add(Object object) only if you think it appropriate.

public interface LinkedList {

  /** Returns the first node.
   * @return The first node in the list, or {@code null} if there is no first node.
   * @deprecated Normally a linked list shouldn't expose its nodes; this method
   *     in the interface is temporary to provide access to the search method,
   *     and will be removed when better approaches for searching are added.
   */
  @Deprecated
  @Nullable LinkedListNode getFirstNode();

  /** @return The length of the list. */
  int getLength();

  /**
   * Retrieves the object at the given index.
   * @param index The zero-based index at which to retrieve the object.
   * @return The object at the given index, which may be {@code null}.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than or equal to the number of elements in the list.
   */
  @Nullable Object get(int index);

  /**
   * Add an item into the list after the current last item.
   * @param object the item to add.
   */
  default void add(@Nullable Object object) {
    insert(getLength(), object);
  }

  /**
   * Inserts an item into the list at a given index.
   * @param index The index at which the object should be inserted,
   *     with a minimum of zero and a maximum of the length of the list.
   * @param object the item to add.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than the number of elements in the list.
   */
  void insert(int index, @Nullable Object object);

}

Implement the following method in your Booker application and add appropriate unit tests for it. This method should use LinkedList.getFirstNode() and traverse the nodes manually; it should not iterate over the indexes as you would an array.

/**
 * Finds a book in a list by its title.
 * @param list The list of books; it can be assumed that all items in the list are instances of {@link Book}.
 * @param title The title of the book.
 * @return The first book with the given title, or <code>null</code> if no book with that title was found.
 */
public static @Nullable Book findBookByTitle(@Nonnull LinkedList list, @Nonnull String title) {
  …
}

See Also

References

Acknowledgments