Graphs
Goals
- Understand and identify data structures.
- Understand node and graph data structures.
- Implement and use a linked list.
Concepts
- array
- data structure
- directed graph
- doubly-linked list
- edge
- graph
- instance graph
- linked list
- node
- traverse
- vertex
Library
Lesson
When you learned about algorithms, you discovered that certain of them require you to first organize data in a certain way. A binary search, for example, requires that all input data first be sorted. Not only the order but sometimes even the general organization of data in the computer memory can be important or even necessary for an algorithm to work.
Data Structures
Until this point, whenever you need groups of objects, you have been storing them in arrays. In computer science many other approaches have been invented for holding groups of objects in addition to arrays. These approaches are called data structures, and they serve as general-purpose containers for data. Many times an algorithm will require that data be stored in a particular data structure in order for the algorithm to work. Data structures generally the same way across all programming languages, and they each have their own benefits and trade-offs.
Array
The Java “array objects” you have been using are objects like any non-primitive value. You may have given no thought about how actual values are stored inside the array object, but conceptually the values are stored in a row, in contiguous memory positions. In data structure terms, this is the actual “array”. You could say that the Java array object stores its values in an array data structure.
| Index | 0 | 1 | 2 | 3 | … n-1 | 
|---|---|---|---|---|---|
| Value | element_1 | element_2 | element_3 | element_4 | element_n | 
An array data structure is rather rigid. It has a fixed length, and to add new elements we would somehow have to allocate more memory and/or move the array somewhere else in memory. Inserting an element into the array requires shifting all the other elements down to make room. But perhaps one of the biggest drawbacks is that finding an object in an array means that we have to look through all the elements of the array, one at a time. We could get lucky and find an object toward the front of the array. If we were not so lucky we might not find the object until we reached the end of the array.
Graph
A solution to many of the drawbacks of arrays is to link the objects together in memory, much like a TinkerToy® set. Linking objects together creates a data structure called a graph.
In graph terminology, each object above is called a vertex of the graph, and the lines between the objects are called edges. Graphs can come in all shapes an sizes. A graph could consist of several vertices strung out in a single line. A graph could consist of several vertices each connected to all the other vertices. A graph could even consist of a single vertex not connected to anything. The key concept of a graph is that a vertex has a way to link to one or more other vertices.
Node
Node class diagram.But how can we link our objects in real life? If we have three Animal instances, for example, how could we link them together? One solution would be to add a nextAnimal variable to the Animal class. If we wanted to link together several animals, we could use animal1.setNextAnimal(animal2), and use animal2.setNextAnimal(animal3), and so on—sort of like a row of dogs on leashes, each leashed to the other. But do we really want to pollute our Animal class with special linking variables just because we want a way to group our animals together? This approach would also come with some restrictions: it would be difficult for an Animal to appear in more than one group, for example.
A more flexible approach is to create a separate “wrapper class” for each of the vertexes. Normally this data structure called a node, and its sole purpose is to hold the object and point to one or more other nodes.
Here is conceptually how a Node class might be implemented. In reality the number of nodes would depend on the specific type of graph.
Node implementation.public class Node {
  private final Object value;
  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }
  private Node[] links = new Node[0];
  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public Node(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }
  /**
   * Sets the links to other nodes.
   * @param links The other nodes this node links to.
   */
  public void setLinks(@Nonnull final Node... links)
  {
    this.links = checkNotNull(links);
  }
}We could then construct the graph pictured above:
Node instances.final Object object1 = new Object();  //sample objects; could be any objects
final Object object2 = new Object();
final Object object3 = new Object();
final Object object4 = new Object();
final Object object5 = new Object();
final Node node1 = new Node(object1);  //create the nodes
final Node node2 = new Node(object2);
final Node node3 = new Node(object3);
final Node node4 = new Node(object4);
final Node node5 = new Node(object5);
node1.setLinks(node3, node2, node5);  //link the nodes
node2.setLinks(node3);
node3.setLinks(node4, node5);
node5.setLinks(node3);Graphs in general can become quite complex and can help solve complicated problems. But for merely storing objects there are a few main graph types that are relatively simple.
Linked List
A linked list data structure is a graph consisting of a single line of nodes strung together.
Because each node can only connect to one node in front of it (or none at all, if it is the last node), a LinkedListNode for a linked list can be simpler than the more general Node source code shown in the earlier section above.
LinkedListNode implementation.public class LinkedListNode {
  private final Object value;  //stores a single value
  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }
  private LinkedListNode next = null; //links at most to one other node
  /** @return The next node in the list, or {@code null} if this node is the end of the list. */
  public @Nullable LinkedListNode getNext() {
    return next;
  }
  /**
   * Sets the link to next node in the list.
   * @param next The next node in the list, or {@code null} if this node is the end of the list.
   */
  public void setNext(@Nullable final LinkedListNode next) {
    this.next = next;
  }
  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public LinkedListNode(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }
}A linked list can thus be a substitution for an array! Here is an abbreviated implementation of a linked list, allowing elements to be added and a length to be calculated.
LinkedList implementation.public class LinkedList {
  private LinkedListNode first = null;
  /**
   * Adds an object to the end of the list.
   * @param object The object to add to the list.
   */
  public void add(final Object object) {
    final LinkedListNode newNode = new LinkedListNode(object);
    if(first == null) {  //if the list is empty
      first = newNode;
      return;
    }
    //look for the end of the list
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
    }
    assert node.getNext() == null;
    //add the node to the end of the list
    node.setNext(newNode);
  }
  /** @return The length of the list. */
  public int getLength() {
    if(first == null) {  //if the list is empty
      return 0;
    }
    //look for the end of the list
    int length=1;
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
      length++;
    }
    return length;
  }
}Review
Gotchas
- Arrays are always mutable; anyone with access to an array can change its contents.
- Arrays allow very fast access, but take a long time to insert or delete items because the other elements must be copied.
- Linked list allow quick insertion and deletion, but take longer to look up a certain index because the list must be traversed.
In the Real World
- Data structures are the workhorses of computer programs. You will use them day in and day out to shuffle data around.
Think About It
If you have worked with graph-based structures before, the needed implementation may become obvious to you and you will be able to sit down and code it from the top. But if you are just starting to work with graphs and node traversal, it may all seem overwhelming! That's when you should start with what you know and take it one step at a time. Find a small piece that is so simple that you can't fail to understand it. Implement it, and then make a unit test for it. Then add another step, and another unit test, and keep going until it all works. Let's walk through an example.
Linked List Length
Let's start with finding the length of a linked list. The solution for this can already be found in the main part of this lesson; that solution is iterative. But let's take another approach at how we might think about the problem.
LinkedListNode class diagram.First what is a linked list made up of? Nodes. Before trying to implement the whole “kit and caboodle” as it were, let's just think about nodes. In fact don't even think of nodes: think of a single node. You can look at the the implementation of LinkedListNode in the lesson to follow along.
What if we have a single node containing the string 
 This is more or less the simplest node we can make. "foo" as its value, and containing a next node of null?Can it still be considered a linked list?
 Yes, it can! What is its length?
 It's length is 1. How did you know that?
 Because the next node is null. You already knew that, right? So write a method for it.
LinkedList class for initial implementation of static methods.public class LinkedList {
  /**
   * Determines the length of one or more nodes for a linked list.
   * @param node The first node in the chain.
   * @return The number of nodes strung together.
   */
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      throw new UnsupportedOperationException("I don't know how to implement length > 1 yet.");
    }
  }
}So if LinkedListNode.getNext() == null then there is only one node. That's not hard to do at all! Now you need to test it. Create a unit test LinkedListTest in the same package.
LinkedListTest class for testing the static methods in the LinkedList class.public class LinkedListTest {
  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }
}At this point you should be on very firm footing. You know that if a node has a next node of null, there are no other nodes so the length is 1. You've written a test for it, and it passed. You might even think that writing a test for such a simple case is ridiculous—but that would be wrong. In fact you could go ahead and write a test for lists longer than 1. There is an entire movement that advocates writing tests before you do the implementation, just so you'll know what you're trying to do! Let's try it for a linked list of two nodes.
LinkedListNodes.public class LinkedListTest {
  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }
  @Test
  public void testTwoNodesGetLength() {
    final LinkedListNode node1 = new LinkedListNode("foo");
    final LinkedListNode node2 = new LinkedListNode("bar");
    node1.setNext(node2);
    assertThat(LinkedList.getLength(node1), is(2));
  }
}Run the test; it will fail. That's good—you now know what you're trying to implement.
So let's go back to our LinkedList.getLength(LinkedListNode) method. We already know that if LinkedListNode.getNext() is null, then there is only one node in the list. But what if 
 What do we know? We know that there is one or more nodes in linked to the first node. You can think of these remaining nodes (however many they are) as a linked list themselves. In other words, if you take away the first node, we know that there exists one or more nodes linked together, and the first node in that list is whatever LinkedListNode().getNext() is not null?LinkedListNode().getNext() references. If we only knew how long that list was, we could simply add 1 to it, because we've counted the first node:
LinkedListNode.public class LinkedList {
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      /* TODO implement for length > 1
       * The length must be 1 (our current node) plus
       * the length of the linked list starting with "next".
       * Something like  1 + somehowGetLengthOf(next).
       * But how do we calculate the length of "next"?
       */
    }
  }
}So all you need to do know is find out how to find the length of the linked list starting with the next node. But don't you already have a method for calculating the length of a linked list of nodes starting with some node?
Take a step back and look at your code; you are writing a getLength(LinkedListNode) already! Why can't you call that to get the length of the next node?
“But wait,” you may protest, “that method is not finished yet!” But is it not? Try it:
LinkedListNode.public class LinkedList {
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }
}Think about what this method can do now:
- getLength(LinkedListNode)can find the length of a list with length- 1. This is the base case.
- getLength(LinkedListNode)can find the the length of any list greater than- 1by adding- 1to the length of the nodes in front of it. This is the general case.
And isn't that all it needs to know? Instead of the iterative approach in the lesson, this is in fact the recursive approach. If you have a string of 5 beads, you know that the length of the beads is 1 plus the length of the remaining 4 beads. And the length of 4 beads is 1 plus the length of the remaining 3 beads, and so on down to the last bead, which of course has length of 1. (We always need a way to stop when we are using recursion.)
Now run the unit test LinkedListTest.testTwoNodesGetLength(); it will pass. Write one for three nodes, for five nodes, and for 500 nodes; they will all pass.
We now can calculate the length of a list of one node and anything greater than one node. What about no nodes—an empty list? We can't represent an empty list with a node; we'll have to go to the LinkedList level for that, but we don't have to throw away all our hard work in getLength(LinkedListNode). If the list is not empty, we can delegate to the method we already wrote, like this:
LinkedList.getLength(LinkedListNode) method from LinkedList.getLength().public class LinkedList {
  private LinkedListNode first;
  …
  public int getLength() {
    if(first == null) {  //if there is not a node... there are no nodes!
      return 0;
    } else {  //if we have at least one node, we already know how to calculate the length!
      return getLength(first);
    }
  }
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }
}So now if you feel lost, you have an approach to attack any of these graph traversal problems, and indeed most problems in software development.
- Take the simplest case you can think of.
- Start working directly with the nodes; you can work with the higher-level data structure later.
- Create some static method to work directly with the simple case.
- Create a unit test to test that simple case.
- Write another unit test to test something just a little more difficult.
- Go back and finish writing the method to handle the more difficult case.
- Repeat until finished.
Self Evaluation
- Why are data structures needed?
- What is the distinction between a Java array object and an array data structure? How are the two related?
- What are some drawbacks of an array data structure?
- What must be done in order to insert an object into the middle of an array?
- How could you arbitrarily indicate relationships among Fooinstances in a graph without modifying theFooclass?
- In what ways is a linked list better than an array? How might a linked list be less efficient than an array?
Task
Create a new Maven project in the same Git repository to hold the data structures you implement in the two sections below.
- Move your existing Booker project files to a separate bookersubdirectory. Because you are mostly just moving files, you can commit these changes directly to the repository without creating a merge request. The remaining steps will be part of your merge request for this lesson.- You will need to move your .gitignorefile as well, if you used the absolute path form for any files or subdirectories.
 
- You will need to move your 
- Create a new subdirectory named datastructfor your new data structure project.
- In the datastructdirectory create a new Maven Java project using the packagemy.package.datastruct(substituting your base package name).- You will need to create an additional .gitignorefile for the new project directory.
 
- You will need to create an additional 
- Provide appropriate Maven coordinates, using the artifact ID datastruct.
- Add the datastructproject coordinates as a dependency to thebookerproject.
Create your own implementation of a linked list, implementing the following interface, complete with unit tests. Override the default implementation of add(Object object) only if you think it appropriate.
public interface LinkedList {
  /** Returns the first node.
   * @return The first node in the list, or {@code null} if there is no first node.
   * @deprecated Normally a linked list shouldn't expose its nodes; this method
   *     in the interface is temporary to provide access to the search method,
   *     and will be removed when better approaches for searching are added.
   */
  @Deprecated
  @Nullable LinkedListNode getFirstNode();
  /** @return The length of the list. */
  int getLength();
  /**
   * Retrieves the object at the given index.
   * @param index The zero-based index at which to retrieve the object.
   * @return The object at the given index, which may be {@code null}.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than or equal to the number of elements in the list.
   */
  @Nullable Object get(int index);
  /**
   * Add an item into the list after the current last item.
   * @param object the item to add.
   */
  default void add(@Nullable Object object) {
    insert(getLength(), object);
  }
  /**
   * Inserts an item into the list at a given index.
   * @param index The index at which the object should be inserted,
   *     with a minimum of zero and a maximum of the length of the list.
   * @param object the item to add.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than the number of elements in the list.
   */
  void insert(int index, @Nullable Object object);
}Implement the following method in your Booker application and add appropriate unit tests for it. This method should use LinkedList.getFirstNode() and traverse the nodes manually; it should not iterate over the indexes as you would an array.
/**
 * Finds a book in a list by its title.
 * @param list The list of books; it can be assumed that all items in the list are instances of {@link Book}.
 * @param title The title of the book.
 * @return The first book with the given title, or <code>null</code> if no book with that title was found.
 */
public static @Nullable Book findBookByTitle(@Nonnull LinkedList list, @Nonnull String title) {
  …
}See Also
References
- Data structure (Wikipedia)
- Dictionary of Algorithms and Data Structures (NIST)
- Graph (Wikipedia)
- Linked list (Wikipedia)
Acknowledgments
- “TINKERTOY” is a trademark of Hasbro.