Serialization

Goal

Concepts

Language

Library

Dependencies

Lesson

The term serialization generally refers to the process of converting an object to a series of bytes for storage so that you can reconstitute the object later. Java offers many ways to serialize objects, but the term most often refers to the system based on the java.io.Serializable interface. This system uses the java.io.ObjectInputStream and java.io.ObjectOutputStream classes, and comes with many related rules governing how serialization occurs.

Serialization Streams

ObjectInputStream and ObjectOutputStream, which you briefly saw in the first lesson on I/O, are the main vehicles used to serialize and deserialize an object instance graph.

Using these streams is conceptually straightforward: call ObjectOutputStream.writeObject(Object) to serialize an object and ObjectInputStream.readObject() to deserialize it.

Writing a FooBar to an ObjectOutputStream.
final FooBar fooBar = new FooBar();
fooBar.setFoo("test");
fooBar.setBar(BigInteger.valueOf(123));

final Path path = Paths.get("/etc/foo/bar.dat");
try(final ObjectOutputStream outputStream = new ObjectOutputStream(new BufferedOutputStream(Files.newOutputStream(path)))) {
  outputStream.writeObject(fooBar);
}
Reading a FooBar from an ObjectInputStream.
final FooBar fooBar;
final Path path = Paths.get("/etc/foo/bar.dat");
try(final ObjectInputStream inputStream = new ObjectInputStream(new BufferedInputStream(Files.newInputStream(path)))) {
  fooBar = (FooBar)inputStream.readObject();
}
System.out.println(fooBar.getFoo());  //prints "test"
System.out.println(fooBar.getBar());  //prints "123"
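
For a self-contained round trip you can experiment with, here is a minimal sketch that writes to an in-memory byte array rather than a file. The FooBar class shown is a hypothetical stand-in (the lesson does not show its definition), and RoundTripDemo, roundTrip() and demo() are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.math.BigInteger;

/** Hypothetical stand-in for FooBar; the real class is not shown in the lesson. */
class FooBar implements Serializable {
  private String foo;
  private BigInteger bar;

  String getFoo() { return foo; }
  void setFoo(final String foo) { this.foo = foo; }
  BigInteger getBar() { return bar; }
  void setBar(final BigInteger bar) { this.bar = bar; }
}

class RoundTripDemo {

  /** Writes the object to an in-memory buffer and reads a copy back. */
  static Object roundTrip(final Object object) {
    try {
      final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try(final ObjectOutputStream outputStream = new ObjectOutputStream(buffer)) {
        outputStream.writeObject(object);
      }
      try(final ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return inputStream.readObject();
      }
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    } catch(final ClassNotFoundException classNotFoundException) {
      throw new IllegalStateException(classNotFoundException);
    }
  }

  /** Builds the sample FooBar used above and returns its deserialized copy. */
  static FooBar demo() {
    final FooBar fooBar = new FooBar();
    fooBar.setFoo("test");
    fooBar.setBar(BigInteger.valueOf(123));
    return (FooBar)roundTrip(fooBar);
  }

  public static void main(final String[] args) {
    final FooBar copy = demo();
    System.out.println(copy.getFoo());  //prints "test"
    System.out.println(copy.getBar());  //prints "123"
  }
}
```

The checked exceptions are wrapped in unchecked ones only to keep the demonstration short; production code may prefer to let them propagate.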

Serializable

If reading and writing objects using object streams were all there was to serialization, things would be easy indeed. But there is much more to serialization. For starters, only classes that implement java.io.Serializable can be serialized. This applies not only to the instance you are serializing, but to all the instances in the graph. Otherwise, a java.io.NotSerializableException will be thrown.
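
To see this rule in action, the following sketch tries to serialize an object whose graph contains a non-serializable instance. Car, Engine and canSerialize() are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical class that does NOT implement Serializable. */
class Engine {
  final int horsepower = 100;
}

/** Hypothetical serializable class whose instance graph contains an Engine. */
class Car implements Serializable {
  final String model = "demo";
  final Engine engine = new Engine();  //this reference makes the graph unserializable
}

class NotSerializableDemo {

  /** Returns true if the object graph could be written, false if it was rejected as not serializable. */
  static boolean canSerialize(final Object object) {
    try(final ObjectOutputStream outputStream = new ObjectOutputStream(new ByteArrayOutputStream())) {
      outputStream.writeObject(object);
      return true;
    } catch(final NotSerializableException notSerializableException) {
      return false;
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    }
  }

  public static void main(final String[] args) {
    System.out.println(canSerialize("a string"));  //prints "true"
    System.out.println(canSerialize(new Car()));  //prints "false"
  }
}
```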

serialVersionUID

Classes can change over the course of development, and even after you've released a version of your product. If you serialize one version of a class and try to deserialize it as another, the serialized data may not be compatible. To prevent reading incompatible data, Java computes a serialVersionUID value for each serializable class that does not declare one. When deserializing an object, the JVM compares the stored serialVersionUID to that of the class to be instantiated. If they don't match, Java will throw a java.io.InvalidClassException.

The problem is that almost any change in the class (even to method signatures, for example) will change the serialVersionUID that Java computes. This could mean that you suddenly can't load data you saved earlier just because you tweaked the class. To prevent this, Java allows you to maintain the serialVersionUID yourself. Just declare it as static final long and give it any value you want.

Declaring a custom serial version UID.
private static final long serialVersionUID = 123L;
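
One way to inspect the value that serialization will use for a class is java.io.ObjectStreamClass. The sketch below assumes two hypothetical classes, VersionedThing (which declares its own UID) and UnversionedThing (which leaves it to Java); uidOf() is likewise an invented helper.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical class that declares its own serial version UID. */
class VersionedThing implements Serializable {
  private static final long serialVersionUID = 123L;
  private String name;
}

/** Hypothetical class that leaves the serial version UID to Java. */
class UnversionedThing implements Serializable {
  private String name;
}

class SerialVersionDemo {

  /** Looks up the serial version UID that serialization associates with the given serializable class. */
  static long uidOf(final Class<?> serializableClass) {
    return ObjectStreamClass.lookup(serializableClass).getSerialVersionUID();
  }

  public static void main(final String[] args) {
    System.out.println(uidOf(VersionedThing.class));  //prints "123"
    System.out.println(uidOf(UnversionedThing.class));  //prints a computed value that depends on the class details
  }
}
```

Adding a method or field to UnversionedThing and running this again is a quick way to watch the computed value change, while the declared value stays put.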

transient

By default Java will store all the non-static fields of a serializable class. There may be some variables that you don't want serialized; you can mark those with transient, and they will be ignored and not stored in the output stream.

For example, consider a Person class that has givenName and familyName fields. It may contain a ready-made field named fullName that keeps the precomposed full name around in case it is needed. There is no reason to serialize this data (it duplicates information in the other variables, and could be recalculated after deserialization), so we can mark it as transient.

Using the transient keyword.
public class Person implements Serializable {
  private final String givenName;
  private final String familyName;
  private transient String fullName;  //not final in order to reconstitute value after deserialization

  public Person(@Nonnull final String givenName, @Nonnull final String familyName) {
    this.givenName = checkNotNull(givenName);
    this.familyName = checkNotNull(familyName);
    this.fullName = this.givenName + ' ' + this.familyName;
  }
}
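
The following sketch shows what happens to a transient field across a round trip: it comes back with its default value (null for a reference). The Person here is a JDK-only approximation of the class above; as assumptions for the demonstration it replaces checkNotNull with java.util.Objects.requireNonNull, drops @Nonnull, and adds getters. TransientDemo and roundTrip() are invented names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;

/** JDK-only approximation of the Person class above; the getters are assumptions. */
class Person implements Serializable {
  private final String givenName;
  private final String familyName;
  private transient String fullName;  //skipped by serialization

  Person(final String givenName, final String familyName) {
    this.givenName = Objects.requireNonNull(givenName);
    this.familyName = Objects.requireNonNull(familyName);
    this.fullName = this.givenName + ' ' + this.familyName;
  }

  String getGivenName() { return givenName; }
  String getFullName() { return fullName; }
}

class TransientDemo {

  /** Writes the object to an in-memory buffer and reads a copy back. */
  static Object roundTrip(final Object object) {
    try {
      final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try(final ObjectOutputStream outputStream = new ObjectOutputStream(buffer)) {
        outputStream.writeObject(object);
      }
      try(final ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return inputStream.readObject();
      }
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    } catch(final ClassNotFoundException classNotFoundException) {
      throw new IllegalStateException(classNotFoundException);
    }
  }

  public static void main(final String[] args) {
    final Person original = new Person("Jane", "Doe");
    final Person copy = (Person)roundTrip(original);
    System.out.println(original.getFullName());  //prints "Jane Doe"
    System.out.println(copy.getGivenName());  //prints "Jane"
    System.out.println(copy.getFullName());  //prints "null"; the constructor did not run and the field was not stored
  }
}
```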

Custom Serialization

If you want to take complete control over how an object is serialized or deserialized, you can implement one or both of the following methods in your class:

Special methods for controlling serialization and deserialization.
private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;

If you don't intend to completely replace the bytes used in serialization, the special methods ObjectOutputStream.defaultWriteObject() and ObjectInputStream.defaultReadObject() may be used to write or read the default version of the object (the bytes that serialization would have written or read by default). Here's how you would make sure the Person.fullName variable gets updated upon deserialization if you have marked it as transient:

Reconstituting transient data after reading.
public class Person implements Serializable {
  private final String givenName;
  private final String familyName;
  private transient String fullName;  //not final in order to reconstitute value after deserialization

  private void readObject(final ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();  //deserialize the object normally
    this.fullName = this.givenName + ' ' + this.familyName;  //reconstitute the transient variable
  }
}
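
The two methods are often implemented as a pair, with readObject() reading back exactly what writeObject() wrote. As a sketch under assumed names (Point, Marker and CustomSerializationDemo are hypothetical), here a transient field of a non-serializable type is written by hand after the default data and restored in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical type that is not Serializable, such as one from a library we cannot change. */
class Point {
  final int x;
  final int y;

  Point(final int x, final int y) {
    this.x = x;
    this.y = y;
  }
}

/** Hypothetical serializable class that stores its Point manually. */
class Marker implements Serializable {
  private final String label;
  private transient Point location;  //transient because Point is not Serializable

  Marker(final String label, final Point location) {
    this.label = label;
    this.location = location;
  }

  String getLabel() { return label; }
  Point getLocation() { return location; }

  private void writeObject(final ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();  //write the non-transient fields normally
    out.writeInt(location.x);  //then append the transient data ourselves
    out.writeInt(location.y);
  }

  private void readObject(final ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();  //read the non-transient fields normally
    final int x = in.readInt();  //read the extra data in the order it was written
    final int y = in.readInt();
    this.location = new Point(x, y);
  }
}

class CustomSerializationDemo {

  /** Writes the object to an in-memory buffer and reads a copy back. */
  static Object roundTrip(final Object object) {
    try {
      final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try(final ObjectOutputStream outputStream = new ObjectOutputStream(buffer)) {
        outputStream.writeObject(object);
      }
      try(final ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return inputStream.readObject();
      }
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    } catch(final ClassNotFoundException classNotFoundException) {
      throw new IllegalStateException(classNotFoundException);
    }
  }

  public static void main(final String[] args) {
    final Marker copy = (Marker)roundTrip(new Marker("home", new Point(3, 4)));
    System.out.println(copy.getLabel());  //prints "home"
    System.out.println(copy.getLocation().x + "," + copy.getLocation().y);  //prints "3,4"
  }
}
```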

Deserializing Alternate Objects

Java recognizes two other magic methods that allow you to completely replace the object being read or written with one of your choosing:

Special methods for replacing objects during serialization and deserialization.
private Object writeReplace() throws ObjectStreamException;
private Object readResolve() throws ObjectStreamException;
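
One way the two methods can work together is for writeReplace() to substitute a small stand-in object, whose own readResolve() then rebuilds the real object when it is read. The sketch below assumes a hypothetical Range class with a nested Proxy; these names and ProxyDemo are invented for illustration and are not part of the lesson's project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical value class that serializes a stand-in object in place of itself. */
class Range implements Serializable {
  private final int low;
  private final int high;

  Range(final int low, final int high) {
    if(low > high) {
      throw new IllegalArgumentException("low must not exceed high");
    }
    this.low = low;
    this.high = high;
  }

  int getLow() { return low; }
  int getHigh() { return high; }

  /** Called before writing: the returned Proxy is written instead of this Range. */
  private Object writeReplace() throws ObjectStreamException {
    return new Proxy(low, high);
  }

  /** Stand-in that holds the data actually stored in the stream. */
  private static class Proxy implements Serializable {
    private final int low;
    private final int high;

    Proxy(final int low, final int high) {
      this.low = low;
      this.high = high;
    }

    /** Called after reading: the returned Range replaces the Proxy that was read. */
    private Object readResolve() throws ObjectStreamException {
      return new Range(low, high);  //goes through the constructor, so its validation runs
    }
  }
}

class ProxyDemo {

  /** Writes the object to an in-memory buffer and reads a copy back. */
  static Object roundTrip(final Object object) {
    try {
      final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try(final ObjectOutputStream outputStream = new ObjectOutputStream(buffer)) {
        outputStream.writeObject(object);
      }
      try(final ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return inputStream.readObject();
      }
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    } catch(final ClassNotFoundException classNotFoundException) {
      throw new IllegalStateException(classNotFoundException);
    }
  }

  public static void main(final String[] args) {
    final Object copy = roundTrip(new Range(1, 5));
    System.out.println(copy instanceof Range);  //prints "true"
    System.out.println(((Range)copy).getHigh());  //prints "5"
  }
}
```
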

One of the most common uses of readResolve() is to support deserialization of a singleton, a type for which you want at most one instance. The default serialization mechanism would create a new instance each time an object is deserialized, but with readResolve() you can take over the process at the last minute and return the singleton instance instead.

If we were to implement a Farm, we could make the Animal interface serializable. We could then write and read as many instances of, for example, a Duck as there exist ducks on our farm. But the Unicorn is a special, magical beast; there exists only one unicorn, and it is the same unicorn that appears on our farm and in fact on all the farms in the JVM. To ensure a singleton Unicorn instance, we create a static final INSTANCE constant. Whenever a Unicorn is read, instead of keeping the new instance that the serialization mechanism creates, we return the singleton instance from readResolve().

Ensuring a singleton Unicorn instance using readResolve().
public class Unicorn implements Animal { //Animal is Serializable

  /** The singleton instance. */
  public static final Unicorn INSTANCE = new Unicorn();

  /** No one else can call the constructor. */
  private Unicorn() {
  }

  …

  private Object readResolve() throws ObjectStreamException {
    return INSTANCE;  //ignore the Unicorn actually read; return the singleton
  }
}
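
To check that the singleton really survives a round trip, you can compare references before and after. In this sketch Animal and Duck are minimal hypothetical definitions (the lesson does not show them), and SingletonDemo, roundTrip() and sameAfterRoundTrip() are invented helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Minimal hypothetical Animal interface; assumed to be Serializable as described above. */
interface Animal extends Serializable {
}

/** Ordinary animal: every deserialization produces a new instance. */
class Duck implements Animal {
}

/** Singleton animal, following the Unicorn example above. */
class Unicorn implements Animal {
  static final Unicorn INSTANCE = new Unicorn();

  private Unicorn() {
  }

  private Object readResolve() throws ObjectStreamException {
    return INSTANCE;  //ignore the Unicorn actually read; return the singleton
  }
}

class SingletonDemo {

  /** Writes the object to an in-memory buffer and reads a copy back. */
  static Object roundTrip(final Object object) {
    try {
      final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      try(final ObjectOutputStream outputStream = new ObjectOutputStream(buffer)) {
        outputStream.writeObject(object);
      }
      try(final ObjectInputStream inputStream = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
        return inputStream.readObject();
      }
    } catch(final IOException ioException) {
      throw new UncheckedIOException(ioException);
    } catch(final ClassNotFoundException classNotFoundException) {
      throw new IllegalStateException(classNotFoundException);
    }
  }

  /** Returns true if deserialization hands back the very same instance that was written. */
  static boolean sameAfterRoundTrip(final Object object) {
    return roundTrip(object) == object;
  }

  public static void main(final String[] args) {
    System.out.println(sameAfterRoundTrip(Unicorn.INSTANCE));  //prints "true"
    System.out.println(sameAfterRoundTrip(new Duck()));  //prints "false"
  }
}
```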

Review

Gotchas

In the Real World

Think About It

Self Evaluation

Task

Improve your FilePublicationRepository implementation so that it actually saves and loads all the publications. There are several ways to approach this. We aren't so concerned about performance here, so we could forego any caching and deal directly with the file system. But we have several lookup methods that search for attributes besides the publication title, such as lookup by type. We therefore choose to load all the publications at the beginning and cache them.

Add a new command load-snapshot to the command-line interface of Booker, which will copy the snapshot list of publications into the current repository. This is easily accomplished by making a utility method that iterates over the publications in one PublicationRepository and adds them to another. This would be a good place to show off your mastery of streams and lambda expressions.

Option         Alias  Description
list                  Lists all available publications.
load-snapshot         Loads the snapshot list of publications into the current repository.
--help         -h     Prints out a help summary of available switches.
--name         -n     Indicates a filter by name for the list command.
--type         -t     Indicates the type of publication to list, either book or periodical. If not present, all publications will be listed.

See Also

References

Acknowledgments