I/O

Goals

Concepts

Language

Library

Lesson

The Evolution of Java I/O

The JDK has been released with various input/output (I/O) libraries over the years. Some aspects of newer libraries replaced older ones. Other aspects continue to coexist with old classes. Here is a quick overview of the evolution of Java I/O over the years, just to get a feel for what has come before and to recognize some of the terminology when you see it.

  1. IO - The original classes in the java.io package concentrated on traditional I/O streams and file random-access classes. Many of these classes can still be used, although newer libraries bring alternatives for special applications. The java.io.IOException exception class still pervades I/O code, but the java.io.File class, which was central to this package, should now be abandoned for new code.
  2. NIO - Java 1.4 introduced the java.nio package (new I/O), which included new whiz-bang concepts such as java.nio.Buffer and channels. These will be discussed in upcoming lessons.
  3. NIO.2 - Java 7 added some new approaches to asynchronous I/O, but the most significant change across the board was the introduction of the java.nio.file.Path interface (to replace java.io.File), along with many other classes in the java.nio.file package. Note that Java did not create a new java.nio2 package for these additions; instead the new classes are scattered across packages.

Exceptions

java.io.IOException
A checked exception traditionally used as a general indication of I/O error.
java.io.UncheckedIOException
An unchecked exception that wraps an IOException. This class was added in Java 8. It is especially useful in code with lambda expressions, as many functional interfaces do not allow checked exceptions.
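
As a minimal sketch of why the unchecked variant helps with lambdas, the following wraps the checked IOException thrown by InputStream.read() inside a stream pipeline. The class name UncheckedIoExample and the method firstBytes() are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class UncheckedIoExample {

  /** Reads the first byte of each stream; InputStream.read() throws the checked IOException. */
  static List<Integer> firstBytes(final List<? extends InputStream> streams) {
    return streams.stream().map(stream -> {
      try {
        return stream.read();
      } catch(final IOException ioException) {
        //Function.apply() declares no checked exceptions, so wrap and rethrow
        throw new UncheckedIOException(ioException);
      }
    }).collect(Collectors.toList());
  }

  public static void main(final String[] args) {
    final List<InputStream> streams = Arrays.asList(
        new ByteArrayInputStream(new byte[] {1, 2}),
        new ByteArrayInputStream(new byte[] {3}));
    try {
      System.out.println(firstBytes(streams)); //prints [1, 3]
    } catch(final UncheckedIOException uncheckedIOException) {
      //the original checked exception is available as the cause
      System.err.println(uncheckedIOException.getCause().getMessage());
    }
  }
}
```

A caller that needs the original checked exception can retrieve it using UncheckedIOException.getCause(), which returns an IOException.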

File Systems

Computers store persisted information in files on some file system. Different file systems have different aspects such as security, attributes, and case sensitivity. Examples include NTFS (primarily Windows) and ext4 (primarily Linux).

Java represents information about a file system using the java.nio.file.FileSystem class. To get the default file system, use the FileSystems.getDefault() method of the helper class java.nio.file.FileSystems.

The files on a file system are usually divided into plain files and directories, which are used to group files hierarchically. A file (or directory) is identified by its path. A path can be an absolute path if it indicates the complete path (from the outermost or root directory) necessary to locate a file; or a relative path if it only indicates the portion of the path necessary to locate the file from some other directory.

Paths are separated into parts by a separator character; on Linux and related systems this is the forward slash / character; on Windows it is the backslash \ character. An earlier part in the path indicates the parent directory of the directory or file later in the path. The special directory names . and .. refer to the current directory and parent directory, respectively.
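
The following sketch shows one way to ask the default file system for its separator and root directories. The class DefaultFileSystemExample and its join() helper are hypothetical; the output depends on the platform, so no expected values are shown.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;

class DefaultFileSystemExample {

  /** Joins path parts using the default file system's separator. */
  static String join(final String... parts) {
    final String separator = FileSystems.getDefault().getSeparator();
    return String.join(separator, parts);
  }

  public static void main(final String[] args) {
    final FileSystem fileSystem = FileSystems.getDefault();
    //typically "/" on Linux and "\" on Windows
    System.out.println("Separator: " + fileSystem.getSeparator());
    //typically a single root on Linux, and one root per drive on Windows
    for(final Path root : fileSystem.getRootDirectories()) {
      System.out.println("Root: " + root);
    }
    System.out.println(join("foo", "example.txt"));
  }
}
```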

File path examples.
Path                 Relative/Absolute  Description
.                    relative           Current directory.
..                   relative           Parent directory.
foobar.txt           relative
./foobar.txt         relative           Same file as above.
foo/example.txt      relative
../bar/example.txt   relative
C:\foo\bar.txt       absolute           File system on Windows OS.
/etc/foo/bar.txt     absolute           File system on Linux OS.

Path

Java provides a versatile interface for identifying files and directories: the java.nio.file.Path interface. You can get a Path instance by asking the FileSystem for it using FileSystem.getPath(…). Rather than calling FileSystems.getDefault().getPath(…), you can use the Paths.get(…) method of the java.nio.file.Paths utility class.

Various types of paths using the Path interface.
//relative paths
final Path path1 = Paths.get("bar.txt");
final Path path2 = Paths.get("foo" + FileSystems.getDefault().getSeparator() + "bar.txt"); //manual construction
final Path path3 = Paths.get("foo", "bar.txt"); //same as path2 but using preferred approach
//files
final Path windowsExample1 = Paths.get("C:\\foo\\bar.txt");
final Path windowsExample2 = Paths.get("C:", "foo", "bar.txt"); //same as windowsExample1
final Path linuxExample = Paths.get("/etc/foo/bar.txt");
//directory
final Path linuxDirectory = Paths.get("/etc/foo/");
Useful Path methods, using directory /foo/bar/ as an example.
Path Method                                           Description                                                 Paths.get("/foo/bar/")                                  Returns
Path.getRoot()                                        The root of the path.                                       .getRoot()                                              /
Path.getFileName()                                    The name of the file or directory.                          .getFileName()                                          bar
Path.getNameCount()                                   The number of name elements in the path.                    .getNameCount()                                         2
Path.isAbsolute()                                     Whether the path is absolute.                               .isAbsolute()                                           true
Path.relativize(Path other)                           Determines the relative path from this path to the other.   .relativize(Paths.get("/foo/bar/some/example.txt"))     some/example.txt
Path.resolve(Path other), Path.resolve(String other)  Combines this path with a relative path.                    .resolve("some/example.txt")                            /foo/bar/some/example.txt
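
The methods in the table can be tried out with a short program such as the sketch below. The class PathMethodsExample and the relativeTo() helper are hypothetical names. The values in the comments assume a Unix-like default file system; on Windows the path /foo/bar/ has no drive letter and is therefore not absolute.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathMethodsExample {

  /** Returns the path of a file relative to a base directory. */
  static Path relativeTo(final Path baseDirectory, final Path file) {
    return baseDirectory.relativize(file);
  }

  public static void main(final String[] args) {
    final Path directory = Paths.get("/foo/bar/");
    System.out.println(directory.getRoot()); //"/" on a Unix-like system
    System.out.println(directory.getFileName()); //"bar"
    System.out.println(directory.getNameCount()); //2 (the root is not a name element)
    System.out.println(directory.isAbsolute()); //true on a Unix-like system
    final Path file = directory.resolve("some/example.txt");
    System.out.println(file); //"/foo/bar/some/example.txt" on a Unix-like system
    System.out.println(relativeTo(directory, file)); //"some/example.txt" on a Unix-like system
  }
}
```

Note that none of these methods touch the disk; a Path only identifies a location, whether or not anything exists there.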

Files

For actually working with files on a disk, you can use the utilities in the java.nio.file.Files class. This class contains a wealth of methods, including methods for checking whether a file is readable or writable.

Files.createDirectories(Path dir, FileAttribute<?>... attrs)
Creates a hierarchy of directories if they do not exist. No error is generated if one or more of the directories already exist.
Files.createDirectory(Path dir, FileAttribute<?>... attrs)
Creates a single new directory. A java.nio.file.FileAlreadyExistsException may be thrown if the directory already exists.
Files.exists(Path path, LinkOption... options)
Checks to see if a file exists at the path.
Files.isDirectory(Path path, LinkOption... options)
Determines whether a path represents a directory.
Files.list(Path dir)
Returns a Stream<Path> listing all the paths in a directory. Using stream filtering and processing operations, you can easily return a list of only files with certain filenames for example. The stream returned by this method must be closed, or you will leak resources which could eventually crash your application.
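
Here is a minimal sketch that exercises the methods above in a temporary directory, so that it does not depend on any existing files. The class FilesExample and the listFileNames() helper are hypothetical; Files.createTempDirectory(), Files.createFile(), and Files.isRegularFile() are additional methods of the same Files class. Note how the stream from Files.list() is closed using try-with-resources.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesExample {

  /** Lists the names of the plain files in a directory having the given extension, sorted. */
  static List<String> listFileNames(final Path directory, final String extension) throws IOException {
    //the stream holds an open directory handle, so close it with try-with-resources
    try(final Stream<Path> paths = Files.list(directory)) {
      return paths.filter(Files::isRegularFile)
          .map(path -> path.getFileName().toString())
          .filter(name -> name.endsWith(extension))
          .sorted()
          .collect(Collectors.toList());
    }
  }

  public static void main(final String[] args) throws IOException {
    final Path tempDirectory = Files.createTempDirectory("files-example");
    final Path nested = tempDirectory.resolve("foo").resolve("bar");
    Files.createDirectories(nested); //creates both foo and foo/bar
    Files.createDirectories(nested); //no error the second time
    System.out.println(Files.exists(nested)); //prints true
    System.out.println(Files.isDirectory(nested)); //prints true
    Files.createFile(nested.resolve("b.txt"));
    Files.createFile(nested.resolve("a.txt"));
    Files.createFile(nested.resolve("c.dat"));
    System.out.println(listFileNames(nested, ".txt")); //prints [a.txt, b.txt]
  }
}
```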

Byte Streams

The most fundamental approach to processing I/O in Java relies on specialized classes that allow programs to process information a byte at a time. An input stream allows a program to read a stream of bits from a data source as bytes. An output stream allows a program to write a stream of bits to a data destination, one or more bytes at a time.

Input Streams

The following input stream classes are all in the java.io package.

InputStream class diagram.
InputStream
Abstract class that forms the basis of all input streams.
BufferedInputStream
Provides buffering of other input streams.
ByteArrayInputStream
An input stream to an existing array of bytes.
FileInputStream
An input stream to a file. This class uses the old java.io.File class and should only be used with legacy code.
FilterInputStream
A simple input stream wrapper allowing subclasses to do more processing on data after reading.
DataInputStream
Provides methods to read primitive Java types in a consistent way across platforms.
ObjectInputStream
An input stream that allows deserialization of Java objects and their instance graphs.

Output Streams

The following output stream classes are all in the java.io package.

OutputStream class diagram.
OutputStream
Abstract class that forms the basis of all output streams.
BufferedOutputStream
Provides buffering of other output streams.
ByteArrayOutputStream
An output stream to a dynamically managed internal array of bytes. The collected data can later be retrieved using ByteArrayOutputStream.toByteArray().
FileOutputStream
An output stream to a file. This class uses the old java.io.File class and should only be used with legacy code.
FilterOutputStream
A simple output stream wrapper allowing subclasses to do more processing on data before writing.
DataOutputStream
Provides methods to write primitive Java types in a consistent way across platforms.
ObjectOutputStream
An output stream that allows serialization of Java objects and their instance graphs.
PrintStream
An output stream that helps write certain data using methods such as println(). This class does not correctly encode characters and strings across platforms; it should not be used unless you have no other option.
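
To illustrate how the wrapper classes in the two lists fit together, the following sketch writes primitive values with DataOutputStream and reads them back with DataInputStream, using byte array streams as the underlying source and destination. The class DataStreamExample and its encode() method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamExample {

  /** Encodes an int and a double as bytes using DataOutputStream. */
  static byte[] encode(final int count, final double ratio) throws IOException {
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try(final DataOutputStream dataOutputStream = new DataOutputStream(byteArrayOutputStream)) {
      dataOutputStream.writeInt(count); //always 4 bytes, high byte first
      dataOutputStream.writeDouble(ratio); //always 8 bytes
    }
    return byteArrayOutputStream.toByteArray();
  }

  public static void main(final String[] args) throws IOException {
    final byte[] bytes = encode(42, 0.5);
    System.out.println(bytes.length); //prints 12
    try(final DataInputStream dataInputStream = new DataInputStream(new ByteArrayInputStream(bytes))) {
      //values must be read back in the same order they were written
      System.out.println(dataInputStream.readInt()); //prints 42
      System.out.println(dataInputStream.readDouble()); //prints 0.5
    }
  }
}
```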

Reading Single Bytes

The abstract class java.io.InputStream forms the basis of all byte stream-based input. Its main method is InputStream.read(), which returns eight bits of information (a byte)—but the byte is returned as an int! This is because the special int value -1 is used to indicate that no further bytes are available to be read (the end of the stream has been reached). If a byte value were used, there would be no way to distinguish between a value -1 indicating the end of the stream, and the byte (which is signed) value -1 representing 0b11111111.

The following example shows how to read from an input stream consisting of an existing array of bytes using java.io.ByteArrayInputStream. The first half of the example merely creates a sequence of bytes to serve as the data to read.

Reading individual bytes from an InputStream until the end of the stream is reached.
//create an array with values 0 ... 255 (256 bytes, or 0x100)
final byte[] inputBytes = new byte[0x100];
for(int i = 0; i < inputBytes.length; i++) {
  inputBytes[i] = (byte)i;  //we know the value isn't larger than a byte (0xFF)
}

//create an input stream from the byte array
final InputStream inputStream = new ByteArrayInputStream(inputBytes);
//read and print each byte until we reach the end of the stream
try {
  int byteValue;
  while((byteValue = inputStream.read()) != -1) {
    System.out.println(byteValue);
  }
} finally {
  inputStream.close();
}

Try-with-Resources

You already know how to use try … finally … to ensure that you close a Closeable resource in the finally {…} clause. Java offers a further enhancement of the try statement: if a class implements java.lang.AutoCloseable (and the Closeable interface extends AutoCloseable, so all input and output streams are candidates), it can be used in a try-with-resources statement. Simply declare and assign the AutoCloseable resource in parentheses after the try keyword. Java will automatically add, in the compiled code, the equivalent of a finally clause that calls close() on the resource, whether or not the try clause throws an exception. Here is how the above try … finally statement would be rewritten to use try-with-resources:

Reading from an InputStream using try-with-resources.
//create an array with values 0 ... 255 (256 bytes, or 0x100)
final byte[] inputBytes = new byte[0x100];
for(int i = 0; i < inputBytes.length; i++) {
  inputBytes[i] = (byte)i;  //we know the value isn't larger than a byte (0xFF)
}

//create an input stream from the byte array
try(final InputStream inputStream = new ByteArrayInputStream(inputBytes)) {
  int byteValue;
  while((byteValue = inputStream.read()) != -1) {
    System.out.println(byteValue);
  }
}
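
To see that close() really is called even when the try clause throws an exception, consider this sketch. It uses a hypothetical LoggingResource class that does nothing but record the order of events; none of these names come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesExample {

  /** Hypothetical resource that records when it is opened and closed. */
  static class LoggingResource implements AutoCloseable {
    private final List<String> log;

    LoggingResource(final List<String> log) {
      this.log = log;
      log.add("open");
    }

    @Override
    public void close() {
      log.add("close");
    }
  }

  /** Uses the resource and fails on purpose; returns the recorded events. */
  static List<String> useAndFail() {
    final List<String> log = new ArrayList<>();
    try(final LoggingResource resource = new LoggingResource(log)) {
      log.add("use");
      throw new IllegalStateException("simulated failure");
    } catch(final IllegalStateException illegalStateException) {
      //close() has already been called by the time we get here
      log.add("caught");
    }
    return log;
  }

  public static void main(final String[] args) {
    System.out.println(useAndFail()); //prints [open, use, close, caught]
  }
}
```

The resource is closed before the catch clause runs, which is why "close" appears before "caught" in the output.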

Mark and Reset

There may be times you are reading from a stream and decide, oops, I wish I could unread some information, and go back to start reading at some earlier location. The InputStream class has a facility for placing a marker at a location to go back to later.

  1. At any time when reading from an input stream, you can call InputStream.mark(int readlimit) to request the input stream to mark the current location. The readlimit value indicates the maximum number of bytes you might read before wanting to go back to the mark.
  2. If you later call InputStream.reset() you will reset the stream to the marked location, and the next bytes read will be those directly after the marked location, even if you've already read those bytes earlier.

The mark/reset facility therefore lets the input stream remember any bytes you read after the mark (up to the readlimit you provided) and effectively put them back into the stream to be read again. Not every input stream supports this facility; call InputStream.markSupported() to find out.
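
A common use of mark/reset is to peek at the next byte without consuming it. The sketch below shows one possible approach; the class MarkResetExample and its peek() method are hypothetical names. It uses a ByteArrayInputStream, which supports marking.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetExample {

  /** Peeks at the next byte without consuming it, or returns -1 at the end of the stream. */
  static int peek(final InputStream inputStream) throws IOException {
    if(!inputStream.markSupported()) {
      throw new IllegalArgumentException("Stream does not support mark/reset.");
    }
    inputStream.mark(1); //we will read at most one byte before resetting
    final int byteValue = inputStream.read();
    inputStream.reset(); //go back to the marked location
    return byteValue;
  }

  public static void main(final String[] args) throws IOException {
    try(final InputStream inputStream = new ByteArrayInputStream(new byte[] {10, 20, 30})) {
      System.out.println(inputStream.read()); //prints 10
      System.out.println(peek(inputStream)); //prints 20
      System.out.println(inputStream.read()); //prints 20 again
      System.out.println(inputStream.read()); //prints 30
    }
  }
}
```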

Writing Single Bytes

The complement to InputStream is java.io.OutputStream. An output stream allows writing of single bytes using OutputStream.write(int b). But moving data between streams a byte at a time is inefficient; there are much more efficient ways to move data between streams, as explained in the following sections.
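
As a brief sketch of writing single bytes, the following writes a few values to a ByteArrayOutputStream and prints what was collected. The class WriteSingleBytesExample and the writeSequence() method are hypothetical names. Note that although write(int b) accepts an int, only its low-order eight bits are written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class WriteSingleBytesExample {

  /** Writes the values from 0 up to (but not including) count, one byte at a time. */
  static void writeSequence(final OutputStream outputStream, final int count) throws IOException {
    for(int i = 0; i < count; i++) {
      outputStream.write(i); //only the low-order eight bits of the int are written
    }
  }

  public static void main(final String[] args) throws IOException {
    final ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try(final OutputStream outputStream = byteArrayOutputStream) {
      writeSequence(outputStream, 3);
      outputStream.write(0x1FF); //the upper bits are ignored, so this writes 0xFF
    }
    for(final byte byteValue : byteArrayOutputStream.toByteArray()) {
      System.out.println(Byte.toUnsignedInt(byteValue)); //prints 0, 1, 2, 255 on separate lines
    }
  }
}
```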

Reading and Writing Multiple Bytes

Many times you will want to read and write larger sections of data by transferring it to and from a buffer, an area of memory designated for transferring the data. InputStream provides an InputStream.read(byte[] b) method that reads bytes into an existing byte array buffer. There always exists the possibility that, for whatever reason, fewer bytes than the buffer can hold might be read; this method therefore returns an int indicating the number of bytes read. (According to the InputStream contract, 0 is returned only if the buffer itself has a length of zero.) If the method returns -1, it indicates that the end of the stream has been reached.

We can use such a buffer to copy between two streams. OutputStream provides a corresponding OutputStream.write(byte[] b), but this method assumes that the entire buffer is full and that all the bytes should be written. Because the read operation may not have filled the buffer, we must take care to only write the number of bytes that were read each time around. This can be done using OutputStream.write(byte[] b, int off, int len), which accepts a starting offset (in this case 0) and a length, the number of bytes to write (in this case the number of bytes read).

In this example we copy everything from the input stream to a java.io.ByteArrayOutputStream, which collects all the bytes. We then retrieve the bytes using the ByteArrayOutputStream.toByteArray() method and print them out.

Copying from an InputStream to an OutputStream using a buffer.
//create an array with values 0 ... 255 (256 bytes, or 0x100)
final byte[] inputBytes = new byte[0x100];
for(int i = 0; i < inputBytes.length; i++) {
  inputBytes[i] = (byte)i;  //we know the value isn't larger than a byte (0xFF)
}

//create a buffer array for copying up to 16 bytes at a time (an arbitrary value)
final byte[] buffer = new byte[0x10];

//create a destination output stream for the bytes
final ByteArrayOutputStream baos = new ByteArrayOutputStream();

//copy a buffer at a time until we reach the end of the input stream
try {
  try(final InputStream inputStream = new ByteArrayInputStream(inputBytes)) {
    int count;
    while((count = inputStream.read(buffer)) != -1) {  //-1 indicates end of stream
      baos.write(buffer, 0, count);
    }
  }
} finally {
  baos.close();
}

//print out the bytes in the destination stream
final byte[] outputBytes = baos.toByteArray();
for(final byte byteValue : outputBytes) {
  System.out.println(Byte.toUnsignedInt(byteValue)); //print unsigned values
}

File Streams

You can get an input stream for reading from a file, or an output stream for writing to a file, by using the Files.newInputStream(Path path, OpenOption... options) or the Files.newOutputStream(Path path, OpenOption... options) method, respectively. Here's an example of printing out all the bytes in a /etc/foo/bar.txt file.

Reading from a file using an InputStream.
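import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final Path path = Paths.get("/etc/foo/bar.txt");
try(final InputStream inputStream = Files.newInputStream(path)) {
  int byteValue;
  while((byteValue = inputStream.read()) != -1) {
    System.out.println(byteValue);  //read() already returns an unsigned value 0 ... 255
  }
}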
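
Writing works the same way in the other direction. The sketch below writes a few bytes using Files.newOutputStream(…) and reads them back; it uses a temporary file created by Files.createTempFile(…) so that it does not depend on /etc/foo/bar.txt existing. The class FileStreamExample and the writeBytes() method are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class FileStreamExample {

  /** Writes the given bytes to the file, creating it or replacing its contents. */
  static void writeBytes(final Path path, final byte[] bytes) throws IOException {
    //with no options, the file is created if needed and truncated if it exists
    try(final OutputStream outputStream = Files.newOutputStream(path)) {
      outputStream.write(bytes, 0, bytes.length);
    }
  }

  public static void main(final String[] args) throws IOException {
    final Path path = Files.createTempFile("file-stream-example", ".bin");
    try {
      writeBytes(path, new byte[] {1, 2, (byte)0xFF});
      try(final InputStream inputStream = Files.newInputStream(path)) {
        int byteValue;
        while((byteValue = inputStream.read()) != -1) {
          System.out.println(byteValue); //prints 1, 2, 255 on separate lines
        }
      }
    } finally {
      Files.deleteIfExists(path); //clean up the temporary file
    }
  }
}
```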

Buffered Streams

Java provides java.io.BufferedInputStream and java.io.BufferedOutputStream for converting any input or output stream to a buffered version. These classes make working with relatively slow connections more efficient, because they will read or write blocks of data to an internal buffer in memory. You can still read and write the data a byte at a time, but you will be accessing an internal buffer, which is much quicker than reading or writing data a byte at a time with e.g. a hard drive. These classes will transfer the data in blocks to or from the underlying stream when needed. Because these classes use the decorator pattern, you can simply wrap an existing stream on the fly. There is no need to close the underlying stream separately; closing the wrapper stream will close the decorated stream as well.

Buffered reading from a file using a BufferedInputStream.
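import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final Path path = Paths.get("/etc/foo/bar.txt");
try(final InputStream inputStream = new BufferedInputStream(Files.newInputStream(path))) {
  int byteValue;
  while((byteValue = inputStream.read()) != -1) {
    System.out.println(byteValue);  //each read() is served from the internal buffer
  }
}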
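
Buffered writing follows the same pattern. In this sketch each write(int b) call goes to the in-memory buffer, and the data reaches the file in blocks; closing the wrapper flushes whatever remains in the buffer. The class BufferedWriteExample and the writeBuffered() method are hypothetical names, and a temporary file is used so that the example is self-contained.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedWriteExample {

  /** Writes count bytes to the file one at a time through a buffer; returns the file size. */
  static long writeBuffered(final Path path, final int count) throws IOException {
    try(final OutputStream outputStream = new BufferedOutputStream(Files.newOutputStream(path))) {
      for(int i = 0; i < count; i++) {
        outputStream.write(i); //goes to the in-memory buffer, not straight to the file
      }
    } //closing the wrapper flushes the buffer and closes the file stream
    return Files.size(path);
  }

  public static void main(final String[] args) throws IOException {
    final Path path = Files.createTempFile("buffered-example", ".bin");
    try {
      System.out.println(writeBuffered(path, 1000)); //prints 1000
    } finally {
      Files.deleteIfExists(path); //clean up the temporary file
    }
  }
}
```

If you need the data to reach the file before closing the stream, you can call OutputStream.flush() on the wrapper.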

Review

Summary

Gotchas

In the Real World

Self Evaluation

Task

You are going to create a repository implementation that uses the file system as its data store, storing publications in individual files. You have not yet learned how to store the individual publications, but prepare for this eventuality by implementing the basic FilePublicationRepository class structure.

See Also

References

Resources

Hex-works
Online hex editor tool.
HxD
Free hex editor and disk editor. (Windows)
Hex Editor Neo
Free hex editor optimized for large files. (Windows)

Acknowledgments