Synchronization

Goals

Concepts

Language

Library

Lesson

Threads allow different sequences of program logic to execute concurrently within the process. If the threads are performing completely independent tasks, no problems occur. Many times, however, we use threads to perform the same or similar task more than once concurrently. Sharing resources across threads brings an entirely new level of complexity, as the sequence of resource reading and writing can cause many strange, sporadic, and difficult bugs. Synchronization is the most basic form of controlling resource access among threads, a tool to ensure that only a single thread at a time can access resources crucial to a certain task.

Race Conditions

The central problem that plagues multi-threaded programming is the possibility of a race condition, when two threads tries to access data at the same time. You already learned to use the volatile keyword for variables shared between threads, to ensure that the value is not cached by the JVM. But providing the current value to each thread is only the tip of the iceberg. Once threads start manipulating those values, race conditions appear.

Consider the bank account class presented in the lesson about methods, which for simplicity uses long to represent the monetary balance in cents. (Remember that floating point numbers should never be used to represent monetary values.)

BankAccount.deposit(…) race condition.
public class BankAccount {

  private volatile long balance;

  …

  /** Deposits money to the account.
   * @param amount The amount, in cents, to deposit.
   * @return The new balance after the deposit.
   */
  public long deposit(final long amount) {
    final long oldBalance = balance;
    final long newBalance = oldBalance + amount;
    balance = newBalance;
    return balance;
  }

}

Assume that your bank account currently has a balance of $100.00 (10000L). You receive a birthday present from each of your parents: your mother gives you $20 (2000L) and your father other gives you $30 (3000L). After all the birthday celebrations are over, you expect to have $150 (15000L) in your account.

But what if each of your parents deposit the money at exactly the same? Although using separate tellers at the bank, the computer system uses a central server that allows simultaneous operations using threads. Let's see what might happen if the thread supporting your mother's transaction tries to call BankAccount.deposit(…) at the same time as the thread supporting your father's transaction:

  1. momThread calls deposit(2000L).
  2. momThread executes oldBalance = balance, which places the value 10000L into oldBalance.
  3. momThread executes newBalance = oldBalance + amount, which places 10000L + 2000L or 12000L into newBalance.
  4. At this point dadThread calls deposit(3000L).
  5. dadThread executes oldBalance = balance, which places the value 10000L into oldBalance. Remember that balance has not yet been changed.
  6. dadThread executes newBalance = oldBalance + amount, which places 10000L + 3000L or 13000L into newBalance.
  7. dadThread executes balance = newBalance, which places the value 13000L from newBalance (in dadThread) into balance.
  8. momThread continues and executes balance = newBalance, which places the value 12000L from newBalance (in momThread) into balance.

When both threads have exited BankAccount.deposit(…), the value in balance is 12000L ($120), not the 15000L ($150) you expected. The problem is that the deposit operation consists of several parts that depend on values shared between threads. When these individual parts of the operation were performed at the same time by different threads, a race condition to read and then update the balance value occurs.

Synchronization

The race condition above occurred because we expected the operation in BankAccount.deposit(…) to be atomic. That is, we wanted the calculation and assignment to be completed as one whole operation, not broken down into parts. A non-atomic operation allows another thread to start executing parts the same operation, negating the assumptions of the first operation and creating a race condition.

Java provides the synchronized keyword which, when applied to a method, prevents more than one thread from accessing the method at the same time.

Atomic BankAccount.deposit(…) method using synchronized.
public class BankAccount {

  private long balance;

  …

  /** Deposits money to the account.
   * @param amount The amount, in cents, to deposit.
   */
  public synchronized long deposit(final long amount) {
    final long oldBalance = balance;
    final long newBalance = oldBalance + amount;
    balance = newBalance;
  }

}

Because the method uses synchronized, no two threads can enter the method at the same time. Once momThread enters the method in the sequence above, Java does not allow dadThread to enter the method until momThread has left the method. Because no two threads can enter the method at the same time, the the BankAccount.deposit(…) method is now atomic.

Behind the scenes each object instance in Java has a token called a lock. Before a thread can enter a synchronized method, it must acquire this lock. If another thread currently has the lock, the first thread will block, going into the suspended stated BLOCKED which you learned about previously. The blocked thread will only continue once the other thread has released the lock.

Because instance methods share access to instance variables in the same class instance, making a method atomic may require more than making the method itself synchronized. If other methods potentially change variables used by the atomic method, a race condition arises again.

Second method breaking BankAccount.deposit(…) atomicity.
public class BankAccount {

  …

  /** Adds interest to the current balance.
   * @see #getInterestRate()
   */
  public void addInterest() {
    balance = balance + (balance * getInterestRate());
  }

}

Even if BankAccount.deposit(…) is made synchronized, a race condition now exists, not from multiple threads accessing the same method, but from multiple threads accessing different methods. If another thread, say the monthly billingCycleThread, happens to enter BankAccount.addInterest() at the same time that momThread or dadThread is inside the BankAccount.deposit(…) method, a similar race condition develops. BankAccount.deposit(…) is no longer atomic because another thread can change data used by that method via the BankAccount.addinterest() method. The solution is to make both methods synchronized.

Two synchronized BankAccount methods.
public class BankAccount {

  …

  public synchronized long deposit(final long amount) {…}

  public synchronized void addInterest() {…}

}

Because synchronized refers to the same class-level lock, a thread cannot enter one of the methods if another thread is has already acquired the lock and entered either of the method.

Synchronization Scope

The scope of synchronized can be narrowed by using the keyword on smaller-grained blocks than entire methods. If the BankAccount.deposit(…) performed other actions in addition to making a deposit, those might not need to be part of the atomic operation and could be relegated to outside the synchronized block. Sending an email after making a deposit, for example, would not affect the actual deposit; it would not matter which thread requested an email first, as long as the deposit operation completed atomically.

BankAccount.deposit(…) method using a synchronized block.
public class BankAccount {

  …

  public long deposit(final long amount) {
    final long newBalance;
    synchronized {
      final long oldBalance = balance;
      newBalance = oldBalance + amount;
      balance = newBalance;
    }
    sendBalanceEmail();  //outside of synchronized block
    return newBalance;
  }

}

Synchronizing on a Specific Object

In an effort to reduce the scope of synchronization even further, the synchronized keyword allows you to specify the object from which to acquire the synchronization lock. The BankAccount class might keep a separate variable to keep count of the number of transactions that had occurred. Updating this variable would need to be synchronized or there would be a race condition, just as with updating the balance. The two values are separate, however; operations updating the number of transaction don't interfere with operations modifying the balance. You could create separate objects solely for using their locks to synchronize access separately to the two values.

BankAccount with two synchronized objects.
public class BankAccount {

  private long balance;

  private final Object balanceLock = new Object();

  private long transactionCount = 0;

  private final Object transactionCountLock = new Object();

  public long deposit(final long amount) {
    final long newBalance;
    synchronized(balanceLock) {
      newBalance = (balance += amount);
    }
    synchronized(transactionCountLock) {
      transactionCount++;
    }
  }

}

Deadlock

As soon as you have multiple threads and multiple locks, the risk is raised that your program logic could allow a deadlock to occur. A deadlock situation occurs when two threads are blocking, each waiting for a lock that he other holds. Each thread can only continue once the lock it is waiting for is released, but that can never happen because the other thread is blocking and waiting on the lock held by the first thread. In such a case each thread blocks indefinitely and that section of the program hangs.

Rather than a simple transaction counter, let's keep a historical list of all account actions. We will use a simple ArrayList<Action>, but because ArrayList<E> is not thread safe we must add a locking object for it, too, or simultaneous modification by multiple threads would corrupt the list contents. (We can't simply reuse the balanceLock object, because we want to record actions other than deposits such support calls.)

BankAccount with history.
public class BankAccount {

  private long balance;

  private final Object balanceLock = new Object();

  private final List<Action> history = new ArrayList<Action>();

  public long deposit(final long amount) {
    final long newBalance;
    synchronized(balanceLock) {
      newBalance = (balance += amount);
      synchronized(history) {
        history.add(new DepositAction(amount, balance));
      }
    }
    return newBalance;
  }

  /** Logs a support call with a customer.
   * <p>Includes the current balance in the log.</p>
   * @param repId The identifier of the support representative.
   */
  public void logSupportCall(@Nonnull final String repId) {
    synchronized(history) {
      synchronized(balanceLock) {
        history.add(new SupportAction(repId, balance));
      }
    }
  }
}

Let us assume a situation in which you make a support call regarding your account at the same time that your mother is depositing your birthday present—a completely plausible scenario:

  1. momThread calls deposit(2000L).
  2. momThread gets a lock on balanceLock.
  3. momThread executes balance += amount.
  4. Just then you finish your customer support call and supportThread calls logSupportCall(…).
  5. supportThread gets a lock on historyLock.
  6. supportThread tries to get a lock on balanceLock, but can't because that lock is held by momThread, so supportThread blocks.
  7. momThread continues execution and tries to get a lock on historyLock, but can't because that lock is held by supportThread, so momThread blocks as well.

Now both momThread and supportThread are blocked, each waiting for a lock held by the other. Your birthday present never gets deposited, all because some developer didn't notice the possibility of a race condition.

One solution would be to move retrieval of the balance from outside the scope of the lock on history.
  public void logSupportCall(@Nonnull final String repId) {
    final long balance;
    synchronized(balanceLock) {
      balance = this.balance;  //save the balance locally under a lock
    }
    synchronized(history) {
      history.add(new SupportAction(repId, balance));
    }
  }

Synchronized Collections

Java's collections and maps were a welcome addition that brought abstract data type implementations for general use. The most common implementations such as java.util.ArrayList are similar to (but much more extensive) the data structures you implemented personally as part of these lessons. These implementations and those of the basic Java collection implementations are chock full of race conditions.

Think of what would happen if two threads tried to add an object to a linked list, for example; one thread might read the nextNode reference, but the other thread might update the nextNode reference right out from under the other thread before the other finished, and in the end one of the added objects might be lost because the nextNode got replaced with the wrong value. Java's general collection and map implementations similarly make no provision for thread-safety.

When we added a history list in the example above, we controlled access to the list by synchronizing on the list itself: synchronized(history). This approach works, but the developer must ensure that all access to the list is correctly synchronized. If one day a another developer adds a new method that accesses the history list, forgetting that it should be synchronized, race conditions will occur and data in the list could become corrupt. Likewise if the history list were to be shared between components in the banking system, it would be difficult to guarantee that the other components correctly performed locking.

A more object-oriented approach would be to add synchronization to the list itself. When you first learned about Java collections you learned about the decorator pattern and how you could create an immutable collection by wrapping an existing collection with another class. You also learned that the java.util.Collections class provides utility methods for creating unmodifiable collection wrappers. In the same way the Collections utilities offers several methods for creating synchronized wrappers to collections. The Collections.synchronizedList(List<T> list) method, for example, can take an existing List<T> instance and wrap it in a decorator class that only allows synchronized access to all the list methods.
BankAccount with history using a synchronized list.
public class BankAccount {

  private long balance;

  private final Object balanceLock = new Object();

  private final List<Action> history = Collections.synchronizedList(new ArrayList<Action>());

  public long deposit(final long amount) {
    final long newBalance;
    synchronized(balanceLock) {
      newBalance = (balance += amount);
      history.add(new DepositAction(amount, balance));
    }
    return newBalance;
  }

  /** Logs a support call with a customer.
   * <p>Includes the current balance in the log.</p>
   * @param repId The identifier of the support representative.
   */
  public void logSupportCall(@Nonnull final String repId) {
    synchronized(balanceLock) {
      history.add(new SupportAction(repId, balance));
    }
  }
}

Now there is no need to maintain a separately lock on the history list, because the list itself provides synchronized access to all its methods internally.

Review

Gotchas

In the Real World

Self Evaluation

Task

Your current implementation of FilePublicationRepository is not thread-safe. If used in an environment allowing multiple access to the repository, one thread could be trying to update files while another is trying to read them. Add synchronization to FilePublicationRepository to prevent conflicts when accessing local variables and the file system concurrently via multiple threads. Note that synchronization only affects code within the JVM. It is not possible to prevent concurrent modifications to files in the publication directory by other processes in the operating system, or to prevent the user from manually modifying files while the program is running.

The FakePublicationRepository also needs to be upgraded for concurrent access. You likely used a Java collection type for storing the publications in memory. Add synchronized access to the collection using one of the java.util.Collections methods that creates a synchronized wrapper for a collection. If you use multiple collections in your implementation of FakePublicationRepository, create a synchronized wrapper for one of them and use that as the lock object for synchronizing the others.

See Also

References