The shared memory and shared state model

What if we write a multithreaded program to achieve the same result? A thread of execution is a sequence of programming instructions, scheduled and managed by the operating system. A process could contain multiple threads; in other words, a process is a container for concurrently executing threads, as shown in the following diagram:

As shown in the preceding diagram, multiple threads share the process memory. Two concurrently running processes do not share memory or any other resources, such as file descriptors. In other words, different concurrent processes have their own address space, while multiple threads within the same process share their address space. Each thread also has a stack of its own. This stack is used for returning after a process call. Locally scoped variables are also created on the stack. The relationships between these elements are shown in the following diagram:

As shown in the preceding diagram, both the threads communicate via the process's global memory. There is a FIFO (first in first out) queue in which the producer thread t1 enters the filenames. The consumer thread, t2, picks up the queue entries as and when it can.

What does this data structure do? It works on a similar principle as the aforementioned pipe. The producer can produce items as fast or slow as it can. Likewise, the consumer thread picks the entries from the queue as needed. Both work at their own pace, without caring or knowing of each other.

Exchanging information this way looks simpler. However, it brings with it a host of problems. Access to the shared data structure needs to be synchronized correctly. This is surprisingly hard to achieve. The next few sections will deal with the various issues that crop up. We will also see the various paradigms that shy away from the shared state model and tilt towards message passing.

Threads interleaving – the need for synchronization

The way threads get scheduled to run is within the realm of the operating system. Factors such as the system load, the number of processes running at a time on the machine, make thread scheduling unpredictable. A good example is a seat in a movie hall.

Let's say that the movie that is playing is a big crowd-puller. We need to follow a protocol; we book a ticket by reserving a seat. The following diagram shows the rules that are based around the ticket booking system:

What if the booking is erroneously given to two people at the same time? The result would be chaotic, as both would rightfully try to go and occupy the seat at the same time.

There is a certain framework at work to make sure this situation does not happen in practice. The seat is shared among interested people and one person needs to book it in advance.

Likewise, threads need to lock a resource (that is, a shared, mutable data structure). The problem is with explicit locking. If the onus of correctly synchronizing is with the application, then someone, someday may forget to do it right and all hell will break loose.

To illustrate the need for correct synchronization, the following diagram shows an integer variable shared between two threads:

As shown in the preceding diagram, if the interleaving happens to be right, things may work as expected. Otherwise, we will have a lost update to deal with.

Instead, just like a movie ticket acting as a lock, every Java object has a lock. A thread acquires it, performs the state mutations, and unlocks. This entire sequence is a critical section. If your critical section needs to mutate a set of objects, you need to lock each one separately.

Generally, the advice is to keep the critical section as small as possible. We will discuss the reason in the next section.

Race conditions and heisenbugs

The lost update is an example of a race condition. A race condition means that the correctness of the program (the way it is expected to work) depends on the relative timing of the threads getting scheduled. So sometimes it works right, and sometimes it does not!

This is a situation that is very hard to debug. We need to reproduce a problem to investigate it, possibly running it in a debugger. What makes it hard is that the race condition cannot be reproduced! The sequence of interleaved instructions depends on the relative timing of events that are strongly influenced by the environment. Delays are caused by other running programs, other network traffic, OS scheduling decisions, variations in the processor's clock speed, and so on. A program containing a race condition may exhibit different behavior, at different times.

A heisenbug and race conditions are explained in the diagram:

These are heisenbugs—essentially nondeterministic and hard to reproduce. If we try debugging a heisenbug by attaching a debugger, the bug may disappear!

There is simply no way to debug and fix these. There is some tooling support, such as the tha tool (https://docs.oracle.com/cd/E37069_01/html/E54439/tha-1.html) and helgrind (http://valgrind.org/docs/manual/drd-manual.html); however, these are language or platform specific, and don't necessarily prove the absence of races.

Clearly, we need to avoid race conditions by design, hence the need to study concurrency patterns and this book.

Correct memory visibility and happens-before

There is yet another problem that could come up with incorrect synchronization: incorrect memory visibility. The synchronized keyword prevents the execution of critical sections by more than one thread. The synchronized keyword also makes sure the thread's local memory syncs up correctly with the shared memory, as shown in the following diagram:

What is this local memory? Note that on a multicore CPU, each CPU has a cache for performance reasons. This cache needs to be synced with the main shared memory. The cache coherence needs to be ensured so that each thread running on a CPU has the right view of the shared data.

As shown in the preceding diagram, when a thread exits a synchronized block, it issues a write barrier, thereby syncing the changes in its cache to the shared memory. On the other hand, when a thread enters a synchronized block, it issues a read barrier, so its local cache is updated with the latest changes in the shared memory.

Note that this is again not easy. In fact, very seasoned programmers were tripped up when they proposed the double-checked locking pattern. This seemingly brilliant optimization was found to be flawed in light of the preceding memory synchronization rules.

For more information on this botched optimization attempt, take a look at https://www.javaworld.com/article/2074979/java-concurrency/double-checked-locking--clever--but-broken.html.

However, Java's volatile keyword guarantees correct memory visibility. You don't need to synchronize just to ensure correct visibility. This keyword also guarantees ordering, which is a happens-before relationship. A happens-before relationship ensures that any memory writes done by a statement are visible to another statement, as shown in the following code:

private int i = 0;
private int j = 0;
private volatile boolean k = false;
// first thread sets values 
i = 1;
j = 2;
k = true;

All the variable values will be set to have a happens-before relationship because of the volatile that is being set. This means that after the variable k is set, all the previous changes are guaranteed to have happened! So the value of the i and j variables are guaranteed to be set, as shown in the following snippet:

       // second thread prints them
System.out.println("i = " + i + ", j = " + j + ", k = " + k) // the i and j values will have been flushed to memory

The volatile keyword, however, does not guarantee atomicity. See http://tutorials.jenkov.com/java-concurrency/volatile.html for more information.

Sharing, blocking, and fairness

Just like the process life cycle, threads also have a life cycle. The following figure shows the various thread states. It shows three threads, t1, t2, and t3, in the Runnable, Running, and Timed Wait states. Here is a brief explanation of each state:

New: When a Thread object is just created. The thread is not alive, as yet.
Runnable: When the start() function is called on the thread object, its state is changed to runnable. As shown in the following diagram, a thread scheduler kicks in to decide when to schedule this thread to be run.
Running: Eventually, the thread scheduler picks one of the threads from the runnable thread pool and changes its state to Running. This is when the the thread starts executing. The CPU starts the execution of this thread.
Blocked: The thread is waiting for a monitor lock. As noted previously, for a shared resource such as a mutable memory data structure, only the thread can access/read/mutate it. While a thread has the lock, other threads will be blocked.
Waiting: Wait for another thread to perform an action. Threads commonly block while doing I/O.
Timed Wait: The thread waits for an event for a finite amount of time.
Terminated: The thread is dead and cannot go back to any other state.

A thread goes back to the Runnable state once the event it waited for happens:

As shown in the preceding diagram, a blocking thread is expensive and wasteful. Why is this so? Remember, a thread is a resource itself. Instead of a thread that is just blocked and doing nothing, it would be far more optimal to employ it for processing something else. Wouldn't it be good to allocate the thread to do something useful?

Keeping the critical sections small is one way to be fair to all threads. No thread holds the lock for a long time (although this is can be altered).

Could we avoid blocking the thread and instead use it for something else? Well, that brings us to the theme of asynchronous versus synchronous executions.

Asynchronous versus synchronous executions

Blocking operations are bad, as they waste resources. By blocking, we mean operations that take a long time to complete. Synchronous execution allows tasks to execute in a sequence, waiting for the current operation to complete before starting with the next. For example, making a phone call is synchronous. We dial the number, wait for the person on other side to say "hello," and then proceed with the conversation.

On the other hand, posting a letter is done asynchronously. One does not post a letter and block for its response. We post it and then go our own way, doing other stuff. Some time in the future, we can expect a response (or an error if the letter could not be delivered).

As another example, some restaurants give you a lunch token. You pay for lunch and get a token, which is a promise that you will get to eat in the near future. If there is a big queue at the counter, you may occupy yourself with something else in the meantime and then try again later.

This is an asynchronous model.

Imagine what would happen in a case where there is no token system in place. You pay and then just wait for your turn to be served, blocked while users at the front of the queue are served.

Coming back to the software world, file and network I/O are blocking. So are database calls using blocking drivers, as shown in the following diagram:

Instead of blocking and wasting the thread doing nothing, we could look at the workflow as a mix of unblocking and blocking tasks. We would then handle a blocking task using a future: an abstraction that will complete eventually and call us back with the results or an error.

This is a change in the paradigm, where we start thinking of designing our tasks differently and representing them using higher-level abstractions, such as a future (which we discussed previously), and not deal with the threads directly. Actors are another abstraction over threads, that is, another paradigm.

Futures offer composability. They are monads. You can create a pipeline of future operations to perform higher-level computations, as we will soon see in an upcoming chapter.

Java's nonblocking I/O

Java NIO (New IO) is a nonblocking I/O API for Java. This NIO is an alternative to the standard Java I/O API. It provides abstractions such as channels, buffers, and selectors. The idea is to provide an implementation that can use the most efficient operations provided by the operating system, as shown in the following screenshot:

A channel is just a bidirectional I/O stream. A single thread can monitor all the channels an application has opened. Data arriving at any channel is an event, and the listening thread is notified of its arrival.

The selector uses event notification: a thread can then check whether the I/O is complete without any need for blocking. A single thread can handle multiple concurrent connections.

This translates into two primary benefits:

Overall, you would need fewer threads. As a thread has a memory footprint, the memory management would have less overhead.
Threads could do something useful when there is no I/O. This opens up the possibility of optimization, as threads are a valuable resource.

The Netty framework (https://netty.io/) is an NIO-based client-server framework. The Play framework is a high-performance, reactive web framework based on Netty.

You're reading from Concurrent Patterns and Best Practices Build scalable apps in Java with multithreading, synchronization and functional programming patterns

Table of Contents (9) Chapters