Tips and tricks for designing concurrent algorithms

In this section, we have compiled some tips and tricks you have to keep in mind to design good concurrent applications.

Identifying the correct independent tasks

You can only execute concurrent tasks that are independent of each other. If you have two or more tasks with an order dependency between them, maybe it doesn't interest you to try to execute them concurrently and include a synchronization mechanism to guarantee the execution order. The tasks will execute in a sequential way, and you will have to overcome the synchronization mechanism. A different situation is when you have a task with some prerequisites, but these prerequisites are independent of each other. In this case, you can execute the prerequisites concurrently and then use a synchronization class to control the execution of the task after you finish all the prerequisites.

Another situation where you can't use concurrency is when you have a loop, and all the steps use data generated in the step before, or there is some status information that goes from one step to the next step.

Implementing concurrency at the highest possible level

Rich threading APIs, such as the Java Concurrency API, offer you different classes to implement concurrency in your applications. In the case of Java, you can control the creation and synchronization of threads using the Thread or Lock classes, but it also offers you high-level concurrency objects, such as executors or the fork/join framework, that allow you to execute concurrent tasks. This high-level mechanism offers you the following benefits:

You don't have to worry about the creation and management of threads. You only create tasks and send it to execute. The Java Concurrency API controls the creation and management of threads for you.
They are optimized to give better performance than using threads directly. For example, they use a pool of threads to reuse and avoid thread creation for every task. You can implement these mechanisms from scratch, but it will take you a lot of time, and it will be a complex task.
They include advanced features that make the API more powerful. For example, with executors in Java, you can execute tasks that return a result in the form of a Future object. Again, you can implement these mechanisms from scratch, but it's not advisable.
Your application will be migrated more easily from one operating system to another, and it will be more scalable.
Your application might become faster in future Java versions. Java developers constantly improve the internals, and JVM optimizations will be likely more tailored for JDK APIs.

In summary, for performance and development time reasons, analyze the high-level mechanisms your thread API offers you before implementing your concurrent algorithm.

Taking scalability into account

One of the main objectives, when you implement a concurrent algorithm, is to take advantage of all the resources of your computer, especially the number of processors or cores. But this number may change over time. Hardware is constantly evolving and its cost becomes lower each year.

When you design a concurrent algorithm using data decomposition, don't presuppose the number of cores or processors that your application will execute on. Get the information of the system dynamically (for example, in Java, you can get it with the method Runtime.getRuntime().availableProcessors()) and make your algorithm use that information to calculate the number of tasks it's going to execute. This process will have an overhead over the execution time of your algorithm, but your algorithm will be more scalable.

If you design a concurrent algorithm using task decomposition, the situation can be more difficult. You depend on the number of independent tasks you have in the algorithm and forcing a greater number of tasks will increment the overhead introduced by synchronization mechanisms and the global performance of the application can be even worse. Analyze in detail the algorithm to determine whether you can have a dynamic number of tasks or not.

Using thread-safe APIs

If you need to use a Java library in a concurrent application, read its documentation first to know whether it's thread-safe or not. If it's thread-safe, you can use it in your application without any problem. If it's not, you have the following two options:

If a thread-safe alternative exists, you should use it
If a thread-safe alternative doesn't exist, you should add the necessary synchronization to avoid all possible problematic situations, especially data race conditions

For example, if you need a List in a concurrent application, you should not use the ArrayList class if you are going to update it from several threads, because it's not thread-safe. In this case, you can use a thread-safe class such as ConcurrentLinkedDeque,CopyOnWriteArrayList, or LinkedBlockingDeque. If the class you want to use is not thread-safe, first you must look for the thread-safe alternative. It will probably be more optimized to work with concurrency than any alternative that you can implement.

Never assume an execution order

The execution of tasks in a concurrent application when you don't use any synchronization mechanisms is nondeterministic. The order of execution of the tasks and the time each task is in execution is determined by the scheduler of the operating system. It doesn't care if you observe that the execution order is the same in a number of executions. The next one could be different.

The result of this assumption used to be a data race problem. The final result of your algorithm depends on the execution order of the tasks. Sometimes, the result can be correct, but at other times, it can be incorrect. It can be very difficult to detect the cause of data race conditions, so you must be careful not to forget all the necessary synchronization elements.

Preferring local thread variables over static and shared when possible

Thread local variables are a special kind of variables. Every task will have an independent value for this variable, so you don't need any synchronization mechanisms to protect the access to this variable.

This can sound a little strange. Every object has its own copy of the attributes of the class, so why do we need the thread local variables? Consider this situation. You create a Runnable task and you want to execute multiple instances of that task. You can create a Runnable object per thread you want to execute, but another option is to create a Runnable object and use that object to create all the threads. In the last case, all the threads will have access to the same copy of the attributes of the class, except if you use the ThreadLocal class. The ThreadLocal class guarantees you that every thread will access its own instance of the variable without the use of a Lock, a semaphore, or a similar class.

Another situation when you can take advantage of the Thread local variables is with static attributes. All instances of a class share static attributes, except you declare them with the ThreadLocal class. In this case, every thread will have access to its own copy.

Another option you have is to use something like ConcurrentHashMap<Thread, MyType> and use it like var.get(Thread.currentThread()) or var.put(Thread.currentThread(), newValue). Usually, this approach is significantly slower than ThreadLocal because of possible contention (ThreadLocal has no contention at all). It has an advantage though: you can clear the map completely and the value will disappear for every thread, thus, sometimes it's useful to use such an approach.

Finding the easier parallelizable version of the algorithm

We can define an algorithm as a sequence of steps to solve a problem. There are different ways to solve the same problem. Some are faster, some use less resources, and others fit better with special characteristics of the input data. For example, if you want to order a set of numbers, you can use one of the multiple sorting algorithms that have been implemented.

In a previous section of this chapter, we recommended you use a sequential algorithm as the starting point to implement a concurrent algorithm. There are two main advantages to this approach:

You can easily test the correctness of the results of your parallel algorithm
You can measure the improvement in performance obtained with the use of concurrency

But not every algorithm can be parallelized, at least not so easily. You might think that the best starting point would be the sequential algorithm with best performance solving the problem you want to parallelize, but this can be an incorrect assumption. You should look for an algorithm than can be easily parallelized. Then, you can compare the concurrent algorithm with the sequential one with best performance to see which of those offers the best throughput.

Using immutable objects when possible

One of the main problems you can have in a concurrent application is a data race condition. As we explained before, this happens when two or more tasks can modify the data stored in a shared variable and the access to that variable is not implemented inside a critical section.

For example, when you work with an object-oriented language such as Java, you implement your application as a collection of objects. Each object has a number of attributes and some methods to read and change the values of the attributes. If some tasks share an object and call to a method to change a value of an attribute of that object and that method is not protected by a synchronization mechanism, you will probably have data inconsistency problems.

There are special kinds of objects, called immutable objects. Its main characteristic is that you can't modify any of its attributes after its initialization. If you want to modify the value of an attribute, you must create another object. The String class in Java is the best example of immutable objects. When you use an operator (for example, = or +=) that we might think changes the value of a String, you are really creating a new object.

The use of immutable objects in a concurrent application has two very important advantages:

You don't need any synchronization mechanisms to protect the methods of these classes. If two tasks want to modify the same object, they will create new objects, so two tasks modifying the same object at a time will never occur.
You won't have any data inconsistency problems, as a conclusion of the first point.

There is a drawback with immutable objects. You can create too many objects, and this may affect the throughput and the use of memory of the application. If you have a simple object without internal data structures, it's usually not a problem to make it immutable. However, making complex objects, which incorporate collections of other objects, immutable usually leads to serious performance problems.

Avoiding deadlocks by ordering the locks

One of the best mechanisms to avoid a deadlock situation in a concurrent application is to force tasks to always, get shared resources in the same order. An easy way to do this is to assign a number to every resource. When a task needs more than one resource, it has to request them in order.

For example, if you have two tasks, T1 and T2, and both need two resources, R1 and R2, you can force both to request the R1 resource first, and then the R2 resource. You will never have a deadlock.

On the other hand, if T1 first requests R1 and then R2, and T2 first requests R2 and then R1, you can have a deadlock.

For example, a bad use of this tip is as follows. You have two tasks that need to get two Lock objects. They try to get the locks in a different order:

public void operation1() { 
  lock1.lock(); 
  lock2.lock(); 
     . 
} 
public void operation2() { 
  lock2.lock(); 
  lock1.lock(); 
      
}

It's possible that operation1() executes its first sentence and operation2() its first sentence too, so they will be waiting for the other Lock and you will have a deadlock.

You can avoid this simply by getting the locks in the same order. If you change operation2(), you will never have a deadlock, as follows:

public void operation2() { 
  lock1.lock(); 
  lock2.lock(); 
      
}

Using atomic variables instead of synchronization

When you have to share data between two or more tasks, you have to use a synchronization mechanism to protect access to that data and avoid any data inconsistency problems.

Under some circumstances, you can use the volatile keyword and not use a synchronization mechanism. If only one of the tasks modifies the data and the rest of the tasks read it, you can use the volatile keyword without any synchronization or data inconsistency problems. In other scenarios, you need to use a lock, the synchronized keyword, or any other synchronization method.

In Java 5, the concurrency API included a new kind of variables, denominated atomic variables. These variables are classes that support atomic operations on single variables. They include a method, denominated by compareAndSet(oldValue, newValue) that includes a mechanism to detect, if the assignment of the new value to the variable is done in one step. If the value of the variable is equal to oldValue, it changes it to newValue and returns true. Else, it returns false. There are more methods that work in a similar way, such as getAndIncrement() or getAndDecrement(). These methods are also atomic.

This solution is lock-free; that is to say, it doesn't use locks or any synchronization mechanisms, so its performance is better than any synchronized solution.

The most important atomic variables that you can use in Java are:

AtomicInteger
AtomicLong
AtomicReference
AtomicBoolean
LongAdder
DoubleAdder

Holding locks for as short a time as possible

Locks, like any other synchronization mechanism, allow you to define a critical section that only one task can execute at a time. While a task is executing the critical section, the other tasks that want to execute it are blocked and have to wait for the liberation of the critical section. The application is working in a sequential way.

You have to pay special attention to the instructions you include in your critical sections because you can degrade the performance of your application without realizing it. You must make your critical section as small as possible, and it must include only the instructions that work on shared data with other tasks, so the time that the application is executing in a sequential way would be minimal.

Avoid executing the code you don't control inside the critical section. For example, you are writing a library that accepts a user-defined Callable, which you need to launch sometimes. You don't know exactly what will be in that Callable. Maybe it blocks input/output, acquires some locks, calls other methods of your library, or just works for a very long time. Thus, whenever possible, try to execute it when your library does not hold any locks. If it's impossible for your algorithm, specify this behavior in your library documentation and possibly specify the limitations to the user-supplied code (for example, it should not take any locks). A good example of such documentation can be found in the compute() method of the ConcurrentHashMap class.

Taking precautions using lazy initialization

Lazy initialization is a mechanism that delays object creation until they are used in the application for the first time. It has the main advantage of minimizing the use of memory because you only create the objects that are really needed, but it can be a problem in concurrent applications.

If you have a method that initializes an object and this method is called by two different tasks at once, you can initialize two different objects. This, for example, can be a problem with singleton classes, because you only want to create one object of those classes.

An elegant solution to this problem has been an implemented by the initialization-on-demand holder idiom (https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom).

Avoiding the use of blocking operations inside a critical section

Blocking operations are those operations that block the tasks that call them until an event occurs. For example, when you read data from a file or write data to the console, the task that calls these operations must wait until they finish.

If you include one of these operations in a critical section, you are degrading the performance of your application because none of the tasks that want to execute that critical section can execute it. The one that is inside the critical section is waiting for the finalization of an I/O operation, and the others are waiting for the critical section.

Unless it is imperative, don't include blocking operations inside a critical section.