Tips and tricks to design concurrent algorithms
In this section, we have compiled some tips and tricks to keep in mind when designing good concurrent applications.
Identify the correct independent tasks
You can only execute concurrently tasks that are independent of each other. If two or more tasks have an order dependency between them, there is little point in executing them concurrently and adding a synchronization mechanism to guarantee the execution order: the tasks would still run sequentially, and you would pay the overhead of the synchronization mechanism on top. A different situation is when a task has some prerequisites, but these prerequisites are independent of each other. In this case, you can execute the prerequisites concurrently and then use a synchronization class to launch the task once all the prerequisites have completed.
Another situation where you can't use concurrency is a loop in which every step uses data generated in the previous step, or in which some status information is passed from one step to the next.
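As a minimal sketch of the prerequisites case, assuming two hypothetical independent operations loadPartA() and loadPartB() (the names and values are invented for illustration), the prerequisites can run concurrently while CompletableFuture provides the synchronization point before the dependent task runs:

```java
import java.util.concurrent.CompletableFuture;

class PrerequisiteDemo {
    // Hypothetical independent prerequisites: each produces a partial result.
    static CompletableFuture<Integer> loadPartA() {
        return CompletableFuture.supplyAsync(() -> 20);
    }

    static CompletableFuture<Integer> loadPartB() {
        return CompletableFuture.supplyAsync(() -> 22);
    }

    // The dependent task runs only after both prerequisites complete.
    static int runAfterPrerequisites() {
        CompletableFuture<Integer> a = loadPartA();
        CompletableFuture<Integer> b = loadPartB();
        // thenCombine is the synchronization point that guarantees the order.
        return a.thenCombine(b, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(runAfterPrerequisites()); // prints 42
    }
}
```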
Implement concurrency at the highest possible level
Rich threading APIs, such as the Java concurrency API, offer you different classes to implement concurrency in your applications. In the case of Java, you can control the creation and synchronization of threads using the Thread or Lock classes, but the API also offers high-level concurrency objects, such as executors or the Fork/Join framework, that allow you to execute concurrent tasks. These high-level mechanisms offer the following benefits:
- You don't have to worry about the creation and management of threads. You only create tasks and send them for execution. The Java concurrency API controls the creation and management of threads for you.
- They are optimized to give better performance than using threads directly. For example, they use a pool of threads to reuse them and avoid thread creation for every task. You can implement these mechanisms from scratch, but it will take you a lot of time, and it will be a complex task.
- They include advanced features that make the API more powerful. For example, with executors in Java, you can execute tasks that return a result in the form of a Future object. Again, you can implement these mechanisms from scratch, but it's not advisable.
- Your application will be easier to migrate from one operating system to another, and it will be more scalable.
- Your application might become faster in future Java versions. Java developers constantly improve the internals, and JVM optimizations are likely to be more tailored to the JDK APIs.
In summary, for performance and development time reasons, analyze the high-level mechanisms your thread API offers you before implementing your concurrent algorithm.
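The executor and Future mechanism mentioned above can be sketched as follows; the tasks here are trivial placeholders, but the pattern (submit tasks, let the API manage the threads, collect results through Future) is the real one:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ExecutorDemo {
    // Submit two tasks to an executor and combine their results via Future.
    static int sumWithExecutor() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // We only create tasks; the executor creates and manages the threads.
            Future<Integer> f1 = pool.submit(() -> 1 + 2);
            Future<Integer> f2 = pool.submit(() -> 3 + 4);
            return f1.get() + f2.get(); // get() blocks until each result is ready
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumWithExecutor()); // prints 10
    }
}
```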
Take scalability into account
One of the main objectives when you implement a concurrent algorithm is to take advantage of all the resources of your computer, especially the number of processors or cores. But this number may change over time: hardware is constantly evolving, and its cost gets lower every year.
When you design a concurrent algorithm using data decomposition, don't presuppose the number of cores or processors that your application will execute on. Obtain that information from the system dynamically (for example, in Java you can get it with the method Runtime.getRuntime().availableProcessors()) and make your algorithm use it to calculate the number of tasks to execute. This adds some overhead to the execution time of your algorithm, but the algorithm will be more scalable.
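A minimal sketch of sizing a thread pool from the machine it actually runs on, instead of hard-coding a number:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ScalablePool {
    // Query the runtime instead of presupposing a fixed core count.
    static int poolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    static ExecutorService newScalablePool() {
        return Executors.newFixedThreadPool(poolSize());
    }

    public static void main(String[] args) {
        System.out.println("Using " + poolSize() + " worker threads");
        newScalablePool().shutdown();
    }
}
```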
If you design a concurrent algorithm using task decomposition, the situation can be more difficult. You depend on the number of independent tasks you have in the algorithm, and forcing a bigger number of tasks will increase the overhead introduced by the synchronization mechanisms, so the global performance of the application can even get worse. Analyze the algorithm in detail to determine whether you can have a dynamic number of tasks.
Use thread-safe APIs
If you need to use a Java library in a concurrent application, read its documentation first to know if it's thread-safe or not. If it's thread-safe, you can use it in your application without any problem. If it's not, you have the following two options:
- If a thread-safe alternative exists, you should use it
- If a thread-safe alternative doesn't exist, you should add the necessary synchronization to avoid all possible problematic situations, especially data race conditions
For example, if you need a List in a concurrent application, you should not use the ArrayList class if you are going to update it from several threads, because it's not thread-safe. In this case, you can use a thread-safe class such as ConcurrentLinkedDeque, CopyOnWriteArrayList, or LinkedBlockingDeque. If the class you want to use is not thread-safe, first look for a thread-safe alternative. It will probably be more optimized for concurrency than any alternative you could implement yourself.
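A small sketch of the List example: many tasks add elements to a CopyOnWriteArrayList concurrently, and no update is lost (with a plain ArrayList, the final size could be smaller, or the program could throw an exception):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SafeListDemo {
    // 1,000 concurrent adds; the thread-safe list never loses an update.
    static int concurrentAdds() {
        List<Integer> list = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 1000; i++) {
            final int v = i;
            pool.execute(() -> list.add(v));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return list.size(); // 1000 with CopyOnWriteArrayList
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds());
    }
}
```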
Never assume an execution order
The execution of tasks in a concurrent application, when you don't use any synchronization mechanism, is nondeterministic. The order in which the tasks are executed, and the time each task runs before the processor moves on to another task, is determined by the scheduler of the operating system. It doesn't matter that you observe the same execution order over a number of executions; the next one could be different.
The usual result of this kind of assumption is a data race problem: the final result of your algorithm depends on the execution order of the tasks. Sometimes the result will be right, but at other times it will be wrong. Data race conditions can be very difficult to detect, so you must be careful not to forget any of the necessary synchronization elements.
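A tiny illustrative sketch of such a data race: two threads increment a shared counter with no synchronization, so increments can be lost and the result can differ from run to run:

```java
class RaceDemo {
    // Two threads increment a shared counter with no synchronization at all.
    static int unsafeCount() {
        int[] counter = {0};
        Runnable inc = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter[0]++; // read-modify-write: not atomic, updates can be lost
            }
        };
        Thread t1 = new Thread(inc);
        Thread t2 = new Thread(inc);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0]; // often less than 200000, and may change on every run
    }

    public static void main(String[] args) {
        System.out.println(unsafeCount());
    }
}
```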
Prefer thread-local variables over static and shared ones when possible
Thread-local variables are a special kind of variable. Every task has an independent value for such a variable, so you don't need any synchronization mechanism to protect access to it.
This can sound a little strange. Every object has its own copy of the attributes of its class, so why do we need thread-local variables? Consider this situation: you create a Runnable task, and you want to execute multiple instances of it. You can create a separate Runnable object for each thread you want to execute, but another option is to create a single Runnable object and use it to create all the threads. In the latter case, all the threads access the same copy of the object's attributes, unless you use the ThreadLocal class. The ThreadLocal class guarantees that every thread accesses its own instance of the variable without the use of a Lock, a semaphore, or a similar class.
Another situation where you can take advantage of thread-local variables is with static attributes. All instances of a class share its static attributes, but if you declare them with the ThreadLocal class, every thread will access its own copy.
Another option is to use something like ConcurrentHashMap<Thread, MyType> and access it as var.get(Thread.currentThread()) or var.put(Thread.currentThread(), newValue). Usually, this approach is significantly slower than ThreadLocal because of possible contention (ThreadLocal has no contention at all). It has an advantage, though: you can clear the map completely and the value will disappear for every thread, so sometimes this approach is useful.
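A minimal sketch of the ThreadLocal behavior described above: each thread works on its own copy of the value, so an update made in one thread is invisible in another, with no lock involved:

```java
class ThreadLocalDemo {
    // Every thread that touches this variable gets its own independent copy.
    static final ThreadLocal<Integer> perThread = ThreadLocal.withInitial(() -> 0);

    static int isolatedValue() {
        perThread.set(perThread.get() + 1); // this thread's copy becomes 1
        Thread other = new Thread(() -> {
            // This thread starts from its own initial value, 0.
            perThread.set(perThread.get() + 10);
        });
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return perThread.get(); // still 1: the other thread updated its own copy
    }

    public static void main(String[] args) {
        System.out.println(isolatedValue()); // prints 1
    }
}
```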
Find the most easily parallelizable version of the algorithm
We can define an algorithm as a sequence of steps to solve a problem. There are different ways to solve the same problem: some are faster, some use fewer resources, and others fit better with special characteristics of the input data. For example, if you want to sort a set of numbers, you can use one of the many sorting algorithms that have been implemented.
In a previous section of this chapter, we recommended you use a sequential algorithm as the starting point to implement a concurrent algorithm. There are two main advantages to this approach:
- You can easily test the correctness of the results of your parallel algorithm
- You can measure the improvement in performance obtained with the use of concurrency
But not every algorithm can be parallelized, at least not so easily. You might think that the best starting point is the sequential algorithm with the best performance for the problem you want to parallelize, but this can be a wrong assumption. You should look for an algorithm that can be easily parallelized. Then, you can compare the concurrent algorithm against the best-performing sequential one to see which offers the best throughput.
Using immutable objects when possible
One of the main problems you can have in a concurrent application is a data race condition. As we explained before, this happens when two or more tasks modify the data stored in a shared variable and access to that variable is not implemented inside a critical section.
For example, when you work with an object-oriented language such as Java, you implement your application as a collection of objects. Each object has a number of attributes and some methods to read and change their values. If some tasks share an object and call a method to change the value of one of its attributes, and that method is not protected by a synchronization mechanism, you will probably have data inconsistency problems.
There is a special kind of object called immutable objects. Their main characteristic is that you can't modify any attribute after initialization. If you want to modify the value of an attribute, you must create another object. The String class in Java is the best example of an immutable class. When you use an operator that seems to change the value of a String (for example, +=), you are really creating a new object.
The use of immutable objects in a concurrent application has two very important advantages:
- You don't need any synchronization mechanism to protect the methods of these classes. If two tasks want to modify the same object, each will create a new object, so two tasks will never modify the same object at a time.
- As a consequence of the first point, you won't have any data inconsistency problems.
There is a drawback with immutable objects. If you create too many objects, this may affect the throughput and memory use of the application. If you have a simple object without internal data structures, it's usually not a problem to make it immutable. However, making complex objects that incorporate collections of other objects immutable usually leads to serious performance problems.
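A minimal sketch of a simple immutable class (the Point class here is an invented example): final fields, no setters, and "modification" returns a new object instead of mutating the existing one:

```java
// A minimal immutable point: final class, final fields, no setters.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Modification" creates a new object; the original is never changed.
    Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}

class ImmutableDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = p.translate(3, 4);
        System.out.println(p.x() + "," + p.y()); // prints 1,2 — original unchanged
        System.out.println(q.x() + "," + q.y()); // prints 4,6
    }
}
```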
Avoiding deadlocks by ordering the locks
One of the best mechanisms to avoid a deadlock situation in a concurrent application is to force tasks to get shared resources always in the same order. An easy way to do this is to assign a number to every resource. When a task needs more than one resource, it has to request them in order.
For example, if you have two tasks, T1 and T2, and both need two resources, R1 and R2, you can force both to request first the R1 resource and then the R2 resource. You will never have a deadlock.
On the other hand, if T1 first requests R1 and then R2 and T2 first requests R2 and then R1, you can have a deadlock.
For example, a bad use of this tip is as follows. You have two tasks that need to acquire two Lock objects, and they try to get the locks in different orders:

```java
public void operation1() {
    lock1.lock();
    lock2.lock();
    // ...
}

public void operation2() {
    lock2.lock();
    lock1.lock();
    // ...
}
```
It's possible that operation1() executes its first statement and operation2() executes its first statement too; then each will be waiting for the Lock held by the other, and you will have a deadlock.
You can avoid this simply by acquiring the locks in the same order. If you change operation2() as follows, you will never have a deadlock:

```java
public void operation2() {
    lock1.lock();
    lock2.lock();
    // ...
}
```
Using atomic variables instead of synchronization
When you have to share data between two or more tasks, you have to use a synchronization mechanism to protect the access to that data and avoid any data inconsistency problems.
Under some circumstances, you can use the volatile keyword instead of a synchronization mechanism. If only one of the tasks modifies the data and the rest of the tasks only read it, you can use the volatile keyword without any other synchronization and without data inconsistency problems. In other scenarios, you need a lock, the synchronized keyword, or some other synchronization mechanism.
In Java 5, the concurrency API introduced a new kind of variable: atomic variables. These variables are classes that support atomic operations on single variables. They include a method, compareAndSet(oldValue, newValue), that changes the value of the variable in one indivisible step: if the current value of the variable is equal to oldValue, it changes it to newValue and returns true; otherwise, it returns false. There are more methods that work in a similar way, such as getAndIncrement() or getAndDecrement(). These methods are also atomic.
This solution is lock-free; that is, it doesn't use locks or any other synchronization mechanism, so its performance is usually better than that of any synchronized solution.
The most important atomic classes you can use in Java are:
- AtomicInteger
- AtomicLong
- AtomicReference
- AtomicBoolean
- LongAdder
- DoubleAdder
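A short sketch of the compareAndSet and getAndIncrement semantics described above, using AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

class AtomicDemo {
    static boolean demo() {
        AtomicInteger value = new AtomicInteger(5);
        // Succeeds only if the current value matches the expected one.
        boolean first = value.compareAndSet(5, 10);  // true: value was 5, now 10
        boolean second = value.compareAndSet(5, 20); // false: value is no longer 5
        return first
                && !second
                && value.get() == 10
                && value.getAndIncrement() == 10 // returns the old value, sets 11
                && value.get() == 11;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```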
Holding locks for as short a time as possible
Locks, like any other synchronization mechanism, allow you to define a critical section that only one task can execute at a time. While a task is executing the critical section, the other tasks that want to execute it are blocked and have to wait until the critical section is released. During that time, the application is working in a sequential way.
You have to pay special attention to the instructions you include in your critical sections, because you can degrade the performance of your application without realizing it. Make your critical section as small as possible, and make it include only the instructions that operate on data shared with other tasks, so the time the application spends executing sequentially is minimal.
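A minimal sketch of this advice (the processing step is an invented placeholder): the expensive, thread-confined work happens before the lock is taken, and the lock protects only the touch on shared state:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class ShortCriticalSection {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> shared = new ArrayList<>();

    void process(String rawInput) {
        // Expensive, thread-confined work stays OUTSIDE the critical section...
        String prepared = rawInput.trim().toUpperCase();
        lock.lock();
        try {
            // ...and only the update of shared state is protected.
            shared.add(prepared);
        } finally {
            lock.unlock();
        }
    }

    int size() {
        lock.lock();
        try {
            return shared.size();
        } finally {
            lock.unlock();
        }
    }
}
```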
Avoid executing code you don't control inside a critical section. For example, imagine you are writing a library that accepts a user-defined Callable that you sometimes need to launch. You don't know exactly what will be in that Callable: maybe it blocks on input/output, acquires some locks, calls other methods of your library, or just runs for a very long time. Thus, whenever possible, try to execute it while your library is not holding any locks. If that is impossible for your algorithm, specify this behavior in your library documentation, and possibly specify the limitations on the user-supplied code (for example, that it should not take any locks). A good example of such documentation can be found in the compute() method of the ConcurrentHashMap class.
Taking precautions using lazy initialization
Lazy initialization is a mechanism that delays object creation until the object is used in the application for the first time. Its main advantage is that it minimizes the use of memory, because you only create the objects that are really needed, but it can be a problem in concurrent applications.
If you have a method that initializes an object and this method is called by two different tasks at once, you can end up initializing two different objects. This can be a problem, for example, with singleton classes, because you only want one instance of those classes.
An elegant solution to this problem is the initialization-on-demand holder idiom (https://en.wikipedia.org/wiki/Initialization-on-demand_holder_idiom).
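A minimal sketch of the idiom: the JVM guarantees that a class is initialized at most once and in a thread-safe way, so placing the instance in a nested holder class gives lazy, thread-safe initialization without any explicit locking:

```java
class ExpensiveSingleton {
    private ExpensiveSingleton() {
        // Imagine costly setup work here.
    }

    // Initialization-on-demand holder idiom: the JVM initializes the Holder
    // class exactly once, on first access, with thread safety guaranteed
    // by the class-loading machinery.
    private static class Holder {
        static final ExpensiveSingleton INSTANCE = new ExpensiveSingleton();
    }

    static ExpensiveSingleton getInstance() {
        return Holder.INSTANCE; // triggers Holder initialization only the first time
    }
}
```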
Avoiding the use of blocking operations inside a critical section
Blocking operations are those operations that block the task that calls them until an event occurs. For example, when you read data from a file or write data to the console, the task that calls these operations must wait until they finish.
If you include one of these operations in a critical section, you are degrading the performance of your application, because none of the tasks that want to execute that critical section can do so: the one inside the critical section is waiting for the I/O operation to finish, and the others are waiting for the critical section.
Unless it is imperative, don't include blocking operations inside a critical section.