Most modern computers have multiple cores. In this recipe, we explain how to start Julia so that it can utilize them. There are two basic ways to use multiple cores: multithreading and multiprocessing (for a basic explanation of the differences between these two approaches, see https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/ and https://en.wikipedia.org/wiki/Thread_(computing)#Threads_vs._processes). The major difference is that processes have separate state information, whereas multiple threads within a process share process state as well as memory and other resources. Both options are discussed in this recipe.
Setting up Julia to use multiple cores
Getting ready
In order to test how multiprocessing works, prepare two simple files that display a text message in the console. When running parallelization tests, we will see messages generated by those scripts appear asynchronously.
Create a hello.jl file in your working directory, containing the following code:
println("Hello " * join(ARGS, ", "))
And create hello2.jl with the following code:
println("Hello " * join(ARGS, ", "))
sleep(1)
Now, open your favorite terminal to execute the commands.
How to do it...
We will first explain how to start Julia using multiple processes. In the second part of the recipe, we will set up Julia to use multiple threads.
Multiple processes
In order to start several Julia processes, perform the following steps:
- Specify the number of required worker processes using the -p option on Julia startup.
- Then, check the number of workers in Julia by using the nworkers() function from the Distributed package.
- Run the command following $ in your OS shell, then import the Distributed package and write nworkers() while in Julia, and then use exit() to go back to the shell:
$ julia --banner=no -p 2
julia> using Distributed
julia> nworkers()
2
julia> exit()
$
If you want to execute some script on every worker on startup, you can do it using the -L option.
- Run the hello.jl and hello2.jl scripts (the steps to start Julia and exit it are the same as in the preceding steps):
$ julia --banner=no -p auto -L hello.jl
Hello !
From worker 4: Hello !
From worker 5: Hello !
julia> From worker 2: Hello !
From worker 3: Hello !
julia> exit()
$ julia --banner=no -p auto -L hello2.jl
Hello !
From worker 4: Hello !
From worker 5: Hello !
From worker 2: Hello !
From worker 3: Hello !
julia> exit()
$
We can see that when the -L option is passed, Julia stays in the REPL after executing the script (as opposed to running a script normally, where we have to pass the -i option to remain in the REPL). The difference in behavior between hello.jl and hello2.jl is explained in the How it works... section.
Multiple threads
Julia can be run in a multithreaded mode. This mode is enabled via the JULIA_NUM_THREADS environment variable. Perform the following steps:
- To start Julia with the number of threads equal to the number of cores in your machine, set the environment variable JULIA_NUM_THREADS first.
- Check how many threads Julia is using with the Threads.nthreads() function.
The commands for these steps differ between Linux and Windows:
- If you are using bash on Linux, run the following commands:
$ export JULIA_NUM_THREADS=`nproc`
$ julia -e "println(Threads.nthreads())"
4
$
- If you are using cmd on Windows, run the following commands:
C:\> set JULIA_NUM_THREADS=%NUMBER_OF_PROCESSORS%
C:\> julia -e "println(Threads.nthreads())"
4
C:\>
Observe that we have not used the -i option in either case, so the process terminated immediately.
How it works...
The -p {N|auto} switch tells Julia to spin up N additional worker processes on startup. The auto option starts as many workers as you have cores on your machine, so julia -p auto is equivalent to:
- julia -p `nproc` on Linux
- julia -p %NUMBER_OF_PROCESSORS% on Windows
It is important to understand that when you start N workers with the -p switch, Julia spins up N+1 processes: one master process and N worker processes. You can check the total using the nprocs() function. If you start Julia without the -p switch, only one process is started; in that case nworkers() still returns 1, because the master process is counted as the only worker.
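The relationship between the master process and the workers can be checked from inside Julia. The following is a minimal sketch that adds two local workers with addprocs(2), which is assumed here to stand in for starting Julia with -p 2; the variable names are illustrative only.

```julia
using Distributed

# Number of processes before adding anything (1 in a session started without -p)
procs_before = nprocs()

# Start two local worker processes; addprocs returns their ids
added = addprocs(2)

println("processes: ", nprocs())    # master plus all workers
println("workers:   ", nworkers())  # worker processes only
println("worker ids: ", workers())

# Once workers exist, the master (id 1) is no longer counted as a worker
master_is_worker = 1 in workers()
worker_gap = nprocs() - nworkers()
procs_added = nprocs() - procs_before

# Shut the added workers down again
rmprocs(added)
```

After the workers are added, nprocs() exceeds nworkers() by exactly one, which is the master process.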
In the output shown earlier, we can see that hello.jl was executed on the master process and on all of the worker processes. Additionally, observe that the execution was asynchronous. In this case, workers 4 and 5 printed their message before the Julia prompt was printed by the master process, but workers 2 and 3 ran their println call after it. By adding a sleep(1) statement in hello2.jl, we make the master process wait for one second, which is sufficient time for all workers to run their println command.
As you have seen, in order to start Julia with multiple threads, you have to set the environment variable JULIA_NUM_THREADS, which Julia uses to determine how many threads to start. To have any effect, this value must be set before Julia is started. You can read it inside Julia through ENV["JULIA_NUM_THREADS"], but changing it while Julia is running will not add or remove threads. Therefore, before running Julia you have to type the following in a terminal session:
- export JULIA_NUM_THREADS=[number of threads] on Linux or if you use bash on Windows
- set JULIA_NUM_THREADS=[number of threads] on Windows if you use the standard shell
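To illustrate that the variable is only read at startup, here is a small sketch that changes ENV["JULIA_NUM_THREADS"] in a running session and checks that the thread count stays the same. The variable names are illustrative, and the "(not set)" placeholder is an assumption made so that the example also runs when the variable was never defined.

```julia
using Base.Threads

# Thread count fixed when this Julia process was started
threads_at_startup = nthreads()

# Read the variable; `get` avoids an error if it was never set
original = get(ENV, "JULIA_NUM_THREADS", nothing)
println("JULIA_NUM_THREADS = ", original === nothing ? "(not set)" : original)

# Change the variable inside the running session
ENV["JULIA_NUM_THREADS"] = string(threads_at_startup + 1)

# The running process keeps the thread count it started with
threads_after_change = nthreads()
println("threads before: ", threads_at_startup, ", after: ", threads_after_change)

# Restore the environment to its original state
if original === nothing
    delete!(ENV, "JULIA_NUM_THREADS")
else
    ENV["JULIA_NUM_THREADS"] = original
end
```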
There's more...
You can also add processes after Julia has started using the addprocs function. We are running the following code on Windows with two drives, C: and D:, present. Julia is started in the D:\ directory:
D:\> julia --banner=no -p 2 -L hello2.jl
Hello
From worker 3: Hello
From worker 2: Hello
julia> pwd()
"D:\\"
julia> using Distributed
julia> pmap(i -> (i, myid(), pwd()), 1:nworkers())
2-element Array{Tuple{Int64,Int64,String},1}:
(1, 2, "D:\\")
(2, 3, "D:\\")
julia> cd("C:\\")
julia> pwd()
"C:\\"
julia> addprocs(2)
2-element Array{Int64,1}:
4
5
julia> pmap(i -> (i,myid(),pwd()), 1:nworkers())
4-element Array{Tuple{Int64,Int64,String},1}:
(1, 3, "D:\\")
(2, 2, "D:\\")
(3, 5, "C:\\")
(4, 4, "C:\\")
In particular, we see that each worker has its own working directory, which is initially set to the working directory the master Julia process has at the moment the worker is started. Also, addprocs does not execute the script that was specified by the -L switch on Julia startup.
Additionally, we can see a simple use of the pmap and myid functions. The first one is a parallelized version of the map function. The second returns the identification number of the process it runs on.
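As a self-contained sketch of these two functions, the following script compares pmap with map and records which process handled each element. It assumes two local workers added with addprocs(2); the variable names are illustrative only, and the process ids reported will vary between runs.

```julia
using Distributed

# Start two local workers for this example
added = addprocs(2)

# pmap returns results in input order, just like map
squares_serial = map(i -> i^2, 1:6)
squares_parallel = pmap(i -> i^2, 1:6)
println(squares_parallel == squares_serial)

# myid() is evaluated on whichever process handles the element
where_run = pmap(i -> (i, myid()), 1:6)
println(where_run)

# Shut the added workers down again
rmprocs(added)
```

Which worker handles which element is not fixed, so only the first entry of each tuple is predictable.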
As we explained earlier, it is not possible to add threads to a running Julia process. The number of threads has to be specified before Julia is started.
Choosing between multiple processes and multiple threads is not simple. A rule of thumb is to use threads if there is a need for data sharing and frequent communication between tasks running in parallel.
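The data-sharing side of this rule of thumb can be sketched with the Threads.@threads macro: all threads write into the same arrays without any copying. This is only an illustration with made-up variable names; with JULIA_NUM_THREADS unset, it runs on a single thread and still produces the same result.

```julia
using Base.Threads

inputs = collect(1:8)
squares = zeros(Int, length(inputs))     # one array shared by all threads
handled_by = zeros(Int, length(inputs))  # which thread processed each element

# Iterations are split among the available threads
@threads for i in eachindex(inputs)
    squares[i] = inputs[i]^2       # each iteration writes only its own slot
    handled_by[i] = threadid()
end

println(squares)
println(handled_by)
```

Because every iteration writes to a distinct index, no locking is needed here; code in which several threads update the same location would need additional synchronization.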
See also
More details about working with multiple processes and multiple threads are given in the Multithreading in Julia and Distributed computing with Julia recipes in Chapter 10, Distributed Computing.