Julia 1.0 Programming Cookbook

Over 100 numerical and distributed computing recipes for your daily data science workflow

Product type Paperback
Published in Nov 2018
Publisher Packt
ISBN-13 9781788998369
Length 460 pages
Edition 1st Edition
Authors (2):
Przemysław Szufel
Bogumił Kamiński
Table of Contents

Preface
  1. Installing and Setting Up Julia
  2. Data Structures and Algorithms
  3. Data Engineering in Julia
  4. Numerical Computing with Julia
  5. Variables, Types, and Functions
  6. Metaprogramming and Advanced Typing
  7. Handling Analytical Data
  8. Julia Workflow
  9. Data Science
  10. Distributed Computing
  11. Other Books You May Enjoy

Setting up Julia to use multiple cores

Most modern computers have multiple cores. In this recipe, we explain how to start Julia so that it can use them. There are two basic ways to use multiple cores: multithreading and multiprocessing (see https://www.backblaze.com/blog/whats-the-diff-programs-processes-and-threads/ and https://en.wikipedia.org/wiki/Thread_(computing)#Threads_vs._processes for a basic explanation of the differences between these two approaches). The major difference is that processes have separate state information, whereas multiple threads within a process share process state as well as memory and other resources. Both options are discussed in this recipe.

Getting ready

In order to test how multiprocessing works, prepare two simple files that display a text message in the console. When running parallelization tests, we will see messages generated by those scripts appear asynchronously.

Create a hello.jl file in your working directory, containing the following code:

println("Hello " * join(ARGS, ", "))

And create hello2.jl with the following code:

println("Hello " * join(ARGS, ", "))
sleep(1)

In the GitHub repository for this recipe, you will find the commands.txt file, which contains the sequence of shell and Julia commands presented here, along with the hello.jl and hello2.jl files described above.

Now, open your favorite terminal to execute the commands.

How to do it...

We will first explain how to start Julia using multiple processes. In the second part of the recipe, we will set up Julia to use multiple threads.

Multiple processes

In order to start several Julia processes, perform the following steps:

  1. Specify the number of required worker processes using the -p option on Julia startup.
  2. Then, check the number of workers in Julia by using the nworkers() function from the Distributed package.
  3. Run the command following $ in your OS shell, then import the Distributed package and write nworkers() while in Julia, and then use exit() to go back to the shell:
$ julia --banner=no -p 2

julia> using Distributed


julia> nworkers()
2

julia> exit()

$

If you want to execute some script on every worker on startup, you can do it using the -L option.

  1. Run the hello.jl and hello2.jl scripts (the steps to start Julia and exit it are the same as in the preceding steps):
$ julia --banner=no -p auto -L hello.jl
Hello !
From worker 4: Hello !
From worker 5: Hello !
julia> From worker 2: Hello !
From worker 3: Hello !
julia> exit()

$ julia --banner=no -p auto -L hello2.jl
Hello !
From worker 4: Hello !
From worker 5: Hello !
From worker 2: Hello !
From worker 3: Hello !
julia> exit()

$

We can see that when the -L option is passed, Julia stays in the REPL after executing the script (as opposed to running a script normally, where we have to pass the -i option to remain in the REPL). The difference in behavior between hello.jl and hello2.jl is explained in the How it works... section.

Multiple threads

Julia can be run in a multithreaded mode. This mode is enabled via the JULIA_NUM_THREADS environment variable. Perform the following steps:

  1. To start Julia with the number of threads equal to the number of cores in your machine, set the environment variable JULIA_NUM_THREADS first.
  2. Check how many threads Julia is using with the Threads.nthreads() function.

Running the preceding steps is handled differently on Linux and Windows.

Here is a list of steps to be followed:

  1. If you are using bash on Linux, run the following commands:
$ export JULIA_NUM_THREADS=`nproc`
$ julia -e "println(Threads.nthreads())"
4
$

  2. If you are using cmd on Windows, run the following commands:
C:\> set JULIA_NUM_THREADS=%NUMBER_OF_PROCESSORS%
C:\> julia -e "println(Threads.nthreads())"
4
C:\>

Observe that we have not used the -i option in either case, so the process terminated immediately.
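
Once Julia has been started with several threads, a quick way to see them at work is to run a small threaded loop. The following is only a minimal sketch, assuming the session was started with JULIA_NUM_THREADS set as described above; the array name ids and the loop length of 8 are illustrative choices, not part of the recipe.

```julia
using Base.Threads

# Record which thread handled each iteration of a small loop.
ids = zeros(Int, 8)
@threads for i in 1:8
    ids[i] = threadid()   # each iteration writes to its own slot, so there is no data race
end

println("Threads available: ", nthreads())
println("Thread ids used:   ", sort(unique(ids)))
```

If Julia was started with a single thread, the loop still runs correctly; all iterations are simply handled by one thread.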

How it works...

The -p {N|auto} switch tells Julia to spin up N additional worker processes on startup. With the auto option, Julia starts as many workers as you have cores on your machine, so julia -p auto is equivalent to:

  • julia -p `nproc` on Linux
  • julia -p %NUMBER_OF_PROCESSORS% on Windows

It is important to understand that when you start N workers, Julia spins up N+1 processes: one master process and N worker processes. You can check this using the nprocs() function. When Julia is started without the -p switch, only one process runs, and nworkers() returns 1 because the master process then acts as the only worker.
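
The relationship between the process count and the worker count can be checked from inside a session. This is a minimal sketch, assuming Julia was started without the -p switch and using addprocs(2) (covered later in this recipe) in place of -p 2; the variable names are illustrative only.

```julia
using Distributed

base = nprocs()        # 1 in a session started without -p
addprocs(2)            # start two local worker processes, similar to -p 2

total = nprocs()       # master process plus all workers
nw = nworkers()        # worker processes only

println("Processes:  ", total)
println("Workers:    ", nw)
println("Worker ids: ", workers())
```

The master process always has the identification number 1, so the ids reported by workers() start from 2.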

We can see here that hello.jl was executed on the master process and on all of the worker processes. Additionally, observe that the execution was asynchronous. In this case, workers 4 and 5 printed their message before the Julia prompt was printed by the master process, but workers 2 and 3 executed their println command after it. By adding a sleep(1) statement in hello2.jl, we make the master process wait for one second, which is sufficient time for all workers to run their println command.

As you have seen, in order to start Julia with multiple threads, you have to set the environment variable JULIA_NUM_THREADS, which Julia uses to determine how many threads it should start. To have any effect, this value must be set before Julia is started. You can read it inside Julia via ENV["JULIA_NUM_THREADS"], but changing it while Julia is running will not add or remove threads. Therefore, before running Julia you have to type the following in a terminal session:

  • export JULIA_NUM_THREADS=[number of threads] on Linux or if you use bash on Windows
  • set JULIA_NUM_THREADS=[number of threads] on Windows if you use the standard shell
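
The point that the variable is only read at startup can be verified directly. The sketch below is an illustration under the assumption that it is run in an ordinary Julia session; the names before and after are hypothetical.

```julia
before = Threads.nthreads()

# Changing the variable inside a running session only updates the ENV dictionary;
# the current process keeps the threads it was started with.
ENV["JULIA_NUM_THREADS"] = string(before + 1)
after = Threads.nthreads()

println("Threads before: ", before)
println("Threads after:  ", after)
println("ENV now says:   ", ENV["JULIA_NUM_THREADS"])
```

The two thread counts are identical even though ENV reports the new value, which is why the variable has to be exported in the shell before Julia is launched.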

There's more...

You can also add processes after Julia has started using the addprocs function. We are running the following code on Windows with two drives, C: and D:, present. Julia is started in the D:\ directory:

D:\> julia --banner=no -p 2 -L hello2.jl
Hello
From worker 3: Hello
From worker 2: Hello
julia> pwd()
"D:\\"

julia> using Distributed

julia> pmap(i -> (i, myid(), pwd()), 1:nworkers())
2-element Array{Tuple{Int64,Int64,String},1}:
(1, 2, "D:\\")
(2, 3, "D:\\")

julia> cd("C:\\")

julia> pwd()
"C:\\"

julia> addprocs(2)
2-element Array{Int64,1}:
4
5

julia> pmap(i -> (i,myid(),pwd()), 1:nworkers())
4-element Array{Tuple{Int64,Int64,String},1}:
(1, 3, "D:\\")
(2, 2, "D:\\")
(3, 5, "C:\\")
(4, 4, "C:\\")

In particular, we see that each worker has its own working directory, which is initially set to the working directory of the master Julia process when it is started. Also, addprocs does not execute the script that was specified by the -L switch on Julia startup.
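
Both observations can be reproduced in a platform-independent way. The following is a rough sketch rather than part of the original recipe: the scratch directory and the greet function are hypothetical, and remotecall_fetch is used here simply as one way to run a function on a chosen worker. Because code passed with -L is not run on workers added later, the sketch defines what the workers need explicitly with @everywhere.

```julia
using Distributed

w = first(addprocs(1))              # one extra local worker
original = pwd()                    # working directory of the master process

scratch = mktempdir()               # hypothetical scratch directory for this demo
remotecall_fetch(cd, w, scratch)    # change directory on the worker only

master_dir = pwd()
worker_dir = remotecall_fetch(pwd, w)
println("Master: ", master_dir)
println("Worker: ", worker_dir)

# Define a function on every process, including workers added after startup.
@everywhere greet() = string("Hello from process ", myid())
println(remotecall_fetch(greet, w))
```

Changing the directory on the worker leaves the master's working directory untouched, which matches the behavior seen in the Windows session above.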

Additionally, we can see the simple use of the pmap and myid functions. The first one is a parallelized version of the map function. The second returns the identification number of a process that it is run on.
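
To make the comparison between map and pmap concrete, here is a small sketch, assuming two local workers added with addprocs(2); the squaring function and the variable names are illustrative only.

```julia
using Distributed
addprocs(2)

serial   = map(x -> x^2, 1:5)     # runs on the current process
parallel = pmap(x -> x^2, 1:5)    # distributes the calls over the workers, keeping the order

# myid() is evaluated on whichever worker handles each element.
ran_on = pmap(_ -> myid(), 1:4)

println("Same result: ", serial == parallel)
println("Handled by:  ", ran_on)
```

Which worker handles which element can vary between runs, so the ids printed on the last line are not deterministic.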

As we explained earlier, it is not possible to add threads to a running Julia process. The number of threads has to be specified before Julia is started.

Choosing between multiple processes and multiple threads is not simple. A rule of thumb is to use threads if there is a need for data sharing and frequent communication between tasks running in parallel.

See also

More details about how to work with multiple processes and multiple threads are explained in the Multithreading in Julia and Distributed computing with Julia recipes in Chapter 10, Distributed Computing.
