It is a widely believed myth in programming language communities that high-performance languages and dynamic languages are disjoint sets. The received wisdom is that if you want programmer productivity, you should use a dynamic language, such as Ruby, Python, or R; if you want fast code execution, you should use a statically typed language, such as C or Java.
There are always exceptions to this rule, but for most mainstream programmers it is a strongly held belief. It usually manifests itself in what is known as the two-language problem, which is especially prominent in scientific computing: the performance-critical inner kernel is written in C, and is then wrapped and called from a dynamic, higher-level language. Code written in traditional scientific computing environments such as R, Matlab, or NumPy follows this paradigm.
Code written in this fashion is not without its drawbacks, however. Even though it seems to offer the best of both worlds, fast computation with the convenience of a high-level language, this is a path full of hidden dangers. For one, someone has to write the low-level kernel, so you need two different skill sets. If you are lucky enough to find existing C code for your problem, you are fine. However, if you are doing anything new or original, or even slightly different from the norm, you will find yourself writing both C and a high-level language. This severely limits the number of contributors that your project or research will get: to be really productive, those contributors have to be familiar with both languages.
Secondly, code that routinely crosses the boundary between two languages can suffer severe and unforeseen performance pitfalls. When you can drop down to C quickly, everything is fine. However, if, for reasons of time, effort, skill, or changing requirements, you cannot write a performance-intensive part of your algorithm in C, you will find your program taking hundreds or even thousands of times longer than you expected.
Julia is the first modern language to make a reasonable effort to solve the two-language problem. It is a high-level, dynamic language with powerful features that make for very productive programming. At the same time, code written in Julia usually runs very quickly, almost as quickly as code written in statically typed languages.
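To make this concrete, here is a minimal sketch of what that combination looks like in practice (the function name, sumsq, is our own illustrative choice): a generic Julia function, written with no type annotations at all, that the compiler nevertheless specializes into fast native code the first time it is called:

    # A generic, high-level function: no type annotations required
    function sumsq(v)
        s = zero(eltype(v))    # accumulator matching the element type of v
        for x in v
            s += x * x
        end
        return s
    end

    v = rand(10^6)             # one million random Float64 values
    sumsq(v)                   # specialized and compiled to native code on first call

The same source code works unchanged for arrays of integers, complex numbers, or any other numeric type, with a separate specialized method compiled for each.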
The rest of this chapter describes some of the underlying design decisions that make Julia such a fast language. We'll also look at some evidence of the performance claims about Julia. The rest of the book shows you how to write your Julia programs to be as fast and lean as possible. We will discuss how to measure and reason about performance in Julia, and how to avoid some potential performance roadblocks.
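As a first taste of measurement, Julia's built-in @time macro reports the elapsed time and memory allocations of an expression. A minimal sketch, reusing the sumsq function from the previous example:

    v = rand(10^6)
    @time sumsq(v)    # the first call includes compilation time
    @time sumsq(v)    # later calls measure steady-state performance

For more rigorous measurements, the BenchmarkTools package provides the @btime macro, which runs an expression many times and reports summary statistics.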
For all the content in this book, we will usually illustrate our points with small, self-contained programs. We hope that this will enable you to grasp the crux of each issue without getting distracted by unnecessary elements of a larger program. We expect that this approach will therefore help you develop an intuition for Julia's performance profile.
Julia has a refreshingly simple performance model; thus, writing fast Julia code is a matter of understanding a few key elements of computer architecture, and how the Julia compiler interacts with them. We hope that, by the end of this book, your instincts will be well enough developed to design and write your own Julia code with the fastest possible performance.
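As a small preview of one such key element, consider type stability: the compiler produces the fastest code when a function's return type can be inferred from its argument types alone. The following sketch (the function names are our own) uses the @code_warntype macro from the InteractiveUtils standard library to make the difference visible:

    using InteractiveUtils    # provides @code_warntype (already loaded in the REPL)

    # Type-unstable: returns an Int for positive integer input, but a
    # Float64 otherwise, so the inferred result is a Union of both types
    unstable(x) = x > 0 ? x : 0.0

    # Type-stable: always returns a Float64 for integer input
    stable(x) = x > 0 ? float(x) : 0.0

    @code_warntype unstable(1)    # Body::Union{Float64, Int64}, flagged in the output
    @code_warntype stable(1)      # Body::Float64

The unstable version forces the compiler to carry both possible types through any code that calls it, while the stable version compiles down to tight, concretely typed machine code.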
Finally, Julia will work for you at both ends of the compute spectrum. At one end, its performance and expressiveness allow it to run in embedded use cases on low-powered processors; it is fully supported on ARM processors and works well on the Raspberry Pi, which makes it a perfect environment for teaching programming. At the other end of the spectrum, Julia has been used to run large-scale machine learning applications on some of the world's largest supercomputers. The Celeste project used Julia to build an atlas of the sky, where the computation ran at an amazing 1.5 petaflops (1 petaflop is 10^15 floating-point operations per second, or a thousand million million), using 1.3 million threads. This was the first time any dynamic language had broken the petaflop barrier. So, Julia can run on machines large and small, scaling massively in both directions.
The code and examples in this book target version 1.2 of the language, which is the most recently released version at the time of publication. Since there will be no breaking changes within the 1.x series of Julia, most of the code in this book should work on every version from 1.0 onward, which was released in August 2018.