" Rust is technology from the past came to save the future from itself. "
- Graydon Hoare
Rust is a fast, concurrent, safe, and empowering programming language originally started and developed by Graydon Hoare in 2006. It's now an open source language that's developed mainly by a team from Mozilla with collaboration from lots of open source folks. The first stable version, 1.0, was released in May 2015. The project began with the hope of mitigating memory safety issues that came up in Gecko with the use of C++. Gecko is the browser engine that's used in Mozilla's Firefox browser. C++ is not an easy language to tame and has concurrency abstractions that can be easily misused. With Gecko using C++, a couple of attempts were made (in 2009 and 2011) to parallelize its Cascading Style Sheets (CSS) parsing code to leverage modern parallel CPUs. They failed, as the concurrent C++ code was too hard to maintain and reason about. With a large number of developers collaborating on the mammoth code base that Gecko has, writing concurrent code with C++ is not a joyride.

In the hope of incrementally removing the painful parts of C++, Rust was born, and with it Servo, a new research project to create a browser engine from scratch, was initiated. The Servo project provides feedback to the language team by using bleeding-edge language features, which, in turn, influences the evolution of the language. Around November 2017, parts of the Servo project, particularly the Stylo project (a parallel CSS parser in Rust), started shipping in the latest Firefox release (Project Quantum), which is a great feat in such a short amount of time. Servo's end goal is to incrementally replace components in Gecko with its own components.
Rust is inspired by a multitude of languages, the notable ones being Cyclone (a safe dialect of the C language) for its ideas on region-based memory management; C++ for its RAII principle; and Haskell for its type system, error-handling types, and typeclasses.
RAII stands for Resource Acquisition Is Initialization, a paradigm suggesting that resources must be acquired during the initialization of an object and released when the object's destructor is called, that is, when the object is deallocated.
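As a minimal sketch of this idea in Rust (the Resource type here is made up purely for illustration), a value acquires its resource when it is created and releases it automatically when it goes out of scope:

// A minimal RAII sketch: cleanup is tied to scope
struct Resource {
    name: &'static str,
}

impl Drop for Resource {
    fn drop(&mut self) {
        // Runs automatically when the owning variable goes out of scope
        println!("releasing {}", self.name);
    }
}

fn main() {
    let _db = Resource { name: "database connection" }; // acquired here
    {
        let _file = Resource { name: "file handle" };   // acquired here
    } // _file goes out of scope: its drop runs, releasing the file handle
    println!("end of main");
} // _db goes out of scope here and is released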
The language has a very minimal runtime, does not need garbage collection, and prefers stack allocation by default over heap allocation (which carries an overhead) for any value that's declared in a program. We'll explain all of this in Chapter 5, Memory Management and Safety. The Rust compiler, rustc, was originally written in OCaml (a functional language) and became self-hosting in 2011 after being rewritten in Rust itself.
Self-hosting is when a compiler is built by compiling its own source code. This process is known as bootstrapping a compiler. Compiling its own source code acts as a really good test case for the compiler.
Rust is openly developed on GitHub at https://github.com/rust-lang/rust and continues to evolve at a fast pace. New features are added to the language through a community-driven Request For Comments (RFC) process, where anybody can propose new language features. These are then described in detail in an RFC document. Consensus is then sought on the RFC and, if it is agreed upon, the implementation phase for the feature begins. The implemented feature then gets reviewed by the community and is eventually merged to the master branch after undergoing several rounds of testing by users on nightly releases. Getting feedback from the community is crucial for the language's evolution. Every six weeks, a new stable version of the compiler is released. Along with fast-moving incremental updates, Rust also has the notion of editions, which are proposed to provide consolidated updates to the language. This includes tooling, documentation, its ecosystem, and phasing in any breaking changes. So far, there have been two editions: Rust 2015, which had a focus on stability, and Rust 2018, which is the current edition at the time of writing this book and focuses on productivity.
While being a general-purpose, multi-paradigm language, Rust aims at the systems programming domain, where C and C++ have been predominant. This means that you can write operating systems, game engines, and many performance-critical applications with it. At the same time, it is also expressive enough that you can build high-performance web applications, network services, and type-safe database Object Relational Mapper (ORM) libraries, and it can also run on the web by compiling down to WebAssembly. Rust has also gained a fair share of interest in building safety-critical, real-time applications for embedded platforms such as Arm's Cortex-M based microcontrollers, a domain mostly dominated by C at present. This breadth of applicability across domains is something that's very rare to find in a single programming language. Moreover, established companies such as Cloudflare, Dropbox, Chucklefish, npm, and many more are already using it in production for their high-stakes projects.
Rust is characterized as a statically and strongly typed language. The static property means that the compiler has information about all of the variables and their types at compile time and does most of its checks at compile time, leaving very minimal type checking for runtime. Its strong nature means that it does not allow things such as automatic conversion between types, and that a variable pointing to an integer cannot be changed to point to a string later in code. For example, in weakly typed languages such as JavaScript, you can easily write something like two = "2"; two = 2 + two;. JavaScript weakens the type of 2 to a string at runtime, thus storing "22" as a string in two, which is totally contrary to your intent and meaningless. In Rust, the same code, that is, let mut two = "2"; two = 2 + two;, would get caught at compile time, throwing the following error: cannot add `&str` to `{integer}`. This property enables safe refactoring of code and catches most bugs at compile time rather than causing issues at runtime.
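As a rough sketch of how you would express the same intent in Rust, any conversion between the string and the integer has to be made explicit, so both possible meanings are spelled out (the variable names here are ours):

fn main() {
    let two = "2";                            // a string slice
    let two_num: i32 = two.parse().unwrap();  // explicit, checked conversion to an integer
    println!("{}", 2 + two_num);              // prints 4
    // If "22" was really what you wanted, say so explicitly:
    let concatenated = format!("{}{}", 2, two);
    println!("{}", concatenated);             // prints 22
}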
Programs written in Rust are very expressive as well as performant, in the sense that you can have most of the features of high-level, functional-style languages, such as higher-order functions and lazy iterators, yet they compile down to efficient code like a C/C++ program. The defining principles that underlie many of its design decisions are compile-time memory safety, fearless concurrency, and zero-cost abstractions. Let's elaborate on these ideas.
Compile-time memory safety: The Rust compiler can track the variables owning a resource in your program at compile time, and it does all of this without a garbage collector.
Resources can be a memory address, a variable holding a value, a shared memory reference, a file handle, a network socket, or a database connection handle.
This means that you can't have the infamous pointer problems of use after free, double free, or dangling pointers at runtime. Reference types in Rust (types with & before them) are implicitly associated with a lifetime tag ('foo) and are sometimes annotated explicitly by the programmer. Through lifetimes, the compiler can track the places in code where a reference is safe to use, and it reports an error at compile time if a use is illegal. To achieve this, Rust runs a borrow/reference checking algorithm that uses these lifetime tags on references to ensure that you can never access a memory address that has been freed, and that you cannot free any pointer while it is being used by some other variable. We will go into the details of this in Chapter 5, Memory Management and Safety.
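The following is a minimal sketch of the kind of mistake the borrow checker catches: the borrowed value is freed at the end of the inner scope, so the program compiles only as long as the reference is not used after that point. Uncommenting the last println! produces a compile-time error instead of a dangling pointer at runtime:

fn main() {
    let reference;
    {
        let value = String::from("hello"); // value owns some heap memory
        reference = &value;                // borrow of value
        println!("{}", reference);         // fine: value is still alive here
    } // value is dropped (freed) here
    // println!("{}", reference);          // uncommenting this line fails to compile:
                                           // error: `value` does not live long enough
}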
Zero-cost abstractions: Programming is all about managing complexity, which is facilitated by good abstractions. Let's go through a fine example of abstraction in both Rust and Kotlin (a language targeting the Java Virtual Machine (JVM)) that lets us write high-level code that is easy to read and reason about. We'll compare Kotlin's streams and Rust's iterators in manipulating a list of numbers and contrast the zero-cost abstraction principle that Rust provides. The abstraction here is the ability to use methods that take other methods as arguments to filter numbers based on a condition, without using manual loops. Kotlin is used here because of its visual similarity to Rust. The code is fairly simple to understand and we aim to give a high-level explanation. We'll be glossing over the details in the code, as the whole point of this example is to understand the zero-cost property.
First, let's look at the code in Kotlin (the following code can be run online: https://try.kotlinlang.org):
1. import java.util.stream.Collectors
2.
3. fun main(args: Array<String>) {
4.
5.     // Create a stream of numbers
6.     val numbers = listOf(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).stream()
7.     val evens = numbers.filter { it -> it % 2 == 0 }
8.     val evenSquares = evens.map { it -> it * it }
9.     val result = evenSquares.collect(Collectors.toList())
10.     println(result) // prints [4,16,36,64,100]
11.
12.     println(evens)
13.     println(evenSquares)
14. }
We create a stream of numbers (line 6) and call a chain of methods (filter and map) to transform the elements so that we collect only the squares of even numbers. These methods can take a closure or a function (that is, it -> it * it at line 8) to transform each element in the collection. In functional-style languages, when we call these methods on the stream/iterator, for every such call, the language creates an intermediate object to keep any state or metadata regarding the operation being performed. As a result, evens and evenSquares will be two different intermediate objects that are allocated on the JVM heap. Allocating things on the heap incurs a memory overhead. That's the extra cost of abstraction we have to pay in Kotlin!
When we print the value of evens and evenSquares, we indeed get different objects, as shown here:
java.util.stream.ReferencePipeline$Head@51521cc1
java.util.stream.ReferencePipeline$3@1b4fb997
The hex value after the @ is the object's hash code on the JVM. Since the hash codes are different, they are different objects.
In Rust, we do the same thing (the following code can be run online: https://gist.github.com/rust-play/e0572da05d999cfb6eb802d003b33ffa):
1. fn main() {
2.     let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_iter();
3.     let evens = numbers.filter(|x| *x % 2 == 0);
4.     let even_squares = evens.clone().map(|x| x * x);
5.     let result = even_squares.clone().collect::<Vec<_>>();
6.     println!("{:?}", result); // prints [4,16,36,64,100]
7.     println!("{:?}\n{:?}", evens, even_squares);
8. }
Glossing over the details, on line 2 we call vec![] to create a list of numbers on the heap, followed by calling into_iter() to make it an iterator/stream of numbers. The into_iter() method creates a wrapper iterator type, IntoIter([1,2,3,4,5,6,7,8,9,10]), out of a collection (here, Vec<i32>, a list of signed 32-bit integers). This iterator type wraps the original list of numbers. We then perform filter and map transformations (lines 3 and 4), just like we did in Kotlin. Line 7 prints the types of evens and even_squares, as follows (some details have been omitted for brevity):
evens:
Filter { iter: IntoIter( <numbers> ) }

even_squares:
Map { iter: Filter { iter: IntoIter( <numbers> ) } }
The intermediate objects, Filter and Map, are wrapper types (not allocated on the heap) over the base iterator structure, which itself is a wrapper over the original list of numbers from line 2. The wrapper structures created on lines 3 and 4 by calling filter and map, respectively, do not have any pointer indirection in between and impose no heap allocation overhead, as was the case with Kotlin. All of this boils down to efficient assembly code, which would be equivalent to a manually written version using loops.
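To make that last claim concrete, the following is a rough sketch of the manual loop that the iterator chain is optimized down to; the compiler generates essentially the same machine code for both versions:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let mut result = Vec::new();
    for x in numbers {
        if x % 2 == 0 {         // the filter step
            result.push(x * x); // the map step
        }
    }
    println!("{:?}", result);   // prints [4, 16, 36, 64, 100]
}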
Fearless concurrency: When we said Rust is concurrent-safe, we meant that the language has Application Programming Interfaces (APIs) and abstractions that make it really easy to write correct and safe concurrent code. Contrast this with C++, where the possibility of making mistakes in concurrent code is quite high. When synchronizing data access across multiple threads in C++, you are responsible for calling mutex.lock() every time you enter the critical section, and mutex.unlock() when you exit it:
// C++
mutex.lock(); // Mutex locked, good to go
// Do super critical stuff
mutex.unlock(); // We're done
Critical section: This is a group of instructions/statements that need to be executed atomically. Here, atomically means no other thread can interrupt the currently executing thread in the critical section, and no intermediate value is perceived by any thread during execution of code in the critical section.
In a large code base with many developers collaborating on the code, you might forget to call mutex.lock() before accessing the shared object from multiple threads, which can lead to data races. In other cases, you might forget to unlock the mutex and starve the other threads that want access to the data.
Rust has a different take on this. Here, you wrap your data in a Mutex type, ensuring synchronized mutable access to the data from multiple threads:
// Rust
use std::sync::Mutex;

fn main() {
    let value = Mutex::new(23);
    *value.lock().unwrap() += 1; // modify
} // unlocks here automatically
In the preceding code, we were able to modify the data after calling lock() on value. Rust uses the notion of protecting the shared data itself, not the code. The interaction with the Mutex and the protected data is not independent, as is the case with C++: you cannot access the inner data without calling lock on the Mutex type. What about releasing the lock? Well, calling lock() returns something called a MutexGuard, which automatically releases the lock when the variable goes out of scope. It's one of the many safe concurrency abstractions Rust provides. We'll go into detail on them in Chapter 8, Concurrency. Another novel idea is the notion of marker traits, which validate and ensure synchronized and safe access to data in concurrent code at compile time. Traits are described in detail in Chapter 4, Types, Generics, and Traits. Types are annotated with the marker traits Send and Sync to indicate whether they are safe to send to other threads or safe to share between threads, respectively. When a program sends a value to a thread, the compiler checks whether the value implements the required marker trait and forbids the usage of the value if it doesn't. In this way, Rust allows you to write concurrent code without fear, with the compiler catching mistakes in multi-threaded code at compile time. Writing concurrent code is already hard. With C/C++, it gets even harder and more arcane. CPUs aren't getting higher clock rates anymore; instead, more cores are being added, so concurrent programming is the way forward. Rust makes it a breeze to write concurrent code and lowers the bar for many people to get into writing safe, concurrent code.
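As a small sketch of these ideas put together, the following shares a counter between four threads. The Mutex is wrapped in an Arc (an atomically reference-counted pointer) so that it can be moved into the threads; both types are Send and Sync, and if the protected value were not, the compiler would reject the program:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc makes the Mutex shareable across threads
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();

    for _ in 0..4 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // lock() returns a MutexGuard; the lock is released when the guard goes out of scope
            *counter.lock().unwrap() += 1;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("count = {}", *counter.lock().unwrap()); // prints count = 4
}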
Rust also employs C++'s RAII idiom for resource initialization. This technique ties a resource's lifetime to the lifetime of the object owning it: the deallocation of heap-allocated types is performed through the drop method, which is provided by the Drop trait and is called automatically when the variable goes out of scope. Rust also replaces the concept of null pointers with the Result and Option types, which we'll cover in detail in Chapter 6, Error Handling. This means that Rust doesn't allow null/undefined values in code, except when interacting with other languages through foreign function interfaces and when using unsafe code. The language also puts emphasis on composition over inheritance and has a trait system, which is implemented by data types and is similar to Haskell typeclasses; you can think of traits as Java interfaces on steroids. Traits in Rust are the backbone of many of its features, as we'll see in upcoming chapters.
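As a minimal sketch of how Option replaces null (the find_even function here is made up for illustration), the possibility of absence is part of the return type, and the compiler forces the caller to handle both cases before the value can be used:

fn find_even(numbers: &[i32]) -> Option<i32> {
    for &n in numbers {
        if n % 2 == 0 {
            return Some(n); // a value is present
        }
    }
    None // explicitly "no value", instead of a null pointer
}

fn main() {
    match find_even(&[1, 3, 4, 5]) {
        Some(n) => println!("found {}", n),
        None => println!("no even number found"),
    }
}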
Last but not least, Rust's community is quite active and friendly, and the language has comprehensive documentation, which can be found at https://doc.rust-lang.org. For the third year in a row (2016, 2017, and 2018), Stack Overflow's Developer Survey has highlighted Rust as the most-loved programming language, so it's fair to say that the overall programming community is very interested in it. To summarize, you should care about Rust if you aim to write high-performance software with fewer bugs while enjoying many modern language features and an awesome community!