You're reading from Speed Up Your Python with Rust: Optimize Python performance by creating Python pip modules in Rust with PyO3

Product type: Paperback
Published: Jan 2022
Publisher: Packt
ISBN-13: 9781801811446
Length: 384 pages
Edition: 1st Edition


Author: Maxwell Flitton


Understanding the differences between Python and Rust

Rust is often described as a systems language. As a result, software engineers sometimes label it the same way they label C++: fast, hard to learn, dangerous, and time-consuming to code in. This can put off those of you who mainly work in dynamic languages such as Python. However, Rust is memory-safe, efficient, and productive. Once we have gotten over some of the quirks that Rust introduces, nothing holds us back from exploiting its advantages to write fast, safe, and efficient code. Seeing as Rust has so many advantages, we will explore them in the next section.

Why fuse Python with Rust?

When it comes to picking a language, there is usually a trade-off between resources, speed, and development time. Dynamic languages such as Python became popular as computing power increased: we could spend the extra resources on garbage collectors that manage memory for us. As a result, developing software became easier, quicker, and safer. As we will cover later in the Keeping track of scopes and lifetimes section, poor memory management can lead to security flaws. The steady growth in computing power over the years has followed Moore's Law, the observation that the number of transistors on a chip doubles roughly every two years. However, this trend is no longer holding: in 2019, Nvidia's CEO Jensen Huang suggested that, as chip components get closer to the size of individual atoms, it has become harder to keep pace with Moore's Law, and declared it dead (https://www.cnet.com/news/moores-law-is-dead-nvidias-ceo-jensen-huang-says-at-ces-2019/).

However, with the rise of big data, our need for faster languages is increasing. This is where languages such as Golang and Rust come in. These languages are memory-safe, yet they are compiled and run significantly faster. What makes Rust even more unique is that it achieves memory safety without garbage collection. To appreciate this, let's briefly describe garbage collection: the program temporarily stops, checks which values in memory are no longer being used, and frees them. Because Rust does not have to do this, it does not have to keep stopping to clean up, which is a significant advantage. This was demonstrated in Discord's 2020 blog post Why Discord is switching from Go to Rust: https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f#:~:text=The%20service%20we%20switched%20from,is%20in%20the%20hot%20path. In this post, we can see that Golang just could not keep up with Rust, as demonstrated in the graph they presented:

Figure 1.1 – Golang is spiky and Rust is the flat line below Golang (image source: https://blog.discord.com/why-discord-is-switching-from-go-to-rust-a190bbca2b1f#:~:text=The%20service%20we%20switched%20from,is%20in%20the%20hot%20path)

The comments on the post were full of people complaining that Discord used an out-of-date version of Golang. Discord responded that they had tried a range of Golang versions, and they all had similar results. With this in mind, it makes sense to get the best of both worlds without much compromise. We can use Python for prototyping and complex logic. The extensive range of third-party libraries that Python has, combined with the flexible object-oriented programming it supports, makes it an ideal language for solving real-world problems. However, it is relatively slow and not efficient in its use of resources. This is where we reach for Rust.

Rust is a bit more restrictive in the way we can lay out and structure code; however, it is fast, safe, and efficient when implementing multithreading. Combining these two languages gives a Python developer a powerful tool that their Python code can use when needed. The time investment needed to learn Rust and fuse it with Python is low: all we must do is package our Rust code, install it into our Python environment using pip, and understand a few quirks that make Rust different from Python. We start this journey by looking at how Rust handles strings in the next section. However, before we explore strings, we first have to understand how Rust is run compared to Python.

If you have built a web app in Python using Flask, you will have seen multiple tutorials sporting the following code:

```
from flask import Flask
app = Flask(__name__)

@app.route("/")
def home():
    return "Hello, World!"

if __name__ == "__main__":
    app.run(debug=True)
```

Note the last two lines of the code. Everything above them defines a basic Flask web app and a route. However, the app.run call in the last two lines only executes if the Python interpreter is running the file directly. This means that other Python files can import the Flask app from this file without running it. This is referred to by many as an entry point.

We import everything we need in this file, and for the application to run, we get our interpreter to run this script. We can nest any code under the if __name__ == "__main__": line, and it will not run unless the Python interpreter runs the file directly. Rust has a similar concept. However, in Rust it is essential, whereas in Python it is just a nice-to-have feature: every Rust executable must have a main function. In the Rust playground (see the Technical requirements section), we can type in the following code if it is not there already:

```
fn main() {
    println!("hello world");
}
```

This is the entry point. The Rust program is compiled, and then its main function runs. If whatever you've coded is not reached from the main function, it will never run. Here, we are already getting a sense of the safety enforced by Rust. We will see more of this throughout the book.
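As a minimal sketch of this rule, the program below defines two functions but only calls one of them from main. The names greeting and unused_helper are hypothetical, chosen for illustration. Only greeting runs; unused_helper is still compiled, but it never executes, and the compiler warns that it is never used.

```rust
// Called from main, so its code runs when the program starts
fn greeting() -> String {
    "hello world".to_string()
}

// Never reached from main: it is compiled but never executed,
// and rustc emits a warning that this function is never used
fn unused_helper() -> String {
    "this line is never printed".to_string()
}

fn main() {
    // The entry point: execution starts here
    println!("{}", greeting());
}
```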

Now that we have our program running, we can move on to understanding the difference between Rust and Python when it comes to strings.

Passing strings in Rust

In Python, strings are flexible. We can pretty much do what we want with them. Although Python strings are technically immutable (every change creates a new string under the hood), Python syntax lets us chop and change them, pass them anywhere, and convert them into integers or floats (if the text permits) without having to think too much about it. We can do all of this with Rust too. However, we must plan beforehand what we are going to do. To demonstrate this, we can dive right in by making our own print function and calling it, as seen in the following code:

```
fn print(input: str) {
    println!("{}", input);
}
fn main() {
    print("hello world");
}
```

In Python, a similar program would work. However, when we run it in the Rust playground, we get the following error:

```
error[E0277]: the size for values of type 'str' cannot be known at compilation time
```

This is because str has no fixed size, so the compiler cannot know how much space the input parameter needs. We don't get this in Python; therefore, we must take a step back and understand how variables are assigned in memory. Function parameters and local variables live on the stack, and the compiler must know the size of each of them at compile time in order to lay out the stack. Data whose size is only known at runtime, or that can grow, is stored on the heap instead. Strings can be of various sizes, so we cannot know at compile time how much stack memory to allocate for the input parameter of our function. What we are trying to pass in is a string slice (str). We can remedy this by having the function accept a String and converting our string literal into a String before passing it into our function, as seen here:

```
fn print(input: String) {
    println!("{}", input);
}
fn main() {
    let string_literal = "hello world";
    print(string_literal.to_string());
}
```

Here, we can see that we have used the to_string() method to convert our string literal into a String. To understand why String is accepted, we need to understand what a String is.

A String is a wrapper around a vector of bytes (Vec<u8>) that are guaranteed to be valid UTF-8. The String itself holds a pointer to those bytes in heap memory, the capacity (the amount of memory allocated for the bytes), and the length (the number of bytes currently in use). For instance, a String created from the string literal "one" can be denoted by the following diagram:

Figure 1.2 – String relationship to str
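To check the layout described above, we can ask std::mem::size_of how many bytes a type takes up. This sketch assumes a typical target where a String occupies three machine words (pointer, capacity, and length) and a &str occupies two (pointer and length); the helper names string_words and str_ref_words are hypothetical. Whatever the length of the text, these sizes do not change.

```rust
use std::mem::size_of;

// Size of a String measured in machine words (pointer, capacity, length)
fn string_words() -> usize {
    size_of::<String>() / size_of::<usize>()
}

// A &str is a "fat" pointer: a pointer to the bytes plus a length
fn str_ref_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

fn main() {
    let mut text = String::from("one");
    // len is the number of bytes in use; capacity is what has been allocated
    println!("len = {}, capacity = {}", text.len(), text.capacity());
    text.push_str(" two");
    // The bytes on the heap grew, but the String's own size did not
    println!("len = {}, capacity = {}", text.len(), text.capacity());
    println!("String: {} words, &str: {} words", string_words(), str_ref_words());
}
```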

Considering this, we can understand why the size of a String is guaranteed when we pass it into our function: it is always a pointer plus the capacity and length, no matter how long the text is. Similarly, if we just make a reference to the string data, we can pass it into our function, because a reference is a fixed-size pointer (for a &str, a pointer plus a length), so its size is guaranteed to stay the same. This can be done by borrowing with the & operator, as shown in the following code:

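```
fn print(input_string: &str) {
    println!("{}", input_string);
}
fn main() {
    // A string literal is already a borrowed string slice (&str)
    let test_string = "Hello, World!";
    print(test_string);
    // An owned String can be borrowed with & and passed as a &str
    let owned_string = String::from("Hello, World!");
    print(&owned_string);
}
```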

We will cover the concept of borrowing later in the chapter but, for now, we understand that, unlike in Python, we must guarantee the size of the variable being passed into a function. We can use borrowing and wrappers such as String to handle this. It may not come as a surprise, but this does not just stop at strings. Considering this, we can move on to the next section to understand the differences between Python and Rust when it comes to floats and integers.
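Earlier, we noted that Python lets us convert strings into integers or floats when the text permits. Rust can do the same with the parse method, but because the conversion can fail, parse returns a Result that we must deal with. In this minimal sketch, the hypothetical helpers to_int and to_float turn that Result into an Option (Some or None); both types are covered later in this chapter.

```rust
// Convert a string slice into an i32, or None if it is not a valid integer
fn to_int(text: &str) -> Option<i32> {
    // parse returns Result<i32, ParseIntError>; ok() turns it into an Option
    text.trim().parse::<i32>().ok()
}

// Convert a string slice into an f64, or None if it is not a valid float
fn to_float(text: &str) -> Option<f64> {
    text.trim().parse::<f64>().ok()
}

fn main() {
    println!("{:?}", to_int("42"));      // Some(42)
    println!("{:?}", to_int("4.2"));     // None: not an integer
    println!("{:?}", to_float("4.2"));   // Some(4.2)
    println!("{:?}", to_float("hello")); // None
}
```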

Sizing up floats and integers in Rust

Like strings, Python manages floats and integers with ease and simplicity. We can pretty much do whatever we want with them. For instance, the following Python code will result in 6.5:

```
result = 1 + 2.2
result = result + 3.3
```

However, there is a problem when we try to just execute the first line in Rust with the following line of Rust code:

```
let result = 1 + 2.2;
```

It results in an error telling us that a float cannot be added to an integer. This error highlights one of the pain points that Python developers go through when learning Rust: Rust enforces types strictly, never converts between numeric types implicitly, and refuses to compile if the types are not consistent. However, while this is an initial pain, strict typing helps in the long run, as it maintains safety.
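Rust will not convert the integer for us, but we can do it explicitly. A minimal sketch of two options: the as keyword, and f64::from, which is only implemented for conversions that cannot lose information (such as i32 to f64). The function names add_mixed and add_mixed_lossless are hypothetical.

```rust
// Reproduce the Python example with an explicit conversion using `as`
fn add_mixed(whole: i32) -> f64 {
    // `as f64` turns the integer into a float before the addition
    let mut result = whole as f64 + 2.2;
    result = result + 3.3;
    result
}

// Same calculation using f64::from, available only for lossless conversions
fn add_mixed_lossless(whole: i32) -> f64 {
    f64::from(whole) + 2.2 + 3.3
}

fn main() {
    println!("{}", add_mixed(1));          // 6.5
    println!("{}", add_mixed_lossless(1)); // 6.5
}
```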

Type annotation in Python is gaining popularity. This is where the types of function parameters and variables are declared, enabling some editors to highlight when the types are inconsistent. TypeScript brings the same idea to JavaScript. We can replicate the Python code at the start of this section with the following Rust code:

```
let mut result = 1.0 + 2.2;
result = result + 3.3;
```

Note that the result variable must be declared as mutable with the mut keyword. Mutable means that the variable can be changed. This is required because Rust makes all variables immutable by default unless we use the mut keyword.

Now that we have seen the effects of types and mutability, we should really explore integers and floats. Rust has two kinds of integers: signed integers, which are denoted by i, and unsigned integers, denoted by u. Unsigned integers only hold non-negative numbers, whereas signed integers hold both positive and negative integers. This does not just stop here. In Rust, we also denote the size of the integer in bits, which determines how big it is allowed to be. Understanding binary notation in detail is not really needed; the simple rule is that an integer with n bits can hold 2 to the power of n different values. We can calculate the ranges of all the integer types that we can utilize in Rust with the following table:

Table 1.1 – Size of integer types
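The ranges in the table follow directly from the 2-to-the-power-of-n rule, so they can also be computed. A minimal sketch with hypothetical helpers unsigned_max and signed_range, whose results can be checked against the MIN and MAX constants that the standard library defines for every integer type. Rust also has isize and usize, whose size matches the platform's pointer width.

```rust
// Largest value of an unsigned integer with `bits` bits: 2^bits - 1
fn unsigned_max(bits: u32) -> u128 {
    if bits == 128 {
        // (1 << 128) - 1 would overflow a u128, so use the constant directly
        u128::MAX
    } else {
        (1u128 << bits) - 1
    }
}

// Range of a signed integer with `bits` bits: -2^(bits-1) ..= 2^(bits-1) - 1
fn signed_range(bits: u32) -> (i128, i128) {
    if bits == 128 {
        (i128::MIN, i128::MAX)
    } else {
        let half = 1i128 << (bits - 1);
        (-half, half - 1)
    }
}

fn main() {
    for bits in [8u32, 16, 32, 64, 128] {
        let (min, max) = signed_range(bits);
        println!("u{}: 0 ..= {}", bits, unsigned_max(bits));
        println!("i{}: {} ..= {}", bits, min, max);
    }
    // The standard library exposes the same limits as constants
    println!("u8::MAX = {}, i8::MIN = {}", u8::MAX, i8::MIN);
}
```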

As we can see, we can get to very high numbers here. However, it is not the best idea to declare all variables and parameters as u128 integers, because every u128 takes up 16 bytes of memory, no matter how small the value it holds is. This is not very efficient, considering that we are unlikely to use such large numbers. Note that the range grows so quickly with each jump in bits that there is no point graphing it: each jump completely overshadows all the others, resulting in a flat line along the x axis and a huge spike at the last number of bits. However, we also must be sure that our type is not too small. We can demonstrate this with the following Rust code:

```
let number: u8 = 255;
let breaking_number: u8 = 256;
```

Our compiler will be OK with the number variable. However, it will throw the error shown next when assigning the breaking_number variable:

```
literal '256' does not fit into the type 'u8' whose range is '0..=255'
```

This is because a u8 has 8 bits, giving 2^8 = 256 possible values, and since one of those values is 0, the largest is 255. We can change our unsigned integer to a signed one with the following line of Rust code:

```
let number: i8 = 255;
```

This gives us the following error:

```
literal '255' does not fit into the type 'i8' whose range is '-128..=127'
```

In this error, we are reminded that each integer type has a fixed number of bits. Therefore, an i8 integer must accommodate both positive and negative integers within the same 8 bits. As a result, the largest magnitude it can hold is roughly half that of an unsigned integer of the same size.
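The compiler can only reject literals that are out of range. When a value overflows at runtime instead, a debug build panics, while a release build with default settings silently wraps around. A minimal sketch of the standard integer methods that make the intended behavior explicit; fit_into_u8 is a hypothetical helper wrapping u8::try_from.

```rust
// Needed on the 2018 edition; TryFrom is in the prelude from the 2021 edition
use std::convert::TryFrom;

// Narrow an i32 into a u8, or return None if the value does not fit
fn fit_into_u8(value: i32) -> Option<u8> {
    u8::try_from(value).ok()
}

fn main() {
    let number: u8 = 255;
    // checked_add returns None instead of overflowing
    println!("{:?}", number.checked_add(1));  // None
    // wrapping_add wraps around to the start of the range
    println!("{}", number.wrapping_add(1));   // 0
    // saturating_add clamps at the largest value
    println!("{}", number.saturating_add(1)); // 255
    println!("{:?}", fit_into_u8(255));       // Some(255)
    println!("{:?}", fit_into_u8(256));       // None
}
```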

When it comes to floats, our choices are more limited: Rust provides the f32 and f64 floating-point types (32-bit and 64-bit, respectively). Declaring these floating-point variables uses the same syntax as integers:

```
let float: f32 = 20.6;
```

It must be noted that we can also annotate numbers with suffixes, as shown in the following code:

```
let x = 1u8;
```

Here, x has a value of 1 with the type of u8. Now that we have covered floats and integers, we can use vectors and arrays to store them.
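When there is no type annotation or suffix, Rust falls back to i32 for integer literals and f64 for float literals. A minimal sketch that uses std::mem::size_of_val to reveal which type each literal ended up with; literal_sizes is a hypothetical name.

```rust
use std::mem::size_of_val;

// Size in bytes of several literals, showing which type Rust picked for each
fn literal_sizes() -> [usize; 5] {
    let default_int = 1;     // no annotation: defaults to i32 (4 bytes)
    let default_float = 2.5; // no annotation: defaults to f64 (8 bytes)
    let small = 1u8;         // suffix: u8 (1 byte)
    let single = 20.6f32;    // suffix: f32 (4 bytes)
    let big = 1u128;         // suffix: u128 (16 bytes)
    [
        size_of_val(&default_int),
        size_of_val(&default_float),
        size_of_val(&small),
        size_of_val(&single),
        size_of_val(&big),
    ]
}

fn main() {
    println!("{:?}", literal_sizes()); // [4, 8, 1, 4, 16]
}
```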

Managing data in Rust's vectors and arrays

With Python, we have lists. We can stuff anything we want into these lists with the append method, and these lists are, by default, mutable. Python tuples are technically not lists, but we can treat them as immutable arrays. With Rust, we have arrays and vectors. Arrays are the more basic of the two: their length is fixed and is part of their type. Defining and looping through an array is straightforward in Rust, as we can see in the following code:

```
let array: [i32; 3] = [1, 2, 3];
println!("array has {} elements", array.len());
for i in array.iter() {
    println!("{}", i);
}
```

If we try to append another integer onto our array, we will not be able to, even if the array is mutable: arrays have a fixed length, so they do not have a push method. If we add a fourth element to our array definition that is not an integer, the program will refuse to compile, as all of the elements in an array have to be of the same type. However, this is not entirely true.

Later in this chapter, we will cover structs. Structs are the closest Rust equivalent to Python objects, as they have their own attributes and functions. Structs can also implement traits, which we will also discuss later. In Python terms, the closest comparison to traits is mixins. Therefore, a range of different structs can be housed in the same array if they all implement a trait in common and are stored as trait objects (such as Box<dyn Trait>). When looping through the array, the compiler will only allow us to call functions from that trait, as this is all we can ensure will be consistent throughout the array.
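As a preview of structs and traits, here is a minimal sketch of an array holding two different structs that share a trait. The Speak trait and the Dog and Cat structs are hypothetical. Each element is stored as a Box<dyn Speak> trait object, which gives every element the same type, and only the trait's speak method can be called on them.

```rust
// A shared trait: every type stored in the array must implement it
trait Speak {
    fn speak(&self) -> String;
}

// Two unrelated structs (hypothetical, for illustration)
struct Dog;
struct Cat {
    name: String,
}

impl Speak for Dog {
    fn speak(&self) -> String {
        "woof".to_string()
    }
}

impl Speak for Cat {
    fn speak(&self) -> String {
        format!("{} says meow", self.name)
    }
}

// Only methods from the Speak trait can be called on the elements
fn all_speak(animals: &[Box<dyn Speak>]) -> Vec<String> {
    animals.iter().map(|animal| animal.speak()).collect()
}

fn main() {
    // Box<dyn Speak> gives every element the same type, whatever struct is inside
    let animals: [Box<dyn Speak>; 2] = [
        Box::new(Dog),
        Box::new(Cat { name: "Tom".to_string() }),
    ];
    for line in all_speak(&animals) {
        println!("{}", line);
    }
}
```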

The same rules in terms of type or trait consistency also apply to vectors. However, vectors store their elements on the heap and can grow. Like all variables in Rust, they are immutable by default. However, applying the mut keyword enables us to add to and manipulate the vector. In the following code, we define a vector, print its length, append another element to it, and then loop through the vector, printing all elements:

```
let mut str_vector: Vec<&str> = vec!["one", "two", "three"];
println!("{}", str_vector.len());
str_vector.push("four");
for i in str_vector.iter() {
    println!("{}", i);
}
```

This gives us the following output:

```
3
one
two
three
four
```

We can see that our push worked.
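Vectors can also be read by position, much like Python lists. In this sketch, build_vector is a hypothetical helper that rebuilds the vector from the example above. Indexing with [] would panic if the position were out of range, whereas the get method returns an Option (Some or None), which we meet again in the hashmaps section. The pop method removes and returns the last element, similar to Python's list.pop().

```rust
// Rebuild the vector from the example above
fn build_vector() -> Vec<&'static str> {
    let mut str_vector: Vec<&str> = vec!["one", "two", "three"];
    str_vector.push("four");
    str_vector
}

fn main() {
    let mut str_vector = build_vector();
    // Indexing works like a Python list, but panics if out of range
    println!("{}", str_vector[0]);        // one
    // get returns Some(&element) or None instead of panicking
    println!("{:?}", str_vector.get(1));  // Some("two")
    println!("{:?}", str_vector.get(10)); // None
    // pop removes and returns the last element, wrapped in an Option
    println!("{:?}", str_vector.pop());   // Some("four")
    println!("{}", str_vector.len());     // 3
}
```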

Considering the rules about consistency, vectors and arrays might seem a little restrictive to a Python developer. However, if they are, sit back and ask yourself why. Why would you want to put in a range of elements that do not have any consistency? Although Python allows you to do this, how could you loop through a list with inconsistent elements and confidently perform operations on them without crashing the program?

With this in mind, we are starting to see the benefits and safety behind this restrictive typing system. There are still ways to store elements of different types that are not structs bound by the same trait. Considering this, we will explore how we can store and access our varied data elements via hashmaps in Rust in the next section.

Replacing dictionaries with hashmaps

Hashmaps in Rust are essentially dictionaries in Python. However, unlike in our previous vectors and arrays, we want to house a range of different data types in a hashmap (although we could also do this with vectors and arrays). To achieve this, we can use Enums. Enums are, well, Enums, and Python has a similar concept. However, in Python, an Enum is not a built-in language construct; instead, we define a class that inherits from the Enum class, as seen in the following code:

```
from enum import Enum

class Animal(Enum):
    STRING = "string"
    INT = "int"
```

Here, we can use the Enum to save us from using raw strings in our Python code when picking a particular category. In an IDE, this is very useful, but it's understandable if a Python developer has never used Enums, as they are not enforced anywhere. Not using them makes the code more prone to mistakes and harder to maintain when categories change, but there is nothing in Python stopping the developer from just using a raw string to describe an option. In Rust, we are going to want our hashmap to accept strings and integers. To do this, we are going to carry out the following steps:

1. Create an Enum to handle multiple data types.
2. Create a new hashmap and insert values belonging to the Enum we created in step 1.
3. Test the data consistency by looping through the hashmap and matching all possible outcomes.
4. Build a function that processes data extracted from the hashmap.
5. Use the function to process outcomes from getting a value from the hashmap.

First, we create an Enum that can house either type. Unlike a Python Enum, each variant of a Rust Enum can hold data of its own type, as seen in the following code:

```
enum Value {
    Str(&'static str),
    Int(i32),
}
```

Here, we can see that we have introduced the 'static annotation. This denotes a lifetime, and states that the reference remains valid for the rest of the program's lifetime; string literals such as "1" have this lifetime. We will cover lifetimes in the Keeping track of scopes and lifetimes section.

Now that we have defined our Enum, we can build our own mutable hashmap and insert an integer and a string into it with the following code:

```
use std::collections::HashMap;
let mut map = HashMap::new();
map.insert("one", Value::Str("1"));
map.insert("two", Value::Int(2));
```

Our hashmap now houses a single type, Value, which can wrap either of the two types we defined. When we read from the hashmap, we must handle both of them.

Remember, Rust has strong typing. Unlike Python, Rust will not let us compile code that could leave a case unhandled (Rust can compile code in an unsafe context, but this is not the default behavior). We must handle every possible outcome; otherwise, the compiler will refuse to compile. We can do this with a match statement, as seen in the following code:

```
for (_key, value) in &map {
    match value {
        Value::Str(inside_value) => {
            println!("the following value is an str: {}",
                inside_value);
        }
        Value::Int(inside_value) => {
            println!("the following value is an int: {}",
                inside_value);
        }
    }
}
```

In this code sample, we loop through a borrowed reference to the hashmap using &. Again, we will cover borrowing later on in the Understanding variable ownership section. We prefix the key with a _ to tell the compiler that we are not going to use it. We don't have to do this, as the compiler will still compile the code; however, it will complain by issuing a warning. The value that we retrieve from the hashmap is our Value Enum. In this match statement, we can match each variant of our Enum, unwrap it, and access the inside value that we denote as inside_value, printing it to the console.

Running the code prints the following to the terminal (the order of the lines may differ between runs, as Rust's HashMap does not guarantee any iteration order):

```
the following value is an int: 2
the following value is an str: 1
```

Note that Rust is not going to let anything slip past the compiler. If we remove the match arm for the Int variant of our Enum, the compiler will throw the error seen here:

```
18 |           match value {
   |           ^^^^^ pattern '&Int(_)' not covered
   |
   = help: ensure that all possible cases are being handled, possibly by adding wildcards or more match arms
   = note: the matched value is of type '&Value'
```

This is because we have to handle every single possible outcome. Because we have been explicit that only values that can be housed in our Enum can be inserted into the hashmap, we know that there are only two possible types that can be extracted from our hashmap. We have nearly covered enough about hashmaps to use them effectively in Rust programs. One last concept that we must cover is the Enum called Option.

Considering that we have arrays and vectors, we will not mainly be using our hashmaps for looping through outcomes. Instead, we will be retrieving values from them when we need them. Like a Python dictionary, the hashmap has a get method. In Python, if the key that is being searched for is not in the dictionary, the get method returns None. It is then left to the developer to decide what to do with it. In Rust, however, the hashmap's get method returns an Option, which is either Some, wrapping a reference to the value, or None. To demonstrate this, let's try to get a value belonging to a key that we know is not there:

Start by running the following code:
```
let outcome: Option<&Value> = map.get("test");
println!("outcome passed");
let another_outcome: &Value = map.get("test").unwrap();
println!("another_outcome passed");
```
Here, we can see that the get method gives us a reference to the Value Enum wrapped in an Option. We then try to access the reference to the Value Enum directly using the unwrap method.
However, we know that the test key is not in the hashmap. Because of this, the unwrap call will cause the program to crash (panic), as seen in the following output from the previous code:
```
thread 'main' panicked at 'called 'Option::unwrap()' on a 'None' value', src/main.rs:32:51
```
We can see that the plain get call did not crash the program. However, we didn't manage to get the string "another_outcome passed" to print out to the console. We can handle this with a match statement. However, this is going to be a match statement within a match statement.

To reduce the complexity, we can write a Rust function that processes our Value Enum. This can be done with the following code:

```
fn process_enum(value: &Value) -> () {
    match value {
        Value::Str(inside_value) => {
            println!("the following value is an str: {}",
                inside_value);
        }
        Value::Int(inside_value) => {
            println!("the following value is an int: {}",
                inside_value);
        }
    }
}
```

The function does not really give us any new logic to explore. The -> () expression merely states that the function returns the unit type (), meaning it does not return anything meaningful.

If we were going to return a String, for instance, the expression would be -> String. We do not need the -> () expression, as a function without a declared return type returns () anyway; however, it can help developers quickly understand what's going on with the function. We can then use this function to process the outcome of our get call with the following code:
```
match map.get("test") {
    Some(inside_value) => {
        process_enum(inside_value);
    }
    None => {
        println!("there is no value");
    }
}
```
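Putting the pieces of this section together, a self-contained sketch might look like the following. It assumes the same Value Enum and map contents as above; describe is a hypothetical variant of process_enum that returns the message instead of printing it, and describe_key and build_map are likewise illustrative helpers.

```rust
use std::collections::HashMap;

// The Value Enum from the text: each variant wraps a different type
enum Value {
    Str(&'static str),
    Int(i32),
}

// Like process_enum, but returns the message so it can be checked
fn describe(value: &Value) -> String {
    match value {
        Value::Str(inside_value) => format!("the following value is an str: {}", inside_value),
        Value::Int(inside_value) => format!("the following value is an int: {}", inside_value),
    }
}

// Handle both outcomes of map.get: Some(&Value) or None
fn describe_key(map: &HashMap<&str, Value>, key: &str) -> String {
    match map.get(key) {
        Some(inside_value) => describe(inside_value),
        None => "there is no value".to_string(),
    }
}

// Build the same hashmap as in the text
fn build_map() -> HashMap<&'static str, Value> {
    let mut map = HashMap::new();
    map.insert("one", Value::Str("1"));
    map.insert("two", Value::Int(2));
    map
}

fn main() {
    let map = build_map();
    for key in ["one", "two", "test"] {
        println!("{}", describe_key(&map, key));
    }
}
```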

We now know enough to utilize hashmaps in our programs. However, note that we have not really handled errors; we have either printed out that nothing was found or let the unwrap method crash the program. Considering this, we will move on to the next section on handling errors in Rust.

Error handling in Rust

Handling errors in Python is straightforward: we have a try block followed by an except block. In Rust, we have the Result Enum, which works in a similar way to Option. However, instead of Some or None, we have Ok, which wraps a successful value, or Err, which wraps an error.

To demonstrate this, we can build on the hashmap that was defined in the previous section. Our function accepts the Option returned by the hashmap's get method and checks whether the integer retrieved from the hashmap is above a threshold. If it is, we return true. If not, we return false.

The problem is that there might not be a value in Option. We also know that the Value Enum might not be an integer. If any of this is the case, we should return an error. If not, we return a Boolean. This function can be seen here:

```
fn check_int_above_threshold(threshold: i32,
    get_result: Option<&Value>) -> Result<bool, &'static str> {
    match get_result {
        Some(inside_value) => {
            match inside_value {
                Value::Str(_) => return Err(
                    "str value was supplied as opposed to an int which is needed"),
                Value::Int(int_value) => {
                    if int_value > &threshold {
                        return Ok(true)
                    }
                    return Ok(false)
                }
            }
        }
        None => return Err("no value was supplied to be checked")
    }
}
```

Here, we can see that a None from the Option instantly returns an error with a helpful message explaining why. With a Some value, we use another match statement to return an error with a helpful message if the Value is a string, as a string cannot be checked against the threshold. Note that Value::Str(_) has a _ in it. This means that we do not care what the value is because we are not going to use it. In the final part, we check whether the integer is above the threshold, returning an Ok value that is either true or false. We use this function with the following code:

```
let result: Option<&Value> = map.get("two");
let above_threshold: bool = check_int_above_threshold(1,
    result).unwrap();
println!("it is {} that the threshold is breached",
    above_threshold);
```

This gives us the following output in the terminal:

```
it is true that the threshold is breached
```

If we increase the first parameter of our check_int_above_threshold function to 3, we get the following output:

```
it is false that the threshold is breached
```

If we change the key in map.get to three, we get the following terminal output:

```
thread 'main' panicked at 'called 'Result::unwrap()' on an 'Err' value: "no value was supplied to be checked"'
```

If we change the key in map.get to one, we get the following terminal output:

```
thread 'main' panicked at 'called 'Result::unwrap()' on an 'Err' value: "str value was supplied as opposed to an int which is needed"'
```
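The four cases above can be checked side by side. The sketch below is an alternative, more compact way of writing check_int_above_threshold: nested patterns such as Some(Value::Int(int_value)) replace the nested match, and each arm is an expression, so no return keywords are needed. It assumes the same Value Enum and map contents as in the text; printing with {:?} shows the Ok or Err value without panicking.

```rust
use std::collections::HashMap;

enum Value {
    Str(&'static str),
    Int(i32),
}

// Same behavior as the version in the text, with the nested match
// flattened into nested patterns and no explicit `return`s
fn check_int_above_threshold(threshold: i32, get_result: Option<&Value>) -> Result<bool, &'static str> {
    match get_result {
        None => Err("no value was supplied to be checked"),
        Some(Value::Str(_)) => Err("str value was supplied as opposed to an int which is needed"),
        // int_value is a &i32 here, so it is dereferenced before comparing
        Some(Value::Int(int_value)) => Ok(*int_value > threshold),
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("one", Value::Str("1"));
    map.insert("two", Value::Int(2));
    // The four cases from the text: (threshold, key)
    for (threshold, key) in [(1, "two"), (3, "two"), (1, "three"), (1, "one")] {
        println!("{} {}: {:?}", threshold, key, check_int_above_threshold(threshold, map.get(key)));
    }
}
```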

We can add extra signposting to the unwrap with the expect method. This method unwraps the result and adds our own message to the printout if there is an error. With the following implementation, the message "an error happened" will be added to the error message:

```
let second_result: Option<&Value> = map.get("one");
let second_threshold: bool = check_int_above_threshold(1,
    second_result).expect("an error happened");
```

We can also deliberately crash the program with our own error message by using the panic! macro:

```
panic!("throwing some error");
```

We can also check whether a Result is an error by using the is_err method, as seen here:

```
check_int_above_threshold(1, map.get("one")).is_err()
```

This returns a bool, enabling us to alter the direction of our program if we come across an error. As we can see, Rust gives us a range of ways in which we can throw and manage errors.
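Instead of crashing with unwrap, expect, or panic!, we can also handle both variants of the Result with a match, just as we did for Option. A minimal sketch, reusing a condensed version of check_int_above_threshold from the text; the report helper and its messages are hypothetical. The unwrap_or method, which returns a fallback value on Err, is another option when a sensible default exists.

```rust
enum Value {
    Str(&'static str),
    Int(i32),
}

// Condensed version of check_int_above_threshold from the text
fn check_int_above_threshold(threshold: i32, get_result: Option<&Value>) -> Result<bool, &'static str> {
    match get_result {
        None => Err("no value was supplied to be checked"),
        Some(Value::Str(_)) => Err("str value was supplied as opposed to an int which is needed"),
        Some(Value::Int(int_value)) => Ok(*int_value > threshold),
    }
}

// Hypothetical helper: turns the Result into a message instead of panicking
fn report(outcome: Result<bool, &'static str>) -> String {
    match outcome {
        Ok(breached) => format!("it is {} that the threshold is breached", breached),
        Err(message) => format!("could not check the threshold: {}", message),
    }
}

fn main() {
    let number = Value::Int(2);
    let text = Value::Str("1");
    println!("{}", report(check_int_above_threshold(1, Some(&number))));
    println!("{}", report(check_int_above_threshold(1, Some(&text))));
    println!("{}", report(check_int_above_threshold(1, None)));
    // unwrap_or returns the fallback value when the Result is an Err
    let breached = check_int_above_threshold(1, None).unwrap_or(false);
    println!("{}", breached);
}
```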

We can now handle enough of Rust's quirks to write basic scripts. However, if the program gets a little more complicated, we fall into other pitfalls such as variable ownership and lifetimes. In the next section, we cover the basics of variable ownership so we can continue to use our variables throughout a range of functions and structs.