Asking the Rust compiler about performance

Rust sometimes has interesting and lesser-known features that really make a difference when talking about performance enhancements. When it comes to big improvements with small changes, the first thing that you should understand is the release mode. Rust by default compiles your software in development mode, which is pretty good to check for compiling errors quickly, but if you want to run it, it will run slowly. This is due to the development mode not performing any optimizations. It will create object (machine) code directly related to the Rust code without optimizing it.

Rust makes use of the LLVM backend, which makes it easy to take advantage of its performance optimizations without having to develop all of these by themselves. They will only need to use LLVM intermediate representation. An intermediate language between Rust and assembly code that the LLVM compiler understands. While in development mode, no optimizations get performed by Rust or LLVM; enabling them is as easy as adding the --release flag to the cargo compilation. So, for example, if you were running your software by typing cargo run in the console, just by using cargo run --release it will compile with optimizations and run much, much faster. Usually, the gain is of more than one order of magnitude.

Optimizations

By default, Rust will perform level 3 optimizations in the code. Optimizations get divided into levels depending on how complex they are. Higher-level optimizations, in theory, improve the performance of the code greatly, but they might have bugs that could change the behavior of the program. Usually, level 1 optimizations are totally safe, and level 2 optimizations are the most-used ones in the C/C++ ecosystem. Level 3 optimizations have not been known to cause any issues, but in some critical situations, it might be better to avoid them. This can be configured, but we should first understand how the Rust compiler compiles the code to machine instructions so that we know what different options accomplish.

Rust first starts with parsing your code files. It will get the keywords and the different symbols to create a representation of the code in memory. This parsing will find common errors such as a missing semicolon or an invalid keyword. This memory representation of the code is called High Intermediate Representation (HIR). This representation of the code will be greatly simplified, removing complex flow structures and converting it into Middle Intermediate Representation (MIR).

The MIR representation is then used to check more complex flows of the software, and enables complex variable lifetime checks, along with some other improvements. This is then converted to the LLVM Intermediate Representation and gets passed to the LLVM compiler. When passing this code to LLVM, Rust adds some flags that will modify the way that LLVM optimizes the code. We have already seen that by default one of the flags it passes is the -O0 flag, or do not optimize flag, so it simply translates to machine code. When compiling in release mode, though, a -O3 gets passed so that level 3 optimizations get performed.

This behavior can be configured in the Cargo.toml file of the project and it can be configured for each profile. You can configure how to compile for tests, development, documentation, benchmarks, and release. You will probably want to keep development and documentation optimizations to a minimum, as in those profiles the main idea is to compile quickly. In the case of the development profile, you will want to check if everything compiles properly, and even test the behavior of the program a little bit, but you probably won't be concerned about the performance. When generating the documentation, the performance of the application doesn't matter at all, so the best idea is to just not optimized.

When testing, the optimization level you need will depend on how many tests you want to run and how computationally expensive they are. If it takes a really long time to run the tests, it may make sense to compile them optimized. Also, in some critical situations in which you might not be 100% sure that optimizations get performed in a completely secure way, you might want to optimize the tests the same way you optimize the release, and that way you can check if all unit and integration tests pass properly even after optimizations. If they don't, this is a compiler malfunction, and you should report it to the Rust compiler team. They will be glad to help.

Of course, benchmarks and release profiles should be the most optimized ones. In benchmarks, you will want to know the real optimized performance of the code, while in the release, you will want your users to get the best out of their hardware and your software to make things run as efficiently as possible. In these cases, you will want to optimize up to level 2 at least, and if you are not sending satellites to space or programming a pacemaker, you will probably want to optimize all the way up to level 3.

Build configuration

There is one section in the Cargo.toml file that enables these configurations: the profile section. In this section, you will find one subsection for each of the profiles. Each of them gets declared with the [profile.{profile}] format. So, for example, for the development profile, it would be [profile.dev]. The different profile configuration keywords are the following:

dev for the development profile, used in cargo build or cargo run
release for the release profile, used in cargo build --release or cargo run --release
test for the testing profile, used in cargo test
bench for the benchmarking profile, used in cargo bench
doc for the documentation profile, used in cargo doc

When configuring each profile, you will have many options, and we will check all of them out here.

Optimization level

The first option is the one mentioned before, the optimization level. This configuration option can be set by using the opt-level key in the relevant profile section. By default, optimizations will be level 3 for benchmarking and release, and zero for the rest. For example, to only perform level 2 optimizations in the release profile, you can add this code to your Cargo.toml file:

    [profile.release]
    opt-level = 2

Debug information

The next option is the debug information. This does not directly affect performance, but it's an interesting configuration item. In this case, you can decide if the debug symbol information gets added to the final executable. This is really useful if you are developing, and especially if you are using a debugger such as GDB. Adding debug information to the executable will enable you to get the function names and even the line numbers of each instruction being executed in the processor. This will give you great insight about what is happening in the code.

In any case, debug information is not so useful in final release binaries, as final release binaries are not meant to be used in debugging. And the debug information usually adds a lot of size to the final binary. This has many times been a concern among developers, as the size of Rust binaries is usually much bigger than the ones written in C/C++. This is in part due to this configuration, and in most cases due to the panic behavior, which we will check later. Debug symbols will also show information about the original code, so it might make sense to hide it in closed-source projects.

To avoid extra debug symbols in the final binary, the debug option must be set to false. This can be done for each profile, and by default, it's true only for the development profile. If you'd like to enable it for testing also, for example, you can add this in the Cargo.toml file:

    [profile.test]
    debug = true

You can, of course, combine this with any other profile option:

    [profile.test]
    debug = true
    opt-level = 1

Link-time optimizations

The next configuration option, useful for improving the performance of the application, is link-time optimizations. Usually, when a program gets built, once all the code has been optimized, it gets linked to other libraries and functions that provide the required functionality. This, however, does not always happen in the most efficient way. Sometimes, a function gets linked twice, or a piece of code gets used in many places, and in that case, the compiler might decide to duplicate some code.

The program will work perfectly, but this has two main disadvantages—first of all, duplicating code and links will make the binary bigger, which is probably something you don't want, and secondly, it will reduce the performance. You might ask why. Well, since it's the same code being accessed from different places in the program, it might make sense that if it gets executed once, it gets added to the L1/L2/L3 caches in the processor. This will enable the future reuse of these instructions without requiring the processor to get them from the RAM memory (much slower) or even the disk/SSD (extremely slow) if the memory has been swapped.

The main advantage when performing Link-Time Optimizations, or LTOs in short, is that while Rust compiles the code file by file, LTOs fit the whole, almost final, representation into a big compilation unit that can be optimized in its entirety, enabling better execution paths.

This can be performed, of course, but at a really high compilation time cost. These optimizations are very costly because they sometimes require changing the final representation, the one ready to be written to the binary. Not only that, but this requires checking lots of execution paths and code samples to find similar blocks. And remember, this is done with the object code, not the Rust code, so the compiler doesn't know about libraries or modules; it only sees instructions.

This costly optimization will improve the performance of your software and the size of your binaries, but being so costly (usually taking as much time as the rest of the compilation, or even more), it's not enabled by default in any of the profiles. And you should not enable it on any but release and maybe benchmarking profiles (you don't want to wait for an LTO every time you make a small change in a function and want to test it). The configuration item to change is the lto configuration item:

    [profile.release]
    lto = true

There is a related configuration item, which will be ignored if LTO is turned on for the given profile. I'm talking about the codegen units. This divides the code into multiple smaller code units and compiles each of them separately, enabling parallel compiling, which improves how fast the program compiles. This is one in the case of LTO, but can be modified for the rest. Of course, using separate compilation units avoids some optimizations that can improve the performance of the code, so it might make sense to enable faster compilation in development mode. It will be 16 by default in development mode.

This is as simple as changing the codegen-units configuration option in the development profile:

    [profile.dev]
    codegen-units = 32

An example value could be the number of processors/threads in your computer. But remember that this will make the compiled software slower, so do not use it in the release profile. It will, in any case, be ignored if you activate the link-time optimizations.

Debug assertions

The next interesting configuration item is the one that allows debug assertions to be removed. Debug assertions are like normal assertions, but by default they are only executed in the development profile. They are written in the code by prefixing the assert! macros with debug_, using for example debug_assert! or debug_assert_eq!. This enables you to fill the whole code with assertions that must be true and that take processing cycles to test, while not reducing the performance of a release application. Of course, this means that those assertions won't run in release mode. This is useful for testing internal methods, but is probably not the best for APIs, and certainly not a good idea in unsafe code wrappers.

For example, the indexing function in the standard library Vec object has an assertion that will check each time you get an element of the vector by index if the index is out of bounds. This is great to avoid buffer overflows, but makes the operation of getting an element of the vector slower, and if the index is out of bounds, the program will panic. We will talk about this particular example later, but in general, it shows how useful these assertions are—in this case for release mode also.

On the other hand, if you plan to create a small internal API that will input numbers between 0 and 100 and do some calculations with them, but is not exposed to the public, you could simply add a debug_assert!(num <= 100 && num >= 0) and, in tests and debug mode, it will panic the program if a number outside that range is received by the function, but it will not run the assertion in release mode. This can be a potential error vector, but with thorough unit testing, the odds of not getting the error in testing/development mode and an incorrect number being received in release mode are much, much lower. Of course, once again, this shouldn't be used for security-focused areas or input that would cause unsafe or undefined behavior.

By default, as explained, these assertions run in development, testing, and documentation modes. This last one is useful if you have documentation tests with debug assertions. This can be configured, in any case, easily, by changing the debug-assertions configuration option. For example:

    [profile.doc]
    debug-assertions = false

Panic behavior

The next configuration variable to check is the panic behavior. By default, Rust will unwind in a panic. This means that it will call each destructor of each variable in the stack if something goes terribly wrong and the application panics. There is another option: not calling anything and simply aborting the program (the standard C/C++ behavior).

The main advantage of the unwind is that you will be able to call the destructors, so any cleanup that should be done for your variables in the stack of the program will be done properly. The main advantage of the abort behavior is that it will require much less code to be compiled, since for each potential panic location a new branch gets added to the code where all the destructors are run. It also gives the code much fewer branches, which makes it easier to optimize, but the main advantage is smaller binaries. Of course, you lose the ability to run the destructors, so some complex behavior might not be properly cleaned, for example, if you need to write something to a log upon shutdown.

If you still think that in your use case, using the abort behavior is a good idea, you can enable it by using the panic keyword:

    [profile.doc]
    panic = 'abort'

Runtime library paths

The last configuration option is rpath. This configuration item accepts a Boolean and allows you to ask the Rust compiler to set loader paths when the executable looks OK for libraries at runtime. Even though, most of the time, Rust will link crates and libraries statically, you can ask a specific library to be linked dynamically. In this case, that library will be searched at runtime, not at compile time, and it will therefore use system libraries installed where the program is running.

This configuration option asks cargo to add -C rpath to the rustc compiler invocation. This will add paths to the dynamic library search paths. Nevertheless, this should not be required in most cases, and you should avoid it if it's not necessary by using false as the option value. If you are having issues making your application run in multiple operating systems, you might try it, since it might make the executable look for dynamic libraries in new places.

You're reading from Rust High Performance Learn to skyrocket the performance of your Rust applications

Table of Contents (13) Chapters