Controlling variable ownership
As Rust does not have a garbage collector, it maintains memory safety through strict rules around variable ownership that are enforced at compile time. These rules can initially bite developers coming from dynamic languages and lead to frustration, which has given Rust an undeserved reputation for a steep learning curve. However, if these rules are understood early, the helpful compiler makes it straightforward to adhere to them. Rust's compile-time checking protects against the following memory errors:
- Use after free: This is where memory is accessed after it has been freed, which can cause crashes. It can also allow attackers to execute code via this memory address.
- Dangling pointers: This is where a reference points to a memory address that no longer houses the data that the pointer was referencing. Essentially, this pointer now points to null or random data.
- Double free: This is where allocated memory is freed, and then freed again. This can cause the program to crash and increases the risk of sensitive data being revealed. It can also enable an attacker to execute arbitrary code.
- Segmentation faults: This is where the program tries to access the memory it's not allowed to access.
- Buffer overrun: An example of this is reading off the end of an array. This can cause the program to crash.
Protection is achieved by Rust enforcing ownership rules. Code that breaks these rules could lead to the memory errors we just mentioned, so it is flagged as a compile-time error. The rules are as follows:
- Values are owned by the variables assigned to them.
- As soon as the variable goes out of scope, it is deallocated from the memory it is occupying.
- Values can be used by other variables, as long as we adhere to the following rules:
- Copy: This is where the value is copied. Once it has been copied, the new variable owns the value, and the existing variable also owns its own value.
- Move: This is where the value is moved from one variable to another. However, unlike a copy, the original variable no longer owns the value.
- Immutable borrow: This is where another variable can reference the value of another variable. If the variable that is borrowing the value falls out of scope, the value is not deallocated from memory as the variable borrowing the value does not have ownership.
- Mutable borrow: This is where another variable can reference and write the value of another variable. If the variable that is borrowing the value falls out of scope, the value is not deallocated from memory as the variable borrowing the value does not have ownership.
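The four rules above can be sketched in one short program. This is a minimal illustration rather than a definitive pattern, and the function names length_of, shout, consume, and double are made up for this example:

```rust
// Immutable borrow: reads the value without taking ownership.
fn length_of(text: &String) -> usize {
    text.len()
}

// Mutable borrow: may change the value, still without owning it.
fn shout(text: &mut String) {
    text.push('!');
}

// Move: takes ownership, so the caller can no longer use its variable.
fn consume(text: String) -> usize {
    text.len()
} // `text` is dropped here

// Copy: `i8` implements `Copy`, so the caller keeps its own value.
fn double(number: i8) -> i8 {
    number * 2
}

fn main() {
    let number: i8 = 4;
    let doubled = double(number);
    println!("{} {}", number, doubled); // `number` is still usable

    let mut text = String::from("one");
    let before = length_of(&text);
    shout(&mut text);
    println!("{} {}", before, text);

    let consumed = consume(text);
    println!("{}", consumed);
    // println!("{}", text); // would not compile: `text` was moved
}
```

Uncommenting the last line is a quick way to see the compiler report the move.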
Considering that scopes play a big role in the ownership rules, we'll explore them in more detail in the next section.
Scopes
The key rule to remember when it comes to ownership in Rust is that when let is used to create a variable, that variable is the only one that owns the resource. Therefore, if the resource is moved or reassigned, then the initial variable no longer owns the resource.
Once the scope in which the variable was created has ended, the variable and the resource are deleted. Scopes in Rust are defined by curly brackets. The classic way of demonstrating this is through the following example:
fn main() {
    let one: String = String::from("one");
    {
        println!("{}", one);
        let two: String = String::from("two");
    }
    println!("{}", one);
    println!("{}", two);
}
Commenting out the last print statement will enable the code to run. Keeping it will cause a compile-time error because two is created inside the inner scope and is deleted when that scope ends. We can also see that one is available in both the outer scope and the inner scope. However, it gets interesting when we pass the variable into another function:
fn print_number(number: String) {
    println!("{}", number);
}

fn main() {
    let one: String = String::from("one");
    print_number(one);
    println!("{}", one);
}
The error from the preceding code tells us a lot about what's going on:
6 |     let one: String = String::from("one");
  |         --- move occurs because `one` has type `std::string::String`, which does not implement the `Copy` trait
7 |     print_number(one);
  |                  --- value moved here
8 |     println!("{}", one);
  |                    ^^^ value borrowed here after move
The root of the error is that String does not implement the Copy trait. This is not surprising, as we know that String is a wrapper implemented as a vector of bytes. It holds a pointer to the str data in heap memory, the capacity of that heap allocation, and the length of the str, as denoted in the following diagram:
Having multiple owners of the same heap data would break our rules. Passing one to our print_number function moves it into the function's scope, and it is destroyed when that scope ends. If ownership could pass to a function while the original variable remained usable afterward, that variable would be pointing to freed memory, which is unsafe.
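To make the pointer, length, and capacity concrete, the following sketch reads all three from a String and checks that a move leaves the heap data where it is. The layout helper is a hypothetical function written for this illustration, and the exact capacity printed depends on the allocator, so no particular value is assumed:

```rust
// Returns the address of the heap buffer, the length, and the capacity.
fn layout(text: &String) -> (*const u8, usize, usize) {
    (text.as_ptr(), text.len(), text.capacity())
}

fn main() {
    let one = String::from("one");
    let (address, length, capacity) = layout(&one);
    println!("length: {}, capacity: {}", length, capacity);

    // Moving copies only the pointer, length, and capacity;
    // the bytes on the heap stay where they are.
    let moved = one;
    assert_eq!(layout(&moved).0, address);
    // `one` can no longer be used from this point on.
}
```

Because only one variable may own that heap buffer, the compiler invalidates one as soon as moved takes over.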
The compiler is very helpful in telling us that the variable has been moved, which is why we cannot print it. It also gives us another hint: the println! macro tries to borrow the String. When you borrow a value, you can access the data without owning it, and only for as long as you need it. Borrowing is done by using the & operator. Therefore, we can get around this issue with the following code:
fn alter_number(number: &mut String) {
    number.push("!".chars().next().unwrap());
}

fn print_number(number: &String) {
    println!("{}", number);
}

fn main() {
    let mut one: String = String::from("one");
    print_number(&one);
    alter_number(&mut one);
    println!("{}", one);
}
In the preceding code, print_number takes an immutable borrow of the string to print it. The second function, alter_number, takes a mutable borrow, meaning that it can alter the value. Inside it, we define a string literal, call chars to get an iterator over its characters, call next to take the first character, unwrap the resulting Option, and append the character to the string with push. (Passing the character literal '!' to push directly would have the same effect.) We can see by the final print statement that the one variable has been changed.
If we were to try and change the value in the print_number function, we would get an error because the parameter is not a mutable borrow, despite one being mutable. When it comes to immutable borrows, we can make as many as we like. For instance, if a function only needs to read a value, it can borrow the value rather than own it. A mutable borrow is exclusive: only one can exist at a time, and while it is in use, no immutable borrows can be made. This is to avoid data races.
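The borrowing rules can be seen side by side in the following sketch. The borrow_demo function is a hypothetical name used only for this illustration:

```rust
fn borrow_demo() -> String {
    let mut one = String::from("one");

    // Any number of immutable borrows may coexist.
    let first = &one;
    let second = &one;
    println!("{} {}", first, second);

    // `first` and `second` are not used again, so a mutable borrow is now allowed.
    let exclusive = &mut one;
    exclusive.push('!');
    // println!("{}", first); // would not compile: `one` is mutably borrowed

    one
}

fn main() {
    println!("{}", borrow_demo());
}
```

The compiler tracks where each borrow is last used, which is why the mutable borrow is accepted once the two immutable borrows are no longer needed.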
With integers, this is easier as they implement the Copy trait. This means that we don't have to borrow when passing an integer into a function. It's copied for us. The following code prints an integer and increases it by one:
fn alter_number(number: &mut i8) {
    *number += 1;
}

fn print_number(number: i8) {
    println!("{}", number);
}

fn main() {
    let mut one: i8 = 1;
    print_number(one);
    alter_number(&mut one);
    println!("{}", one);
}
Here, we can see that the integer isn't moved into print_number; it's copied. However, we still have to pass a mutable reference if we want to alter the number. We can also see that we've added a * operator to the number when altering it. This is a dereference. By performing this, we have access to the integer value that we're referencing. Remember that we can pass the integer directly into the print_number function because i8 implements Copy: its size is fixed and known at compile time, so copying it is cheap.
Running through lifetimes
Now that we have borrowing and referencing figured out, we can look into lifetimes. Remember that a borrow is not sole ownership. Because of this, there is a risk that we could reference a variable that's deleted. The following classic example of a lifetime shows this:
fn main() {
    let one;
    {
        let two: i8 = 2;
        one = &two;
    } // -----------------------> two lifetime stops here
    println!("r: {}", one);
}
This gives us the following error:
   |         one = &two;
   |               ^^^^ borrowed value does not live long enough
   |     }
   |     - `two` dropped here while still borrowed
   |
   |     println!("r: {}", one);
   |                       --- borrow later used here
Since two is defined in the inner scope, it's deleted at the end of the inner scope, meaning that the end of its lifetime is at the end of the inner scope. However, the one variable, which holds a reference to two, carries on to the end of the scope of the main function. Therefore, the reference would outlive the value it points to, and the compiler rejects the code.
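One way to satisfy the compiler, sketched here under the assumption that the value can simply live in the outer scope, is to declare two before the reference so that it outlives it. The read_through_reference function is a hypothetical wrapper added for this illustration:

```rust
fn read_through_reference() -> i8 {
    let two: i8 = 2; // declared in the outer scope, so it outlives the reference
    let one;
    {
        one = &two; // the borrow starts in the inner scope
    }
    *one // still valid: `two` is alive until the function returns
}

fn main() {
    println!("r: {}", read_through_reference());
}
```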
While it is great that this is flagged when compiling, Rust does not stop here. This concept also translates to functions. Let's say that we build a function that takes references to two integers, compares them, and returns the reference to the highest integer. The function is an isolated piece of code. In this function, we can denote the lifetimes of the two integers. This is done by using the ' prefix, which is a lifetime notation. The names of the notations can be anything you wish, but it's a general convention to use 'a, 'b, 'c, and so on. Let's look at an example:
fn get_highest<'a>(first_number: &'a i8, second_number: &'a i8) -> &'a i8 {
    if first_number > second_number {
        first_number
    } else {
        second_number
    }
}

fn main() {
    let one: i8 = 1;
    {
        let two: i8 = 2;
        let outcome: &i8 = get_highest(&one, &two);
        println!("{}", outcome);
    }
}
As we can see, the first and second parameters have the same lifetime notation of 'a. Both referenced values therefore have to remain valid for as long as the result is in use. We also have to note that the function returns a reference to an i8 integer with the lifetime of 'a. Therefore, the compiler knows that we cannot rely on the outcome outside the inner scope. However, we may want to use the two variable, which is defined in the inner scope, only as an input to the function and not as a candidate for the result.
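If only the integer itself is needed outside the inner scope, one option is to dereference the returned reference so that the i8 is copied out. This is a sketch rather than the only approach, and highest_value is a hypothetical function name:

```rust
fn get_highest<'a>(first_number: &'a i8, second_number: &'a i8) -> &'a i8 {
    if first_number > second_number {
        first_number
    } else {
        second_number
    }
}

fn highest_value() -> i8 {
    let one: i8 = 1;
    let highest: i8;
    {
        let two: i8 = 2;
        // Dereferencing copies the `i8` out, so no borrow of `two` escapes.
        highest = *get_highest(&one, &two);
    }
    highest
}

fn main() {
    println!("{}", highest_value());
}
```

This works because i8 implements Copy. A type without Copy would need a different approach, such as cloning.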
This might be a little convoluted, so to demonstrate this, let's develop a function that checks the one variable against the two variable. If one is lower than two, then we return zero; otherwise, we return the value of one:
fn filter<'a, 'b>(first_number: &'a i8, second_number: &'b i8) -> &'a i8 {
    if first_number < second_number {
        &0
    } else {
        first_number
    }
}

fn main() {
    let one: i8 = 1;
    let outcome: &i8;
    {
        let two: i8 = 2;
        outcome = filter(&one, &two);
    }
    println!("{}", outcome);
}
Here, we assigned the lifetime of 'a to first_number, and the lifetime of 'b to second_number. Using 'a and 'b, we are telling the compiler that the lifetimes can be different. We then tell the compiler in the return type of the function that the function returns a reference to an i8 integer with the lifetime of 'a. Therefore, we can rely on the result of the filter function, even after the lifetime of second_number finishes.
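The zero returned by filter is worth a closer look: a reference to a constant literal such as 0 is valid for the whole program, so it can stand in for any shorter lifetime, including 'a. The sketch below makes this explicit with a named static. ZERO is a name introduced for this illustration and is not part of the original function:

```rust
// A named static makes the `'static` lifetime of the zero explicit.
static ZERO: i8 = 0;

fn filter<'a, 'b>(first_number: &'a i8, second_number: &'b i8) -> &'a i8 {
    if first_number < second_number {
        &ZERO // `&'static i8` is accepted wherever `&'a i8` is expected
    } else {
        first_number
    }
}

fn main() {
    let one: i8 = 1;
    let outcome: &i8;
    {
        let two: i8 = 2;
        outcome = filter(&one, &two);
    }
    println!("{}", outcome);
}
```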
If we switch the lifetime of second_number to 'a, we get the following expected error:
   |         outcome = filter(&one, &two);
   |                                ^^^^ borrowed value does not live long enough
   |     }
   |     - `two` dropped here while still borrowed
   |     println!("{}", outcome);
   |                    ------- borrow later used here
Even though the function body never returns second_number, we're telling the compiler that we're returning a reference with the 'a lifetime, which is now assigned to both first_number and second_number. When both parameters share one lifetime in the function signature, the compiler sides with the shorter of the two to be safe.
Now that we understand the quirks behind data types, borrowing, and lifetimes, we're ready to build our own structs that have the functionality to create a hash map that accepts a range of data types.