Side effects and immutability
Side effects and immutability are two important concepts in programming that have a significant impact on the quality and maintainability of code.
A side effect is any change to the state of the program that a function or piece of code makes beyond returning a value. Side effects can be explicit, such as writing data to a file or updating a variable, or implicit, such as modifying the global state in a way that causes unexpected behavior in other parts of the code.
Immutability, on the other hand, refers to the property of a variable or data structure that cannot be modified after it has been created. In functional programming, immutability is achieved by making data structures and variables constant and avoiding side effects.
Avoiding side effects and using immutable variables matters because it makes code easier to understand, debug, and maintain. When code has few side effects, it is easier to reason about what it does and what it does not do. This makes it easier to find and fix bugs and to change the code without affecting other parts of the system.
In contrast, code with many side effects is harder to understand, as the state of the program can change in unexpected ways. This makes it more difficult to debug and maintain and can lead to bugs and unexpected behavior.
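As a minimal sketch of the difference (the names g_total, add_to_total, and sum are hypothetical and not taken from the text), compare a function that changes global state with one whose result depends only on its arguments:

```cpp
#include <cassert>
#include <vector>

// Hypothetical global state, used only to illustrate a side effect.
int g_total = 0;

// Has a side effect: every call changes state outside the function,
// so the outcome depends on what ran before.
void add_to_total(int x) {
    g_total += x;
}

// No side effects: the result depends only on the argument,
// and nothing outside the function is modified.
int sum(const std::vector<int>& values) {
    int total = 0;
    for (const int v : values) {
        total += v;
    }
    return total;
}
```

Calling sum with the same vector always yields the same result, for example assert(sum({1, 2, 3}) == 6), whereas the value of g_total after add_to_total depends on every earlier call.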
Functional programming languages have long emphasized the use of immutability and the avoidance of side effects, but it is now possible to write code with these properties using C++. The easiest way to achieve it is to follow the C++ Core Guidelines for Constants and Immutability.
Con.1 – by default, make objects immutable
You can declare a built-in data type or an instance of a user-defined data type as constant, resulting in the same effect. Attempting to modify it will result in a compiler error:
    struct Data {
        int val{42};
    };

    int main() {
        const Data data;
        data.val = 43;  // assignment of member 'Data::val' in read-only object

        const int val{42};
        val = 43;       // assignment of read-only variable 'val'
    }
The same applies to loops:
    for (const int i : array) {
        std::cout << i << std::endl;  // just reading: const
    }

    for (int i : array) {
        std::cout << i << std::endl;  // just reading: non-const
    }
This approach prevents hard-to-notice changes to a value.
Probably, the only exception is function parameters passed by value:
    void foo(const int value);
Such parameters are rarely passed as const and rarely mutated. To avoid confusion, it is recommended not to enforce this rule in such cases.
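One reason this exception is harmless is that a top-level const on a by-value parameter is not part of the function's type. It only affects the function body. A small sketch, using a hypothetical function named twice:

```cpp
#include <cassert>

// Declaration, as it might appear in a header: no top-level const.
int twice(int value);

// Definition of the same function: here const is an implementation
// detail that only prevents the body from modifying its local copy.
int twice(const int value) {
    // value = 0;  // would not compile: 'value' is read-only here
    return value * 2;
}
```

Both lines refer to one function, so callers see no difference, for example assert(twice(21) == 42).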
Con.2 – by default, make member functions const
A member function (method) shall be marked as const unless it changes the observable state of an object. This states the design intent more precisely and brings better readability and maintainability, more errors caught by the compiler, and, in theory, more optimization opportunities:
    class Book {
    public:
        std::string name() { return name_; }  // not marked const

    private:
        std::string name_;
    };

    void print(const Book& book) {
        // ERROR: 'this' argument to member function 'name' has type
        // 'const Book', but function is not marked const
        // clang(member_function_call_bad_cvr)
        std::cout << book.name() << std::endl;
    }
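For comparison, here is a sketch of a const-correct variant of the class. The constructor and the describe function are assumptions added only to make the example self-contained:

```cpp
#include <cassert>
#include <string>
#include <utility>

class Book {
public:
    // Hypothetical constructor, added so the example can create a Book.
    explicit Book(std::string name) : name_(std::move(name)) {}

    // Marked const: reading the name does not change the observable state.
    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Compiles: name() can be called through a reference to const.
std::string describe(const Book& book) {
    return "Book: " + book.name();
}
```

With name() marked const, any function that takes const Book& can read the name, and the compiler still rejects attempts to modify the object through that reference.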
There are two types of constness, physical and logical:
- Physical constness: An object is declared const and cannot be changed.
- Logical constness: An object is declared const, but parts of it can still be changed.
Logical constness can be achieved with the mutable keyword. In general, it is a rare use case. The only good examples I can think of are storing data in an internal cache and locking a mutex:
    class DataReader {
    public:
        Data read() const {
            auto lock = std::lock_guard<std::mutex>(mutex);
            // read data
            return Data{};
        }

    private:
        mutable std::mutex mutex;
    };
In this example, we need to change the mutex variable to lock it, but this does not affect the logical constness of the object.
Please be aware that some legacy code and libraries provide functions that take a T* parameter even though they make no changes to the T. This is a problem when you try to mark all logically constant methods as const. To enforce constness, you can do the following:
- Update the library/code to be const-correct, which is the preferred solution.
- Provide a wrapper function casting away the constness.
Example
    // Legacy code: read_data does not modify `*data`
    void read_data(int* data);

    void read_data(const int* data) {
        read_data(const_cast<int*>(data));
    }
Note that this solution is a patch that should be used only when the declaration of read_data cannot be modified.
Con.3 – by default, pass pointers and references to const
This one is easy; it is far easier to reason about programs when called functions do not modify state.
Let us look at the two following functions:
    void foo(char* p);
    void bar(const char* p);
Does the foo function modify the data that the p pointer points to? We cannot answer by looking at the declaration, so we assume it does by default. However, the bar function states explicitly that the content of p will not be changed.
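To make the contrast concrete, here is a sketch with two hypothetical functions, count_char and to_upper_ascii. Neither comes from the text. Each signature alone tells the caller whether the buffer can change:

```cpp
#include <cassert>
#include <cstddef>

// Pointer to const: the signature promises that the characters are only read.
std::size_t count_char(const char* p, char c) {
    std::size_t n = 0;
    for (; *p != '\0'; ++p) {
        if (*p == c) {
            ++n;
        }
    }
    return n;
}

// Pointer to non-const: callers must assume the buffer is modified,
// and in this case it is.
void to_upper_ascii(char* p) {
    for (; *p != '\0'; ++p) {
        if (*p >= 'a' && *p <= 'z') {
            *p = static_cast<char>(*p - 'a' + 'A');
        }
    }
}
```

A string literal can be passed to count_char directly, while to_upper_ascii requires a writable buffer such as a local char array.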
Con.4 – use const to define objects with values that do not change after construction
This rule is very similar to the first one, enforcing the constness of objects that are not expected to be changed in the future. It is often helpful with classes such as Config that are created at the beginning of the application and not changed during its lifetime:
    class Config {
    public:
        std::string hostname() const;
        uint16_t port() const;
    };

    int main(int argc, char* argv[]) {
        const Config config = parse_args(argc, argv);
        run(config);
    }
Con.5 – use constexpr for values that can be computed at compile time
Declaring variables as constexpr is preferred over const if the value is computed at compile time. It provides such benefits as better performance, better compile-time checking, guaranteed compile-time evaluation, and no possibility of race conditions.
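The following sketch contrasts the two keywords. The factorial helper and the names kFactorial10 and runtime_factorial are hypothetical and serve only as illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that can be used in constant expressions.
constexpr std::uint64_t factorial(std::uint64_t n) {
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// constexpr variable: the initializer must be a constant expression,
// so the compiler computes the value.
constexpr std::uint64_t kFactorial10 = factorial(10);
static_assert(kFactorial10 == 3628800, "verified at compile time");

// const variable: immutable after initialization, but the initializer
// may be evaluated at run time, as here where n is not a constant.
std::uint64_t runtime_factorial(std::uint64_t n) {
    const std::uint64_t result = factorial(n);
    return result;
}
```

The static_assert compiles only because kFactorial10 is a compile-time constant. The const variable inside runtime_factorial offers immutability but no such guarantee.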
Constness and data races
Data races occur when multiple threads access a shared variable simultaneously and at least one of them modifies it. Synchronization primitives such as mutexes, critical sections, spinlocks, and semaphores prevent data races. The problem with these primitives is that they either make expensive system calls or busy-wait on the CPU, which makes the code less efficient. However, if none of the threads modifies the variable, there is no room for data races. We learned that a constexpr variable is thread-safe (does not need synchronization) because it is defined at compile time. What about const? It can be thread-safe under the following conditions.
The first condition is that the variable has been const since its creation. If any thread has direct or indirect (via a pointer or reference) non-const access to the variable, all the readers need to use mutexes. The following code snippet illustrates constant and non-constant access from multiple threads:
    void a() {
        auto value = int{42};
        auto t = std::thread([&]() { std::cout << value; });
        t.join();
    }

    void b() {
        auto value = int{42};
        auto t = std::thread([&value = std::as_const(value)]() { std::cout << value; });
        t.join();
    }

    void c() {
        const auto value = int{42};
        auto t = std::thread([&]() {
            auto v = const_cast<int&>(value);
            std::cout << v;
        });
        t.join();
    }

    void d() {
        const auto value = int{42};
        auto t = std::thread([&]() { std::cout << value; });
        t.join();
    }
In the a() function, the value variable is accessible as non-constant from both the main thread and t, which makes the code potentially not thread-safe (if a developer later decides to change value in the main thread). In the b() function, the main thread has write access to value while t receives it via a const reference, so it is still not thread-safe. The c() function is an example of very bad code: value is created as a constant in the main thread and captured as a const reference, but then the constness is cast away, which makes this function not thread-safe. Only the d() function is thread-safe because neither the main thread nor t can modify the variable.
Another condition is that the data type and all sub-types of the variable are either physically constant or their logical constness implementation is thread-safe. For example, in the following snippet, the Point struct is physically constant because its x and y members are primitive integers, and both threads have only const access to it:
    struct Point {
        int x;
        int y;
    };

    void foo() {
        const auto point = Point{.x = 10, .y = 10};
        auto t = std::thread([&]() { std::cout << point.x; });
        t.join();
    }
The DataReader class that we saw earlier is logically constant because it has a mutable member, mutex, but this implementation is also thread-safe (due to the lock):
    class DataReader {
    public:
        Data read() const {
            auto lock = std::lock_guard<std::mutex>(mutex);
            // read data
            return Data{};
        }

    private:
        mutable std::mutex mutex;
    };
However, let us look at the following case. The RequestProcessor class processes some heavy requests and caches the results in an internal variable:
    class RequestProcessor {
    public:
        Result process(uint64_t request_id, Request request) const {
            if (auto it = cache_.find(request_id); it != cache_.cend()) {
                return it->second;
            }
            // process request
            // create result
            auto result = Result{};
            cache_[request_id] = result;  // unsynchronized write to shared state
            return result;
        }

    private:
        mutable std::unordered_map<uint64_t, Result> cache_;
    };

    void process_request() {
        auto requests = std::vector<std::tuple<uint64_t, Request>>{};
        const auto processor = RequestProcessor{};
        auto threads = std::vector<std::thread>{};
        for (const auto& request : requests) {
            threads.emplace_back([&processor, &request]() {
                processor.process(std::get<0>(request), std::get<1>(request));
            });
        }
        // Join instead of detaching, so that processor and requests
        // outlive the threads that reference them.
        for (auto& t : threads) {
            t.join();
        }
    }
This class is logically constant, but the cache_ variable is changed in a non-thread-safe way, which makes the class non-thread-safe even when an instance is declared const.
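One possible way to make the cache safe is to guard it with a mutable mutex, in the same way as DataReader does. The sketch below is an assumption-laden illustration, not the author's implementation: Request and Result are placeholder types, the doubling stands in for the heavy work, and cached() and process_concurrently() are hypothetical helpers added for demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Placeholder types standing in for the Request and Result of the text.
struct Request { int payload{0}; };
struct Result { int value{0}; };

class RequestProcessor {
public:
    Result process(std::uint64_t request_id, const Request& request) const {
        // One lock covers both the lookup and the insertion.
        const std::lock_guard<std::mutex> lock(mutex_);
        if (auto it = cache_.find(request_id); it != cache_.cend()) {
            return it->second;
        }
        auto result = Result{request.payload * 2};  // stand-in for heavy work
        cache_[request_id] = result;
        return result;
    }

    // Hypothetical helper: number of cached results.
    std::size_t cached() const {
        const std::lock_guard<std::mutex> lock(mutex_);
        return cache_.size();
    }

private:
    mutable std::mutex mutex_;
    mutable std::unordered_map<std::uint64_t, Result> cache_;
};

// Processes `count` distinct requests on separate threads and
// returns the number of cached results.
std::size_t process_concurrently(std::uint64_t count) {
    const RequestProcessor processor{};
    auto threads = std::vector<std::thread>{};
    for (std::uint64_t id = 0; id < count; ++id) {
        threads.emplace_back([&processor, id]() {
            processor.process(id, Request{static_cast<int>(id)});
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return processor.cached();
}
```

Holding the lock during the heavy work keeps the sketch simple but serializes all requests. A real implementation would probably release the lock while computing and take it again to store the result.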
Note that when working with STL containers, it is essential to remember that, despite current implementations tending to be thread-safe (physically and logically), the standard provides very specific thread-safety guarantees.
- All functions in a container can be called simultaneously by various threads on different containers. Broadly, functions from the C++ standard library don’t read objects accessible to other threads unless they are reachable through the function arguments, which includes the this pointer.
- All const member functions are thread-safe, meaning they can be invoked simultaneously by various threads on the same container. Furthermore, the begin(), end(), rbegin(), rend(), front(), back(), data(), find(), lower_bound(), upper_bound(), equal_range(), at(), and operator[] (except in associative containers) member functions also behave as const with regard to thread safety. In other words, they can also be invoked by various threads on the same container. Broadly, C++ standard library functions won’t modify objects unless those objects are reachable, directly or indirectly, via the function’s non-const arguments, which includes the this pointer.
- Different elements in the same container can be altered simultaneously by different threads, with the exception of std::vector<bool> elements. For example, a std::vector of std::future objects can receive values from multiple threads at once.
- Operations on iterators, such as incrementing an iterator, read the underlying container but don’t modify it. These operations can be performed concurrently with operations on other iterators of the same container, with the const member functions, or with reads from the elements. However, operations that invalidate any iterators modify the container and must not be performed concurrently with any operations on existing iterators, even those that are not invalidated.
- Elements of the same container can be altered concurrently with those member functions that don’t access these elements. Broadly, C++ standard library functions won’t read objects indirectly accessible through their arguments (including other elements of a container) except when required by its specification.
- Lastly, operations on containers (as well as algorithms or other C++ standard library functions) can be internally parallelized as long as the user-visible results remain unaffected. For example, std::transform can be parallelized, but std::for_each cannot, as it is specified to visit each element of a sequence in order.
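The guarantee about different elements can be sketched as follows. The function name fill_squares is hypothetical. Each thread writes to its own element of a std::vector<int> that is sized up front, so no operation changes the container itself while the threads run:

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread writes only to its own element. The vector is not resized
// while the threads run, so no iterators or references are invalidated.
std::vector<int> fill_squares(std::size_t n) {
    auto values = std::vector<int>(n, 0);  // sized before any thread starts
    auto threads = std::vector<std::thread>{};
    threads.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&values, i]() {
            values[i] = static_cast<int>(i * i);  // distinct element per thread
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return values;
}
```

No mutex is needed here because no two threads touch the same element. The same code with std::vector<bool>, or with push_back inside the threads, would not be covered by these guarantees.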
The idea of having a single mutable reference to an object became one of the pillars of the Rust programming language. This rule is in place to prevent data races, which occur when multiple threads access the same mutable data concurrently, resulting in unpredictable behavior and potential crashes. By allowing only one mutable reference to an object at a time, Rust ensures that concurrent access to the same data is properly synchronized and avoids data races.
In addition, this rule helps prevent mutable aliasing, which occurs when multiple mutable references to the same data exist simultaneously. Mutable aliasing can lead to subtle bugs and make code difficult to reason about, especially in large and complex code bases. By allowing only one mutable reference to an object, Rust avoids mutable aliasing and helps ensure that code is correct and easy to understand.
However, it’s worth noting that Rust also allows multiple immutable references to an object, which can be useful in scenarios where concurrent access is necessary but mutations are not. By allowing multiple immutable references, Rust can provide better performance and concurrency while still maintaining safety and correctness.