Unordered Associative Container
Overview
With the new C++11 standard, C++ has four unordered associative containers: std::unordered_map
, std::unordered_multimap
, std::unordered_set
and std::unordered_multiset
. They have a lot in common with their namesakes, the ordered associative containers. The difference is that the unordered ones have a richer interface and their keys are not sorted.
This shows the declaration of a std::unordered_map
.
Like std::map
, std::unordered_map
has an allocator, but std::unordered_map
needs no comparison function. Instead std::unordered_map
needs two additional functions: One, to determine the hash value of its key: std::has<key>
and second, to compare the keys for equality: std::equal_to<key>
. Because of the three default template parameters , you have only to provide the type of the key and the value of the std::unordered_map
: std::unordered_map<char,int> unordMap
.
Keys and Values
There are special rules for the key and the value of an unordered associative container.
The key has to be
- equal comparable,
- available as hash value,
- copyable or moveable.
The value has to be
- default constructible,
- copyable or moveable.
Performance
Performance - that’s the simple reason - why the unordered associative containers were so long missed in C++. In the example below, one million randomly created values are read from a 10 million big std::map
and std::unordered_map
. The impressive result is that the linear access time of an unordered associative container is 20 times faster than the access time of an ordered associative container. That is just the difference between constant and logarithmic complexity O(log n) of these operations.
The Hash Function
The reason for the constant access time of the unordered associative container is the hash function, which is schematically shown here. The hash function maps the key to its value the so-called has value. A hash function is good if it produces as few collisions as possible and equally distributes the keys onto the buckets. Because the execution of the hash function takes a constant amount of time, the access of the elements is in the base case also constant.
The hash function
- is already defined for the built-in types like boolean, natural numbers and floating point numbers,
- is available for
std::string
andstd::wstring
, - generates for a C string
const char
a hash value of the pointer address, - can be defined for user-defined data types.
For user-defined types which are used as a key for an unordered associative container, you have to keep two requirements to keep in mind. They need a hash function and have to be comparable to equal.
The Details
The unordered associative containers store their indices in buckets. In which bucket the index goes is due to the hash function, which maps the key to the index. If different keys are mapped to the same index, it’s called a collision. The hash function tries to avoid this.
Indices are typically be stored in the bucket as a linked list. Since the access to the bucket is constant, the access in the bucket is linear. The number of buckets is called capacity, the average number of elements for each bucket is called the load factor. In general, the C++ runtime generates new buckets if the load factor is greater than 1. This process is called rehashing and can also explicitly be triggered:
With the method max_load_factor
, you can read and set the load factor. So you can influence the probability of collisions and rehashing. I want to emphasise one point in the short example above. The key 500 is at first in the 5th bucket, but after rehashing is in the 500th bucket.