A binary tree is a hierarchical data structure whose behavior is similar to a tree, as it contains root and leaves (a node that has no child). The root of a binary tree is the topmost node. Each node can have at most two children, which are referred to as the left child and the right child. A node that has at least one child becomes a parent of its child. A node that has no child is a leaf. In this tutorial, you will be learning about the Binary tree data structures, its principles, and strategies in applying this data structures to various applications.
This C++ tutorial has been taken from C++ Data Structures and Algorithms. Read more here.
Take a look at the following binary tree:
From the preceding binary tree diagram, we can conclude the following:
This hierarchical data structure is usually used to store information that forms a hierarchy, such as a file system of a computer.
A binary search tree (BST) is a sorted binary tree, where we can easily search for any key using the binary search algorithm. To sort the BST, it has to have the following properties:
By having the preceding properties, we can easily search for a key value as well as find the maximum or minimum key value. Suppose we have the following BST:
As we can see in the preceding tree diagram, it has been sorted since all of the keys in the root's left subtree are smaller than the root's key, and all of the keys in the root's right subtree are greater than the root's key. The preceding BST is a balanced BST since it has a balanced left and right subtree. We also can define the preceding BST as a balanced BST since both the left and right subtrees have an equal height (we are going to discuss this further in the upcoming section).
However, since we have to put the greater new key in the right subtree and the smaller new key in the left subtree, we might find an unbalanced BST, called a skewed left or a skewed right BST. Please see the following diagram:
The preceding image is a sample of a skewed left BST, since there's no right subtree. Also, we can find a BST that has no left subtree, which is called a skewed right BST, as shown in the following diagram:
As we can see in the two skewed BST diagrams, the height of the BST becomes taller since the height equals to N - 1 (where N is the total keys in the BST), which is five. Comparing this with the balanced BST, the root's height is only three.
To create a BST in C++, we need to modify our TreeNode class in the preceding binary tree discussion, Building a binary tree ADT. We need to add the Parent properties so that we can track the parent of each node. It will make things easier for us when we traverse the tree. The class should be as follows:
class BSTNode { public: int Key; BSTNode * Left; BSTNode * Right; BSTNode * Parent; };
There are several basic operations which BST usually has, and they are as follows:
Now, let's discuss these BST operations further.
Inserting a key into the BST is actually adding a new node based on the behavior of the BST. Each time we want to insert a key, we have to compare it with the root node (if there's no root beforehand, the inserted key becomes a root) and check whether it's smaller or greater than the root's key. If the given key is greater than the currently selected node's key, then go to the right subtree. Otherwise, go to the left subtree if the given key is smaller than the currently selected node's key. Keep checking this until there's a node with no child so that we can add a new node there. The following is the implementation of the Insert() operation in C++:
BSTNode * BST::Insert(BSTNode * node, int key) { // If BST doesn't exist // create a new node as root // or it's reached when // there's no any child node // so we can insert a new node here if(node == NULL) { node = new BSTNode; node->Key = key; node->Left = NULL; node->Right = NULL; node->Parent = NULL; } // If the given key is greater than // node's key then go to right subtree else if(node->Key < key) { node->Right = Insert(node->Right, key); node->Right->Parent = node; } // If the given key is smaller than // node's key then go to left subtree else { node->Left = Insert(node->Left, key); node->Left->Parent = node; }
return node;
}
As we can see in the preceding code, we need to pass the selected node and a new key to the function. However, we will always pass the root node as the selected node when performing the Insert() operation, so we can invoke the preceding code with the following Insert() function:
void BST::Insert(int key) { // Invoking Insert() function // and passing root node and given key root = Insert(root, key); }
Based on the implementation of the Insert() operation, we can see that the time complexity to insert a new key into the BST is O(h), where h is the height of the BST. However, if we insert a new key into a non-existing BST, the time complexity will be O(1), which is the best case scenario. And, if we insert a new key into a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst case scenario.
We have successfully created a new BST and can insert a new key into it. Now, we need to implement the PrintTreeInOrder() operation, which will traverse the BST in order from the smallest key to the greatest key. To achieve this, we will go to the leftmost node and then to the rightmost node. The code should be as follows:
void BST::PrintTreeInOrder(BSTNode * node)
{
// Stop printing if no node found
if(node == NULL)
return;
// Get the smallest key first
// which is in the left subtree
PrintTreeInOrder(node->Left);
// Print the key
std::cout << node->Key << " ";
// Continue to the greatest key
// which is in the right subtree
PrintTreeInOrder(node->Right);
}
Since we will always traverse from the root node, we can invoke the preceding code as follows:
void BST::PrintTreeInOrder()
{
// Traverse the BST
// from root node
// then print all keys
PrintTreeInOrder(root);
std::cout << std::endl;
}
The time complexity of the PrintTreeInOrder() function will be O(N), where N is the total number of keys for both the best and the worst cases since it will always traverse to all keys.
Suppose we have a BST and need to find out if a key exists in the BST. It's quite easy to check whether a given key exists in a BST, since we just need to compare the given key with the current node. If the key is smaller than the current node's key, we go to the left subtree, otherwise we go to the right subtree. We will do this until we find the key or when there are no more nodes to find. The implementation of the Search() operation should be as follows:
BSTNode * BST::Search(BSTNode * node, int key) { // The given key is // not found in BST if (node == NULL) return NULL; // The given key is found else if(node->Key == key) return node; // The given is greater than // current node's key else if(node->Key < key) return Search(node->Right, key); // The given is smaller than // current node's key else return Search(node->Left, key); }
Since we will always search for a key from the root node, we can create another Search() function as follows:
bool BST::Search(int key)
{
// Invoking Search() operation
// and passing root node
BSTNode * result = Search(root, key);
// If key is found, returns TRUE
// otherwise returns FALSE
return result == NULL ?
false :
true;
}
The time complexity to find out a key in the BST is O(h), where h is the height of the BST. If we find a key which lies in the root node, the time complexity will be O(1), which is the best case. If we search for a key in a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst case.
Finding out the minimum and maximum key values in a BST is also quite simple. To get a minimum key value, we just need to go to the leftmost node and get the key value. On the contrary, we just need to go to the rightmost node and we will find the maximum key value. The following is the implementation of the FindMin() operation to retrieve the minimum key value, and the FindMax() operation to retrieve the maximum key value:
int BST::FindMin(BSTNode * node)
{
if(node == NULL)
return -1;
else if(node->Left == NULL)
return node->Key;
else
return FindMin(node->Left);
}
int BST::FindMax(BSTNode * node)
{
if(node == NULL)
return -1;
else if(node->Right == NULL)
return node->Key;
else
return FindMax(node->Right);
}
As usual, we will always find the minimum and maximum key values from the root node, so we can invoke the preceding operations as follows:
int BST::FindMin()
{
return FindMin(root);
}
int BST::FindMax()
{
return FindMax(root);
}
Similar to the Search() operation, the time complexity of the FindMin() and FindMax() operations is O(h), where h is the height of the BST. However, if we find the maximum key value in a skewed left BST, the time complexity will be O(1), which is the best case, since it doesn't have any right subtree. This also happens if we find the minimum key value in a skewed right BST. The worst case will appear if we try to find the minimum key value in a skewed left BST or try to find the maximum key value in a skewed right BST, since the time complexity will be O(N).
Other properties that we can find from a BST are the successor and the predecessor. We are going to create two functions named Successor() and Predecessor() in C++. But before we create the code, let's discuss how to find out the successor and the predecessor of a key of a BST. In this section, we are going to learn about the successor first, and then we will discuss the predecessor in the upcoming section.
There are three rules to find out the successor of a key of a BST. Suppose we have a key, k, that we have searched for using the previous Search() function. We will also use our preceding BST to find out the successor of a specific key. The successor of k can be found as follows:
Based on our preceding discussion about how to find out the successor of a given key in a BST, we can create a Successor() function in C++ with the following implementation:
int BST::Successor(BSTNode * node) { // The successor is the minimum key value // of right subtree if (node->Right != NULL) { return FindMin(node->Right); } // If no any right subtree else { BSTNode * parentNode = node->Parent; BSTNode * currentNode = node;
// If currentNode is not root and
// currentNode is its right children
// continue moving up
while ((parentNode != NULL) &&
(currentNode == parentNode->Right))
{
currentNode = parentNode;
parentNode = currentNode->Parent;
}
// If parentNode is not NULL
// then the key of parentNode is
// the successor of node
return parentNode == NULL ?
-1 :
parentNode->Key;
}
}
However, since we have to find a given key's node first, we have to run Search() prior to invoking the preceding Successor() function. The complete code for searching for the successor of a given key in a BST is as follows:
int BST::Successor(int key)
{
// Search the key's node first
BSTNode * keyNode = Search(root, key);
// Return the key.
// If the key is not found or
// successor is not found,
// return -1
return keyNode == NULL ?
-1 :
Successor(keyNode);
}
From our preceding Successor() operation, we can say that the average time complexity of running the operation is O(h), where h is the height of the BST. However, if we try to find out the successor of a maximum key in a skewed right BST, the time complexity of the operation is O(N), which is the worst case scenario.
Now, we can implement the Predecessor() operation in C++ as follows:
int BST::Predecessor(BSTNode * node) { // The predecessor is the maximum key value // of left subtree if (node->Left != NULL) { return FindMax(node->Left); } // If no any left subtree else { BSTNode * parentNode = node->Parent; BSTNode * currentNode = node;
// If currentNode is not root and
// currentNode is its left children
// continue moving up
while ((parentNode != NULL) &&
(currentNode == parentNode->Left))
{
currentNode = parentNode;
parentNode = currentNode->Parent;
}
// If parentNode is not NULL
// then the key of parentNode is
// the predecessor of node
return parentNode == NULL ?
-1 :
parentNode->Key;
}
}
And, similar to the Successor() operation, we have to search for the node of a given key prior to invoking the preceding Predecessor() function. The complete code for searching for the predecessor of a given key in a BST is as follows:
int BST::Predecessor(int key)
{
// Search the key's node first
BSTNode * keyNode = Search(root, key);
// Return the key.
// If the key is not found or
// predecessor is not found,
// return -1
return keyNode == NULL ?
-1 :
Predecessor(keyNode);
}
Similar to our preceding Successor() operation, the time complexity of running the Predecessor() operation is O(h), where h is the height of the BST. However, if we try to find out the predecessor of a minimum key in a skewed left BST, the time complexity of the operation is O(N), which is the worst case scenario.
The last operation in the BST that we are going to discuss is removing a node based on a given key. We will create a Remove() operation in C++. There are three possible cases for removing a node from a BST, and they are as follows:
Also, similar to the Search() operation, if the target node doesn't exist, we just need to return NULL. The implementation of the Remove() operation in C++ is as follows:
BSTNode * BST::Remove( BSTNode * node, int key) { // The given node is // not found in BST if (node == NULL) return NULL;
// Target node is found
if (node->Key == key)
{
// If the node is a leaf node
// The node can be safely removed
if (node->Left == NULL && node->Right == NULL)
node = NULL;
// The node have only one child at right
else if (node->Left == NULL && node->Right != NULL)
{
// The only child will be connected to
// the parent's of node directly
node->Right->Parent = node->Parent;
// Bypass node
node = node->Right;
}
// The node have only one child at left
else if (node->Left != NULL && node->Right == NULL)
{
// The only child will be connected to
// the parent's of node directly
node->Left->Parent = node->Parent;
// Bypass node
node = node->Left;
}
// The node have two children (left and right)
else
{
// Find successor or predecessor to avoid quarrel
int successorKey = Successor(key);
// Replace node's key with successor's key
node->Key = successorKey;
// Delete the old successor's key
node->Right = Remove(node->Right, successorKey);
}
}
// Target node's key is smaller than
// the given key then search to right
else if (node->Key < key)
node->Right = Remove(node->Right, key);
// Target node's key is greater than
// the given key then search to left
else
node->Left = Remove(node->Left, key);
// Return the updated BST
return node;
}
Since we will always remove a node starting from the root node, we can simplify the preceding Remove() operation by creating the following one:
void BST::Remove(int key) { root = Remove(root, key); }
As shown in the preceding Remove() code, the time complexity of the operation is O(1) for both case 1 (the node that has no child) and case 2 (the node that has only one child). For case 3 (the node that has two children), the time complexity will be O(h), where h is the height of the BST, since we have to find the successor or predecessor of the node's key.
If you found this tutorial useful, do check out the book C++ Data Structures and Algorithms for more useful material on data structure and algorithms with real-world implementation in C++.