In this chapter, we will look at predicting the winner of sports matches using a type of classification algorithm different from those we have seen so far: decision trees. Decision trees have a number of advantages over other algorithms. One of the main advantages is that they are readable by humans, which allows their use in human-driven decision making: a decision tree can learn a procedure that can then be handed to a person to carry out if needed. Another advantage is that they work with a variety of feature types, including categorical features, which we will see in this chapter.
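To make the readability claim concrete, here is a minimal sketch of a decision tree written out by hand rather than learned from data. The feature names (`home_team_ranks_higher`, `home_team_won_last`), the tree itself, and the helper functions `predict` and `describe` are all hypothetical and chosen only for illustration; they are not the model or the code we build later in the chapter.

```python
# A hand-written (NOT learned) decision tree, for illustration only.
# An internal node is a tuple: (feature_name, subtree_if_true, subtree_if_false).
# A leaf is a plain string holding the predicted class.
# The feature names below are hypothetical.
TREE = (
    "home_team_ranks_higher",
    "home win",
    ("home_team_won_last", "home win", "visitor win"),
)


def predict(tree, sample):
    """Follow the tree from the root to a leaf for one sample (a dict)."""
    while not isinstance(tree, str):
        feature, if_true, if_false = tree
        tree = if_true if sample[feature] else if_false
    return tree


def describe(tree, indent=0):
    """Return the tree as a list of indented, human-readable rule lines."""
    pad = "    " * indent
    if isinstance(tree, str):
        return [f"{pad}predict: {tree}"]
    feature, if_true, if_false = tree
    return (
        [f"{pad}if {feature}:"]
        + describe(if_true, indent + 1)
        + [f"{pad}else:"]
        + describe(if_false, indent + 1)
    )


# Print the procedure a person could follow by hand.
print("\n".join(describe(TREE)))

# Classify one hypothetical match.
match = {"home_team_ranks_higher": False, "home_team_won_last": True}
print(predict(TREE, match))
```

The printed rules read as a nested series of if/else checks, which is the kind of procedure a person could carry out without a computer. A learned tree has the same shape; the difference is that the algorithm chooses the features and the order of the checks from the data.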
We will cover the following topics in this chapter:
- Using the pandas library for loading and manipulating data
- Decision trees for classification
- Random forests to improve upon decision trees
- Using real-world datasets in data mining
- Creating new features and testing them...