This chapter will introduce decision trees, random forests, and gradient-boosted trees. The use of decision trees is popular in data science because they provide a visual representation of how the information in the training set can be represented as a hierarchy. Traversing the hierarchy based on an observation helps you to predict the probability of that event. We will explore how these algorithms can be used to predict when a user may click on online advertisement based on existing advertising click records. Additionally, we will show how to use AWS Elastic MapReduce (EMR) with Apache Spark and the SageMaker XGBoost service to engineer models in the context of big data.
In this chapter, we will cover the following topics:
- Understanding decision trees
- Understanding random forests algorithms
- Understanding gradient boosting algorithms...