Chapter 7. Machine Learning with Spark and Hadoop
We discussed the typical life cycle of a data science project in Chapter 1, Big Data Analytics at a 10,000-Foot View. This chapter focuses on the machine learning techniques used in data science with Spark and Hadoop.
Data science is all about extracting deep meaning from data and creating data products. This requires both methods, such as statistics and machine learning algorithms, and tools for data collection and data cleansing. Once the data is collected and cleansed, it is analyzed using exploratory analytics to find patterns and build models, with the aim of extracting deep meaning or creating a data product.
So, let's see how these patterns and models are created. This chapter is divided into the following subtopics:
- Introducing machine learning
- Machine learning on Spark and Hadoop
- Machine learning algorithms
- Examples of machine learning algorithms
- Building machine learning pipelines...