This chapter takes you on a whirlwind tour of machine learning, focusing on pandas as a tool for preprocessing the data that machine learning programs consume. It also introduces the scikit-learn library, one of the most widely used machine learning toolkits in Python.
In this chapter, we will illustrate machine learning techniques by applying them to a well-known classification problem: predicting which passengers survived the sinking of the Titanic in 1912. The topics covered in this chapter include the following:
- The role of pandas in machine learning
- Installing scikit-learn
- Introduction to machine learning concepts
- Applying machine learning—Kaggle Titanic competition
- Data analysis and preprocessing using pandas
- A naïve approach to the Titanic problem
- The scikit-learn ML classifier interface...