To illustrate how pandas can assist us at the start of our machine learning journey, we will apply it to a classic problem hosted on the Kaggle website (http://www.kaggle.com). Kaggle is a competition platform for machine learning problems. Companies that want to solve predictive analytics problems can post their data on Kaggle and invite data scientists to propose solutions. A competition runs over a set period of time, and the rankings of the competitors are posted on a leaderboard. At the close of the competition, the top-ranked competitors receive cash prizes.
The classic problem that we will study to illustrate the use of pandas for machine learning with scikit-learn is the Titanic: Machine...