Predicting who will survive on the Titanic with logistic regression
In this recipe, we will introduce logistic regression, a basic classifier. We will also show how to perform a grid search with cross-validation.
We will apply these techniques to a Kaggle dataset, where the goal is to predict which passengers survived the Titanic based on real data.
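Before turning to the real data, the two ideas above can be sketched on a toy problem. The following is a minimal illustration, not the recipe's own code: it assumes a recent scikit-learn (where grid search lives in `sklearn.model_selection`), uses a synthetic dataset from `make_classification` as a stand-in for the Titanic data, and the grid of `C` values and the 5-fold setting are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data (an assumption for this sketch,
# not the Titanic dataset used in the recipe).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Candidate values for C, the inverse regularization strength
# of the logistic regression classifier.
param_grid = {"C": np.logspace(-3, 3, 7)}

# For every candidate C, fit and score the model with 5-fold
# cross-validation on the training set only.
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
grid.fit(X_train, y_train)

best_C = grid.best_params_["C"]          # C with the best mean CV accuracy
cv_score = grid.best_score_              # that mean CV accuracy
test_score = grid.score(X_test, y_test)  # accuracy on the held-out test set
print(best_C, cv_score, test_score)
```

The held-out test set is kept out of the grid search, so `test_score` gives a less optimistic estimate than `cv_score`, which was used to pick `C`.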
Note
Kaggle (www.kaggle.com/competitions) hosts machine learning competitions where anyone can download a dataset, train a model, and test the predictions on the website. The author of the best model might even win a prize! It is a fun way to get started with machine learning.
Getting ready
Download the Titanic dataset from the book's GitHub repository at https://github.com/ipython-books/cookbook-data.
The dataset was obtained from www.kaggle.com/c/titanic-gettingStarted.
How to do it...
We import the standard packages:
In [1]: import numpy as np
        import pandas as pd
        import sklearn
        import sklearn.linear_model as lm
        import sklearn...