A short introduction to model building
Our data analysis identified several features with predictive value, and we can now build a model that uses this knowledge. We start with a model that uses just two of the many features we investigated. This is called a baseline model, and it serves as the starting point for incrementally refining the solution.
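To make the idea of a baseline concrete before working with the real data, here is a minimal sketch on a small made-up table. The `toy_*` names and all values in `toy_df` are assumptions chosen only for illustration, and the random forest settings are arbitrary defaults rather than tuned choices.

```python
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in for the training data; the values are invented for illustration.
toy_df = pd.DataFrame({
    "Sex": ["female", "male", "male", "female", "male",
            "female", "male", "female", "male", "male"],
    "Pclass": [1, 3, 2, 3, 3, 1, 1, 2, 3, 2],
    "Survived": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0],
})

# Encode the categorical column as integers.
# Note: a value missing from the mapping would become NaN and make astype(int) fail.
toy_df["Sex"] = toy_df["Sex"].map({"female": 1, "male": 0}).astype(int)

# Hold out 20% of the rows for validation.
toy_train, toy_valid = train_test_split(
    toy_df, test_size=0.2, random_state=42, shuffle=True
)

toy_predictors = ["Sex", "Pclass"]
toy_target = "Survived"

# Fit the baseline on the two predictors only.
toy_clf = RandomForestClassifier(n_estimators=100, random_state=42)
toy_clf.fit(toy_train[toy_predictors], toy_train[toy_target])

# Score the baseline on the held-out rows.
toy_preds = toy_clf.predict(toy_valid[toy_predictors])
toy_accuracy = metrics.accuracy_score(toy_valid[toy_target], toy_preds)
print(f"validation accuracy on toy data: {toy_accuracy:.2f}")
```

With only ten rows the score itself means little. The point is the shape of the workflow: encode, split, fit on a small feature set, and measure on held-out data, so that every later refinement can be compared against this number.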
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
# convert categorical data to numerical
for dataset in [train_df, test_df]:
    dataset['Sex'] = dataset['Sex'].map({'female': 1, 'male': 0}).astype(int)
# train-validation split (20% validation)
VALID_SIZE = 0.2
train, valid = train_test_split(train_df, test_size=VALID_SIZE, random_state=42, shuffle=True)
# define predictors and target feature (labels)
predictors = ["Sex", "Pclass"]
target = 'Survived&apos...