Extracting meaningful information from passenger names
We now continue our analysis by examining the passengers’ names. As noted at the beginning of this chapter, the Name column carries more information than the name alone. Our preliminary visual analysis showed that all names follow a similar structure: they begin with a Family Name, followed by a comma, then a Title (in abbreviated form, ending with a period), then a Given Name, and, where a new name was acquired through marriage, the previous or Maiden Name. Let’s process the data to extract these components. The code to do so is:
def parse_names(row):
    try:
        text = row["Name"]
        split_text = text.split(",")
        family_name = split_text[0]
        next_text = split_text[1]
        split_text = next_text.split(".")
        title = (split_text[0] + "."...