What is in a name?
We now extend the analysis to the Name column, to extract meaningful information from it. From our initial visual inspection, we saw that all names share a similar structure: a Family Name, followed by a comma, then a Title (in abbreviated form, ending with a period), a Given Name and, for those who acquired a new name by marriage, the previous or maiden name in parentheses. Let’s process the data to extract this information. The `parse_names` function below does this.
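As a quick illustration of this structure before the function itself, here is a minimal sketch that splits a single name string into its four parts. The helper `split_name` and the returned dictionary keys are hypothetical and are not part of the original code. It uses `str.partition` rather than `split`, and it assumes the name follows the "Family, Title. Given (Maiden)" pattern described above.

```python
def split_name(text):
    """Split 'Family, Title. Given (Maiden)' into parts (illustrative helper only)."""
    # Everything before the first comma is the family name.
    family_name, _, rest = text.partition(",")
    # Everything up to the first period is the abbreviated title.
    title, _, rest = rest.partition(".")
    # An opening parenthesis, if present, introduces the maiden name.
    given_name, _, maiden_name = rest.partition("(")
    return {
        "family_name": family_name.strip(),
        "title": title.strip() + ".",
        "given_name": given_name.strip(),
        # Drop the closing parenthesis; use None when no maiden name is given.
        "maiden_name": maiden_name.rstrip(") ").strip() or None,
    }


# Sample strings in the format described above.
with_maiden = split_name("Cumings, Mrs. John Bradley (Florence Briggs Thayer)")
without_maiden = split_name("Braund, Mr. Owen Harris")
print(with_maiden)
print(without_maiden)
```

Stripping whitespace at each step matters here: splitting on the comma and the period leaves a leading space on the title and the given name.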
def parse_names(row):
    try:
        text = row["Name"]
        split_text = text.split(",")
        family_name = split_text[0]
        next_text = split_text[1]
        split_text = next_text.split(".")
        title = split_text[0] + "."
        next_text = split_text[1]
        if "(" in next_text:
            split_text = next_text.split("(")
            given_name = split_text[0]
            maiden_name ...