Code to transform data
In this chapter, we will look at some code that analyzes data from a survey Kaggle conducted in 2018. The survey asked Kaggle users for socio-economic information.
This section presents the survey data along with some code to analyze it. The dataset's subtitle is "the most comprehensive dataset available on the state of machine learning and data science". Let's dig into the data and see what it contains. The data was originally available at https://www.kaggle.com/kaggle/kaggle-survey-2018.
How to do it…
- Load the data into a DataFrame:
>>> import pandas as pd
>>> import numpy as np
>>> import zipfile
>>> url = 'data/kaggle-survey-2018.zip'
>>> with zipfile.ZipFile(url) as z:
...     print(z.namelist())  # list the files inside the archive
...     kag = pd.read_csv(z.open('multipleChoiceResponses.csv'))
...     df = kag.iloc[1:]  # drop the first row, which holds the question text
['multipleChoiceResponses.csv', 'freeFormResponses...
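The recipe goes through `zipfile` rather than passing the archive straight to `pd.read_csv`, because the archive holds more than one file and pandas' zip decompression expects a single data file. The sketch below wraps the same steps in a function so they can be exercised without the real download. The function name `load_survey` and the tiny in-memory archive are hypothetical stand-ins, and the toy CSV only mimics the layout the recipe's `.iloc[1:]` assumes (a header row, then a row of question text, then answers); it is not the real survey data.

```python
import io
import zipfile

import pandas as pd


def load_survey(zip_source, member="multipleChoiceResponses.csv"):
    """Read one CSV member from a zip archive and drop the question-text row.

    Hypothetical helper: assumes, as the recipe does, that the row directly
    under the header holds question text rather than an answer.
    """
    with zipfile.ZipFile(zip_source) as z:
        raw = pd.read_csv(z.open(member))
    # Row 0 is question text; the answers start at row 1.
    return raw.iloc[1:]


# Build a small in-memory archive with toy data (not real survey responses)
# that contains two files, like the real download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(
        "multipleChoiceResponses.csv",
        "Q1,Q2\nQuestion 1 text,Question 2 text\nMale,22-24\nFemale,30-34\n",
    )
    z.writestr("freeFormResponses.csv", "Q1\nQuestion 1 text\n")
buf.seek(0)

df = load_survey(buf)
print(df.shape)  # (2, 2)
```

Note that `.iloc[1:]` keeps the original index labels, so the first answer row is labeled `1` rather than `0`.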