3. Python's Statistical Toolbox
Activity 3.01: Revisiting the Communities and Crimes Dataset
Solution
- The libraries can be imported, and pandas can be used to read in the dataset as follows:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv('CommViolPredUnnormalizedData.txt')
    df.head()
Your output should be the following:
- To replace the special character with the np.nan object, we can use the following code:

    df = df.replace('?', np.nan)
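The replacement step can be tried on a small stand-in DataFrame first. This is a minimal sketch: the `toy` frame and its `some_count` column are hypothetical and are not taken from the dataset. It also shows one thing the replacement does not do by itself: a column that held '?' is still stored as text afterwards, so it is converted with `pd.to_numeric` before any arithmetic.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the column names and values are made up.
toy = pd.DataFrame({
    'population': [1000, 2500, 400],
    'some_count': ['3', '?', '1'],
})

# Swap the '?' placeholder for NaN so pandas treats it as missing.
toy = toy.replace('?', np.nan)

# The column still holds strings, so convert it to numbers explicitly.
toy['some_count'] = pd.to_numeric(toy['some_count'])

# Count the missing values per column to confirm the replacement.
print(toy.isna().sum())
```

Whether the real columns need the same conversion depends on how pandas inferred their types when reading the file, so checking `df.dtypes` after the replacement is a reasonable precaution.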
- To compute the actual count for the different age groups, we can simply use the expression df['population'] * df['agePct...'], which computes the count in a vectorized way:

    age_groups = ['12t21', '12t29', '16t24', '65up']
    for group in age_groups:
        df['ageCnt' + group] = (df['population'] * \
          ...
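To illustrate the vectorized computation, here is a small sketch on made-up data. It rests on two assumptions that the truncated listing above does not confirm: that the percentage columns are named 'agePct' followed by the group label, and that they hold values on a 0 to 100 scale, so the product is divided by 100.

```python
import pandas as pd

# Hypothetical stand-in data; names and the 0-100 percentage scale are assumptions.
toy = pd.DataFrame({
    'population': [1000, 2000],
    'agePct12t21': [10.0, 25.0],
    'agePct65up': [20.0, 5.0],
})

age_groups = ['12t21', '65up']
for group in age_groups:
    # Multiply whole columns at once; no explicit loop over rows is needed.
    toy['ageCnt' + group] = toy['population'] * toy['agePct' + group] / 100

print(toy[['ageCnt12t21', 'ageCnt65up']])
```

The loop runs once per age group rather than once per row, so the cost of the Python-level iteration stays constant no matter how many communities the dataset contains.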