2. Python's Main Tools for Statistics
Activity 2.01: Analyzing the Communities and Crime Dataset
Solution:
- Once the dataset has been downloaded, the libraries can be imported, and pandas can be used to read in the dataset in a new Jupyter notebook, as follows:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df = pd.read_csv('CommViolPredUnnormalizedData.txt')
    df.head()
We are also printing out the first five rows of the dataset, which should be as follows:
- To print out the column names, we can simply iterate through df.columns in a for loop, like so:
    for column in df.columns:
        print(column)
- The total number of columns in the dataset can be computed using the len() function in Python:
    print(len(df.columns))
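The two steps above can be tried without the downloaded file. The following is a minimal sketch on a small, made-up DataFrame (the name toy and its column names are hypothetical and are not taken from the dataset). It also shows that the second entry of the shape attribute gives the same column count as len().

```python
import pandas as pd

# Hypothetical toy data; these column names are invented for illustration
# and do not come from the Communities and Crime dataset.
toy = pd.DataFrame({
    "community": ["A", "B", "C"],
    "population": [1200, 3400, 560],
    "murders": [0, 2, 1],
})

# Iterating over .columns yields the column labels one by one.
column_names = []
for column in toy.columns:
    print(column)
    column_names.append(column)

# len() counts the column labels.
n_columns = len(toy.columns)
print(n_columns)

# .shape is a (rows, columns) tuple, so its second entry is the same count.
print(toy.shape[1])
```

Either form works for the activity; len(df.columns) reads more directly, while df.shape reports the row count at the same time.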
- To replace the special character '?' with np.nan objects, we can use the replace() method:
    df = df.replace('?', np.nan)
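To see what this replacement does, here is a minimal sketch on a small, made-up DataFrame (the name toy and its columns are hypothetical). The final conversion with pd.to_numeric() is an extra step that the solution above does not take. It rests on the assumption that the column should be numeric once the '?' placeholders are gone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: '?' stands in for a missing value, which forces
# the whole column to be stored as strings.
toy = pd.DataFrame({
    "population": [1200, 3400, 560],
    "burglaries": ["3", "?", "1"],
})

# Swap every '?' for NaN so that pandas treats it as missing.
cleaned = toy.replace("?", np.nan)

# Count the missing values in each column.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)

# Assumption: the column is meant to be numeric, so we convert the
# remaining strings to numbers; NaN stays NaN.
cleaned["burglaries"] = pd.to_numeric(cleaned["burglaries"])
print(cleaned.dtypes)
```

After the replacement, methods such as isna() and dropna() recognize the missing entries, which they would not do for the literal string '?'.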
- To print out...