Using EDA Python packages
Sometimes it's helpful to create specific EDA plots and statistics to investigate features of interest, but it's often useful to run an auto-EDA package on our data as one of our first steps. There are a host of EDA packages in Python (and R), but we'll cover only pandas-profiling. This convenient package creates an EDA summary from a pandas DataFrame with only a few lines of code. Once we have our data loaded, we import the ProfileReport class from pandas-profiling:
from pandas_profiling import ProfileReport
Since dashes are not allowed in Python module names, the library is imported with an underscore in its name, as pandas_profiling. Once ProfileReport is imported, we can create our report and then display it:
report = ProfileReport(df)
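As a quick aside on the naming rule, the short check below (standard library only) illustrates why the dashed package name cannot appear in an import statement. It is only a sketch for illustration; the variable name dashed_import_ok is made up for this example.

```python
# A module name used in an import statement must be a valid Python identifier.
for name in ("pandas-profiling", "pandas_profiling"):
    print(f"{name!r} is a valid identifier: {name.isidentifier()}")

# Python parses the dash as a minus sign, so the dashed import fails to compile.
try:
    compile("from pandas-profiling import ProfileReport", "<demo>", "exec")
    dashed_import_ok = True
except SyntaxError:
    dashed_import_ok = False

print("dashed import compiles:", dashed_import_ok)
```

The dashed spelling is the name used when installing the package, while the underscored spelling is the name of the module we import.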
Within Jupyter Notebook, we have a few options for display. We can simply evaluate the variable as the last line of a notebook cell, like so:
report
Or, we can use report.to_widgets...
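To tie the steps in this section together, here is a minimal sketch of the workflow on a small made-up DataFrame. The column names and values are invented for illustration, and make_report is a hypothetical helper, not part of pandas-profiling. The title and minimal arguments are optional ProfileReport settings; minimal=True is used here on the assumption that a lighter report is enough for a first look.

```python
import pandas as pd


def make_report(df: pd.DataFrame, title: str = "EDA report"):
    """Build a profile report for df (requires pandas-profiling to be installed)."""
    # Imported inside the function so the rest of the script runs without the package.
    from pandas_profiling import ProfileReport

    # minimal=True turns off the more expensive computations.
    return ProfileReport(df, title=title, minimal=True)


# Toy data, invented for this example; it includes one missing value.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, None, 52],
        "income": [32000, 54000, 61000, 47000, 75000],
        "segment": ["a", "b", "b", "a", "c"],
    }
)

# In a notebook cell, we would then create and display the report:
# report = make_report(df, title="Toy EDA report")
# report
```

Wrapping the call in a small function keeps the report settings in one place, which is handy if we profile several DataFrames the same way.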