Pubs in England
The dataset contains data about 51,566 pubs in England, including the pub name, the address, the postal code, the geographical position (both by easting and northing and by latitude and longitude), and the local authority. I created a Notebook, Every Pub in England – Data Exploration, to investigate this data.
Data quality check
For the data quality check, we will use info() and describe() to get a first glimpse of the data. Then, we can also use our custom data quality statistics functions. We introduced these functions in the previous chapter, so we will not repeat them here. Because we will keep using them, we will group them in a utility script. I called this utility script data_quality_stats, and in this module I defined the functions missing_data, most_frequent_values, and unique_values. To use the functions defined in this utility script, we first need to add it to the Notebook: from the File menu, we select the Add utility script menu item. Then, we add the import in one of the first Notebook cells...
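As a rough illustration of the checks described in this section, the sketch below runs info() and describe() on a tiny made-up DataFrame and defines one possible version of the three helper functions. The sample rows, the column names, and the function bodies are all assumptions made for illustration; the actual implementations in data_quality_stats may differ.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not the real dataset schema.
pub_df = pd.DataFrame({
    "name": ["The Red Lion", "The Crown", "The Red Lion", "The Swan"],
    "postcode": ["AB1 2CD", None, "EF3 4GH", "IJ5 6KL"],
    "latitude": [51.5, 52.1, None, 53.4],
    "local_authority": ["Camden", "Leeds", "Camden", "York"],
})

# First glimpse: column types and non-null counts, then summary statistics.
pub_df.info()
print(pub_df.describe())


def missing_data(df):
    """Hypothetical helper: count and percentage of missing values per column."""
    total = df.isnull().sum()
    percent = 100 * total / len(df)
    return pd.DataFrame({"Total": total, "Percent": percent}).T


def most_frequent_values(df):
    """Hypothetical helper: most frequent non-null value per column and its count."""
    rows = {}
    for col in df.columns:
        counts = df[col].value_counts()  # NaN values are excluded by default
        if counts.empty:
            rows[col] = {"Most frequent item": None, "Frequency": 0}
        else:
            rows[col] = {
                "Most frequent item": counts.index[0],
                "Frequency": counts.iloc[0],
            }
    return pd.DataFrame(rows)


def unique_values(df):
    """Hypothetical helper: non-null count and number of distinct values per column."""
    return pd.DataFrame({"Total": df.count(), "Uniques": df.nunique()}).T


print(missing_data(pub_df))
print(most_frequent_values(pub_df))
print(unique_values(pub_df))
```

Each helper returns a small DataFrame with one column per input column, so the three results can be read side by side with the output of info() and describe().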