Conducting EDA
Let’s now explore the data a little further to understand basic information such as record counts, data types, and missingness. You can run the .info() method on the pandas DataFrame to see how many non-null values there are in each column. Remember that .info() is one of many pandas methods for exploring your data, which were discussed in Chapter 4, Exploring Geospatial Data Science Packages. Records with null values indicate an area where the data may need to be cleaned. Output from the .info() method is shown in Figure 5.3:
Figure 5.3 – Subset New York City Airbnb data info
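As a minimal sketch of this check, assume a small made-up DataFrame named listings standing in for the Airbnb data (the name and all values here are invented for illustration). Calling .info() prints the non-null count and data type of each column, and .isna().sum() returns the number of missing values per column directly:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Airbnb listings data; values are invented
listings = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "bedrooms": [1.0, np.nan, 2.0, 3.0],
    "bathrooms": [np.nan, np.nan, np.nan, np.nan],
    "price": [100, 150, 200, 250],
})

# Prints the number of entries plus each column's non-null count and dtype
listings.info()

# Number of missing values in each column
missing_counts = listings.isna().sum()
print(missing_counts)
```

A column whose missing count equals the number of rows, like bathrooms in this toy example, carries no information and cannot be imputed from its own values.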
Running this method reveals that there are 37,410 records in the dataset. It also reveals 910 missing values for the bedrooms variable and 37,410 missing values for the bathrooms variable, meaning that bathrooms is empty for every record. Given that there is no obvious way to impute the missing values here, we’ll go ahead and drop them by using the .drop() method. By specifying...