Exploring and understanding the dataset
In this section, we'll analyze and prepare the dataset for our use case. We'll start with some data quality checks, and then we'll segment the data into training, evaluation, and test tables.
Since the dataset has already been used in Chapter 6, Classifying Trees with Multiclass Logistic Regression, we will not start the analysis from the beginning. Instead, we'll focus on the most relevant queries for our business scenario.
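Before the chapter builds its training, evaluation, and test tables, it may help to see the general idea behind a repeatable split. The following is a minimal Python sketch, not the chapter's own query. It assumes each row has a stable identifier, and uses the `tree_id` column from the NYC trees data as an assumed key. The hash-based bucketing, the 80/10/10 proportions, and the helper names `assign_split` and `split_rows` are all hypothetical choices made for illustration. Because every identifier is always hashed to the same bucket, rerunning the split gives the same partition and no row lands in more than one table.

```python
import hashlib


def assign_split(record_id, train_pct=80, eval_pct=10):
    """Deterministically map an identifier to 'training', 'evaluation', or 'test'.

    Hypothetical helper: the 80/10/10 proportions are illustrative defaults.
    """
    if train_pct < 0 or eval_pct < 0 or train_pct + eval_pct > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    # Hash the identifier so that the same id always falls in the same bucket.
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range [0, 99]
    if bucket < train_pct:
        return "training"
    if bucket < train_pct + eval_pct:
        return "evaluation"
    return "test"


def split_rows(rows, id_key="tree_id"):
    """Partition a list of dict rows into three disjoint lists keyed by split name.

    `tree_id` is an assumed key column; pass a different id_key for other data.
    """
    splits = {"training": [], "evaluation": [], "test": []}
    for row in rows:
        splits[assign_split(row[id_key])].append(row)
    return splits


if __name__ == "__main__":
    rows = [{"tree_id": i} for i in range(1000)]
    splits = split_rows(rows)
    for name, part in splits.items():
        print(name, len(part))
```

In BigQuery, the same principle is usually expressed by computing a bucket from a hash or a modulo of the key column in the `WHERE` clause of each `CREATE TABLE ... AS SELECT` statement, so that the three tables stay disjoint.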
Checking the data quality
To start our exploration of the data and to carry out data quality checks, we need to do the following:
- Log in to the Google Cloud Console and open the BigQuery user interface (UI) from the navigation menu.
- Create a new dataset under the project that we created in Chapter 2, Setting Up Your GCP and BigQuery Environment. For this use case, we'll create a 10_nyc_trees_xgboost dataset with the default options.
- First of all, let's check if...