Exploring and understanding the dataset
Before diving into the machine learning implementation, we'll start by analyzing the dataset that will be used to train our model.
For this use case, we'll use the same BigQuery public dataset that we used in Chapter 5, Predicting Boolean Values Using Binary Logistic Regression. This dataset contains information on taxi rides collected by the City of Chicago and can be found at the following link: https://console.cloud.google.com/marketplace/details/city-of-chicago-public-data/chicago-taxi-trips.
Let's start by getting a clear understanding of the information in our dataset that we can use to build our K-Means clustering model.
Understanding the data
In this section, we'll explore the structure of the data we'll use to develop our BigQuery ML model.
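The structure of a table can also be inspected with a query instead of the user interface. The following is a minimal sketch, assuming that the rides are stored in a table named taxi_trips inside the chicago_taxi_trips dataset of the bigquery-public-data project; adjust these names if your environment differs:

```sql
-- List each column of the taxi rides table with its data type.
-- Assumption: the table is bigquery-public-data.chicago_taxi_trips.taxi_trips.
SELECT
  column_name,
  data_type
FROM
  `bigquery-public-data.chicago_taxi_trips.INFORMATION_SCHEMA.COLUMNS`
WHERE
  table_name = 'taxi_trips'
ORDER BY
  ordinal_position;
```

The query reads only metadata from the INFORMATION_SCHEMA.COLUMNS view, so it gives a quick overview of the available fields without scanning the rides themselves.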
To start exploring the data, we need to do the following:
- Log in to GCP and access the BigQuery user interface from the navigation...