Training the K-Means clustering model
In this section, we'll create two different K-Means machine learning models. The first model will be created using taxi_miles_per_minute as a training dataset, while the second will also include tot_income as a feature and will leverage taxi_speed_and_income. Let's proceed as follows:
- As a first step, let's start training a machine learning model named clustering_by_speed by running the following code:

    CREATE OR REPLACE MODEL `07_chicago_taxi_drivers.clustering_by_speed`
    OPTIONS(model_type='kmeans', num_clusters=3, kmeans_init_method='KMEANS++') AS
      SELECT * EXCEPT (taxi_id)
      FROM `07_chicago_taxi_drivers.taxi_miles_per_minute`;
The first lines of the SQL statement are composed of the CREATE OR REPLACE MODEL keywords, followed by the identifier of the `07_chicago_taxi_drivers.clustering_by_speed` machine learning model and the OPTIONS clause.
Now, let's take a look at the options...
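For intuition about what num_clusters=3 and kmeans_init_method='KMEANS++' ask for, here is a minimal plain-Python sketch of standard K-Means with k-means++ seeding. It is not BigQuery ML's implementation, which isn't shown here; the function names (kmeans_pp_init, assign, kmeans) and the miles-per-minute values are made up for illustration and are not taken from the Chicago taxi dataset.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    pick each further centroid with probability proportional to its squared
    distance from the nearest centroid chosen so far."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iterations=20, seed=0):
    """Alternate assignment and centroid-update steps for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # Final assignment so the labels match the returned centroids.
    return centroids, assign(points, centroids)


# Hypothetical miles-per-minute values, one per driver (illustrative only).
speeds = [(0.10,), (0.12,), (0.11,), (0.30,), (0.32,), (0.31,), (0.60,), (0.62,), (0.61,)]
centroids, labels = kmeans(speeds, k=3)
for speed, label in zip(speeds, labels):
    print(f"speed={speed[0]:.2f} -> cluster {label}")
```

Spreading the initial centroids apart in this way tends to make the result less sensitive to an unlucky random start than picking all three starting points uniformly at random.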