You're reading from Machine Learning with BigQuery ML: Create, execute, and improve machine learning models in BigQuery using standard SQL queries

Product type: Paperback

Published in: Jun 2021

Publisher: Packt

ISBN-13: 9781800560307

Length: 344 pages

Edition: 1st Edition

Languages: Python

Tools: BigQuery

Concepts: Machine Learning

Author: Alessandro Marrandino

Table of Contents (20) Chapters

Preface

1. Section 1: Introduction and Environment Setup

2. Chapter 1: Introduction to Google Cloud and BigQuery

3. Chapter 2: Setting Up Your GCP and BigQuery Environment

4. Chapter 3: Introducing BigQuery Syntax

5. Section 2: Deep Learning Networks

6. Chapter 4: Predicting Numerical Values with Linear Regression

7. Chapter 5: Predicting Boolean Values Using Binary Logistic Regression

8. Chapter 6: Classifying Trees with Multiclass Logistic Regression

9. Section 3: Advanced Models with BigQuery ML

10. Chapter 7: Clustering Using the K-Means Algorithm

11. Chapter 8: Forecasting Using Time Series

12. Chapter 9: Suggesting the Right Product by Using Matrix Factorization

13. Chapter 10: Predicting Boolean Values Using XGBoost

14. Chapter 11: Implementing Deep Neural Networks

15. Section 4: Further Extending Your ML Capabilities with GCP

16. Chapter 12: Using BigQuery ML with AI Notebooks

17. Chapter 13: Running TensorFlow Models with BigQuery ML

18. Chapter 14: BigQuery ML Tips and Best Practices

19. Other Books You May Enjoy

Training the K-Means clustering model

In this section, we'll create two different K-Means machine learning models. The first model will use taxi_miles_per_minute as its training dataset, while the second will also include tot_income as a feature and will be trained on taxi_speed_and_income. Let's proceed as follows:

As a first step, let's start training a machine learning model named clustering_by_speed by running the following code:
```
CREATE OR REPLACE MODEL `07_chicago_taxi_drivers.clustering_by_speed`
OPTIONS(model_type='kmeans', num_clusters=3, kmeans_init_method = 'KMEANS++') AS
  SELECT * EXCEPT (taxi_id)
  FROM `07_chicago_taxi_drivers.taxi_miles_per_minute`;
```
The SQL statement opens with the CREATE OR REPLACE MODEL keywords, followed by the identifier of the machine learning model, `07_chicago_taxi_drivers.clustering_by_speed`, and the OPTIONS clause.
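The second model mentioned at the start of this section can follow the same statement shape. The following is only a minimal sketch: the model name `clustering_by_speed_and_income` is a hypothetical placeholder, and the query assumes that `taxi_speed_and_income` sits in the same dataset, exposes a `taxi_id` column, and carries `tot_income` alongside the speed feature. The options are simply carried over from the first model.

```sql
-- Sketch only: the model name below is an assumed placeholder, not taken from the text.
CREATE OR REPLACE MODEL `07_chicago_taxi_drivers.clustering_by_speed_and_income`
OPTIONS(model_type='kmeans', num_clusters=3, kmeans_init_method = 'KMEANS++') AS
  -- Assumes taxi_id exists in this table; it is excluded because an identifier
  -- is not a meaningful feature for clustering.
  SELECT * EXCEPT (taxi_id)
  FROM `07_chicago_taxi_drivers.taxi_speed_and_income`;
```

Keeping the options identical across both statements means any difference between the resulting clusters can be attributed to the extra `tot_income` feature rather than to a change in configuration.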
Now, let's take a look at the options...