Machine Learning with BigQuery ML

You're reading from Machine Learning with BigQuery ML: Create, execute, and improve machine learning models in BigQuery using standard SQL queries

Product type: Paperback
Published in: Jun 2021
Publisher: Packt
ISBN-13: 9781800560307
Length: 344 pages
Edition: 1st Edition
Author: Alessandro Marrandino

Discovering BigQuery ML

Developing a new ML model can be time-consuming and demands considerable effort. It usually requires a range of skills and is a complex activity, especially in large enterprises. The typical journey of an ML model can be summarized with the following flow:

Figure 1.13 – An ML model's typical development life cycle

The first two steps involve preliminary raw data analyses and operations:

  1. In the Data Exploration and Understanding phase, the data engineer or data scientist takes a first look at the data, tries to understand the meaning of all the columns in the dataset, and then selects the fields to take into consideration for the new use case.
  2. During Data Preparation, the data engineer filters, aggregates, and cleans up the datasets, making them available and ready to use for the subsequent training phase.
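As an illustration of the Data Preparation step, the following is a minimal sketch in BigQuery standard SQL that filters and cleans a raw table into a training table. The dataset, table, and column names (`mydataset.raw_trips`, `start_time`, `duration_minutes`, and so on) and the value thresholds are assumptions made up for this example, not objects from the book.

```sql
-- Hypothetical source table and columns, used for illustration only.
CREATE OR REPLACE TABLE `mydataset.trips_training` AS
SELECT
  start_station,
  EXTRACT(HOUR FROM start_time) AS hour_of_day,  -- derive a simple feature
  duration_minutes
FROM `mydataset.raw_trips`
WHERE duration_minutes IS NOT NULL               -- clean: drop rows with a missing value
  AND duration_minutes BETWEEN 1 AND 240;        -- filter: discard implausible values (assumed range)
```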

After these first two stages, the actual ML development process starts:

  1. Leveraging ML frameworks such as TensorFlow and programming languages such as Python, the data scientist carries out the Design the ML model step, experimenting with different algorithms on the training dataset.
  2. When the right ML algorithm is selected, the data scientist performs the Tuning of the ML model step, applying feature engineering techniques and hyperparameter tuning to get better performance out of the ML model.
  3. When the model is ready, a final Evaluation step is executed on the evaluation dataset. This phase tests the effectiveness of the ML model on a new dataset that's different from the training one and may lead to further refinements of the model.
  4. After the development process, the ML model is generally deployed and used in a production environment with scalability and robustness requirements.
  5. The ML model may also be updated at a later stage, either because the incoming data has changed or to apply further improvements.

All of these steps require different skills and are based on the collaboration of different stakeholders, such as business analysts for data exploration and understanding, data engineers for data preparation, data scientists for the development of the ML model, and finally the IT department to make the model usable in a safe, robust, and scalable production environment.

BigQuery ML simplifies and accelerates the entire development process of a new ML model, allowing you to do the following:

  • Design, train, evaluate, and serve the ML model, leveraging SQL and the existing skills in your company.
  • Automate most of the tuning activities that are usually highly time-consuming to get an effective model.
  • Ensure that you have a robust, scalable, and easy-to-use ML model, leveraging all the native features of BigQuery that we've already discussed in the "BigQuery's advantages over traditional data warehouses" section of this chapter.
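The points above can be sketched as three SQL statements covering training, evaluation, and prediction. This is a minimal sketch, not an example taken from the book: the dataset, model, table, and column names are hypothetical, and linear regression is chosen only to keep the example short.

```sql
-- All object names below are hypothetical placeholders.

-- 1. Design and train: create a linear regression model from a training table.
CREATE OR REPLACE MODEL `mydataset.trip_duration_model`
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['duration_minutes']  -- the column to predict
) AS
SELECT start_station, hour_of_day, duration_minutes
FROM `mydataset.trips_training`;

-- 2. Evaluate: compute metrics on data the model has not seen during training.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.trip_duration_model`,
  (SELECT start_station, hour_of_day, duration_minutes
   FROM `mydataset.trips_evaluation`)
);

-- 3. Serve: generate predictions for new rows.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.trip_duration_model`,
  (SELECT start_station, hour_of_day
   FROM `mydataset.trips_new`)
);
```

In the `ML.PREDICT` output, the predicted value appears in a column named after the label with a `predicted_` prefix (here, `predicted_duration_minutes`), alongside the input columns.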

In the following diagram, you can see the life cycle of an ML model that uses BigQuery ML:

Figure 1.14 – An ML model's development life cycle with BigQuery ML

Now that we've learned the basics of BigQuery ML, let's take a look at the main benefits that it can bring.

BigQuery ML benefits

BigQuery ML can bring both business and technical benefits during the life cycle of an ML model:

  • Business users and data analysts can evolve from a traditional descriptive and reporting approach to a new predictive approach, making better decisions with their existing SQL skills.
  • Technical users can benefit from the automation of BigQuery ML during the tuning phase of the model, using a single, centralized tool that can accelerate the entire development process of an ML model.
  • The development process is further sped up because the datasets required to build the ML model are already available to the right users and don't need to be moved from one data repository to another, a step that would carry compliance and data duplication risks.
  • The IT department does not need to manage the infrastructure to serve and use the ML model in a production environment because the BigQuery serverless architecture natively supports the model in a scalable, safe, and robust manner.

After our analysis of the benefits that BigQuery ML can bring, let's now see what the supported ML algorithms are.

BigQuery ML algorithms

The list of ML algorithms supported by BigQuery ML is growing quickly. The following supervised ML techniques are currently supported:

  • Linear regression: To forecast numerical values with a linear model
  • Binary logistic regression: For classification use cases when the choice is between only two different options (Yes or No, 1 or 0, True or False)
  • Multiclass logistic regression: For classification scenarios when the choice is between multiple options
  • Matrix factorization: For developing recommendation engines based on past information
  • Time series: To forecast business KPIs leveraging time-series data from the past
  • Boosted tree: For classification and regression use cases with XGBoost
  • AutoML Tables: To leverage AutoML capabilities from the BigQuery SQL interface
  • Deep Neural Network (DNN): For developing TensorFlow models for classification or regression scenarios without writing any TensorFlow code
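In practice, the algorithm is selected through the `model_type` option of the `CREATE MODEL` statement. The sketch below shows a boosted tree classifier and lists, as comments, the `model_type` values that correspond to the techniques above. The object names are hypothetical, and the set of accepted values has changed over time (for example, time series models were first exposed as `'ARIMA'` and later as `'ARIMA_PLUS'`), so check it against the current BigQuery ML documentation.

```sql
-- model_type values for the techniques listed above:
--   Linear regression               -> 'LINEAR_REG'
--   Binary / multiclass logistic    -> 'LOGISTIC_REG'
--   Matrix factorization            -> 'MATRIX_FACTORIZATION'
--   Time series                     -> 'ARIMA_PLUS' (originally 'ARIMA')
--   Boosted tree (XGBoost)          -> 'BOOSTED_TREE_CLASSIFIER', 'BOOSTED_TREE_REGRESSOR'
--   AutoML Tables                   -> 'AUTOML_CLASSIFIER', 'AUTOML_REGRESSOR'
--   Deep Neural Network             -> 'DNN_CLASSIFIER', 'DNN_REGRESSOR'

-- Hypothetical model, table, and column names.
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',  -- swap this value to try another classifier
  input_label_cols = ['churned']           -- Boolean label column (assumed)
) AS
SELECT tenure_months, monthly_spend, support_tickets, churned
FROM `mydataset.customers_training`;
```

Because only the option value changes, the same training query can be reused to compare, for instance, `'LOGISTIC_REG'` against `'DNN_CLASSIFIER'` on the same data.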

When the training dataset doesn't contain labeled data, the learning is defined as unsupervised. BigQuery ML currently supports the following unsupervised technique:

  • K-means clustering: For data segmentation of similar objects (people, facts, events)
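A K-means model needs no label column, only the features to cluster on. The following is a minimal sketch with hypothetical table and column names; the choice of four clusters is arbitrary.

```sql
-- Hypothetical names; no label column is specified because the learning is unsupervised.
CREATE OR REPLACE MODEL `mydataset.customer_segments`
OPTIONS (
  model_type = 'KMEANS',
  num_clusters = 4  -- arbitrary value chosen for illustration
) AS
SELECT total_orders, avg_order_value, days_since_last_order
FROM `mydataset.customers`;

-- Assign each customer to its nearest cluster.
SELECT centroid_id, customer_id
FROM ML.PREDICT(
  MODEL `mydataset.customer_segments`,
  (SELECT customer_id, total_orders, avg_order_value, days_since_last_order
   FROM `mydataset.customers`)
);
```

Here `customer_id` is not a training feature; it is simply passed through so that each row of the output can be matched to a customer.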

In addition to what is listed, BigQuery ML allows you to import and use pre-trained TensorFlow models using SQL statements.
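Importing a model is also done with `CREATE MODEL`, pointing at a TensorFlow SavedModel stored in Cloud Storage. The bucket path, model name, and input column below are placeholders assumed for this sketch.

```sql
-- Hypothetical bucket path and model name.
CREATE OR REPLACE MODEL `mydataset.imported_tf_model`
OPTIONS (
  model_type = 'TENSORFLOW',
  model_path = 'gs://my-bucket/path/to/saved_model/*'
);

-- Use the imported model like any other BigQuery ML model.
-- The input column names must match the input names expected by the
-- TensorFlow model's signature; `input` is an assumed name.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.imported_tf_model`,
  (SELECT input_text AS input FROM `mydataset.new_rows`)
);
```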
