You're reading from Mastering Azure Machine Learning: Perform large-scale end-to-end advanced machine learning in the cloud with Microsoft Azure Machine Learning

Product type Paperback

Published in Apr 2020

Publisher Packt

ISBN-13 9781789807554

Length 436 pages

Edition 1st Edition

Tools Azure

Concepts Machine Learning

Authors (2):

Christoph Körner

Kaijisse Waaijer


Building a simple bag-of-words model

In this section, we will look at the bag-of-words model, a surprisingly simple concept that addresses the shortcomings of label encoding for textual data and lays the foundation for a simple NLP pipeline. Don't worry if these techniques look too simple as you read through them; we will gradually build on them with tweaks, optimizations, and improvements until we arrive at a modern NLP pipeline.

A naive bag-of-words model using counting

The main concept that we will build in this section is the bag-of-words model. The idea is very simple: we model each document as the collection of words that appear in it, together with the frequency of each word. In doing so, we throw away sentence structure, word order, punctuation, and so on, and reduce each document to raw word counts. We can then turn these word counts into a numeric vector representation, which can be used for ML, analysis, document comparisons, and much...
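To make the counting idea concrete, here is a minimal sketch in plain Python using only the standard library. It is an illustration rather than the book's own implementation: the helper names (`tokenize`, `build_vocabulary`, `vectorize`), the two-sentence `corpus`, and the regex-based tokenization are all assumptions chosen for this example.

```python
import re
from collections import Counter


def tokenize(document):
    """Lowercase the text and keep runs of letters, digits, and apostrophes.

    Assumption: this simple regex stands in for a real tokenizer; it drops
    punctuation and whitespace.
    """
    return re.findall(r"[a-z0-9']+", document.lower())


def build_vocabulary(documents):
    """Map every distinct word in the corpus to a fixed column index."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(document, vocabulary):
    """Return the raw word counts of one document, one entry per vocabulary word."""
    counts = Counter(tokenize(document))
    # Dictionaries preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical toy corpus used only for illustration.
corpus = [
    "The cat sat on the mat.",
    "The dog chased the cat!",
]

vocabulary = build_vocabulary(corpus)
vectors = [vectorize(doc, vocabulary) for doc in corpus]

print(list(vocabulary))  # ['cat', 'chased', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors[0])        # [1, 0, 0, 1, 1, 1, 2]
print(vectors[1])        # [1, 1, 1, 0, 0, 0, 2]
```

Because only counts are kept, a shuffled sentence such as "mat the on sat cat the" produces the same vector as the first document, which shows how word order and punctuation are discarded in this representation.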