Creating Azure Machine Learning data assets
Once the datastore has been created, the next step is to create a data asset. Note that we use the terms “data asset” and “dataset” interchangeably throughout the chapter. A dataset is a logical connection to the datastore that adds versioning and schema management, such as choosing which columns of the data to use, the types of those columns, and some statistics about the data. Data assets decouple your code from the details of how the data is located and read. They are also very useful when we run multiple models, because each model can refer to the dataset by name instead of configuring or programming how to connect to the underlying data and read it. This makes it easier to scale model training.
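To make the idea of referring to data by name concrete, here is a minimal sketch, assuming the Azure Machine Learning Python SDK v2 (the azure-ai-ml package). The helper names datastore_uri, register_data_asset, and get_data_asset_path are hypothetical and introduced only for illustration, as are the datastore name and file path in the usage note. The ml_client argument is assumed to be an authenticated MLClient for your workspace.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build an azureml:// URI that points at a path inside a registered datastore."""
    # Leading slashes are stripped so the URI never contains "paths//".
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


def register_data_asset(ml_client, name: str, version: str, uri: str, description: str = ""):
    """Register a single file as a named, versioned data asset (hypothetical helper)."""
    # The SDK is imported lazily so that datastore_uri works without it installed.
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data

    asset = Data(
        name=name,
        version=version,
        path=uri,
        type=AssetTypes.URI_FILE,  # assumption: the asset is one file, not a folder or table
        description=description,
    )
    return ml_client.data.create_or_update(asset)


def get_data_asset_path(ml_client, name: str, version: str) -> str:
    """Look up a data asset by name and version and return the location it points to."""
    return ml_client.data.get(name=name, version=version).path
```

For example, datastore_uri("my_blob_store", "raw/titanic.csv") produces azureml://datastores/my_blob_store/paths/raw/titanic.csv, which can then be passed to register_data_asset. Training code that later calls get_data_asset_path with only the asset name and version does not need to know which datastore or storage account holds the file, which is the decoupling described above.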
In the following sections, you will learn how to create datasets using the Azure Machine Learning Python SDK, CLI, and UI. Datasets allow us to create versions based on schema changes without changing...