Data ingestion and feature engineering
Data is essential for training ML models; without data, there is no ML. Data ingestion is the step that triggers the ML pipeline. It handles the volume, velocity, veracity, and variety of data by extracting data from various sources and ingesting the data needed for model training.
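To make the ingestion idea concrete, here is a minimal, self-contained sketch that uses only the Python standard library. It extracts rows from two hypothetical sources in different formats (variety), keeps only the fields a model needs, and drops records that fail a basic numeric check (veracity). The source contents, field names, and validation rule are illustrative assumptions. They are not the chapter's dataset, which is accessed through the Azure ML SDK.

```python
import csv
import io
import json

# Hypothetical raw sources: a CSV export and a JSON-lines feed.
# Field names and values are illustrative assumptions only.
CSV_SOURCE = """timestamp,temperature_c,humidity,station
2021-01-01T00:00,7.5,0.82,A
2021-01-01T01:00,,0.80,A
"""

JSONL_SOURCE = """{"timestamp": "2021-01-01T02:00", "temperature_c": 6.9, "humidity": 0.85, "station": "B"}
{"timestamp": "2021-01-01T03:00", "temperature_c": "n/a", "humidity": 0.87, "station": "B"}
"""

# Assumed: only these fields are needed for training; the rest are dropped.
NEEDED_FIELDS = ("timestamp", "temperature_c", "humidity")
NUMERIC_FIELDS = ("temperature_c", "humidity")


def extract_csv(text):
    """Yield rows from a CSV source as dicts."""
    yield from csv.DictReader(io.StringIO(text))


def extract_jsonl(text):
    """Yield rows from a JSON-lines source as dicts, skipping blank lines."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def to_record(row):
    """Keep the needed fields and coerce numeric ones; return None if invalid."""
    try:
        record = {field: row[field] for field in NEEDED_FIELDS}
        for field in NUMERIC_FIELDS:
            record[field] = float(record[field])
        return record
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric values fail this basic veracity check.
        return None


def ingest(sources):
    """Extract rows from every (extractor, text) source and keep valid records."""
    records = []
    for extract, text in sources:
        for row in extract(text):
            record = to_record(row)
            if record is not None:
                records.append(record)
    return records


if __name__ == "__main__":
    data = ingest([(extract_csv, CSV_SOURCE), (extract_jsonl, JSONL_SOURCE)])
    for record in data:
        print(record)
```

Running the script keeps one valid row from each source. The row with an empty temperature and the row with `"n/a"` are rejected, and the unused `station` field is dropped. In a real pipeline, the extractors would read from databases, files, or streams rather than inline strings.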
The ML pipeline begins by ingesting the right data for training the ML models. We will start by accessing the preprocessed data we registered in the previous chapter. Follow these steps to access and import the preprocessed data and prepare it for ML training:
- Using the Workspace class from the Azure ML SDK, connect to the ML workspace whose datastore holds the registered data, as follows:

      from azureml.core import Workspace, Dataset

      # Identifiers of the Azure ML workspace (replace with your own values)
      subscription_id = 'xxxxxx-xxxxxx-xxxxxxx-xxxxxxx'
      resource_group = 'Learn_MLOps'
      workspace_name = 'MLOps_WS'

      # Connect to the workspace; registered datasets are accessed through it
      workspace = Workspace(subscription_id, resource_group, workspace_name)
Note
Insert your own...