Creating a data asset using the Python SDK
In this section, we will show you how to create a data asset using the Python SDK. As mentioned in the previous section, you can create data from datastores, local files, and public URLs. The Python script to create a data asset from a local file (for example, titanic.csv) is shown in Figure 2.19.
Please note that in the following code snippet, type=AssetTypes.MLTABLE abstracts the schema definition for the tabular data, making it easier to share datasets:
Figure 2.19 – Creating a data asset via the Python SDK
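A minimal sketch of what such a script might look like, assuming the azure-ai-ml and azure-identity packages are installed. The asset name, description, and the subscription, resource group, and workspace identifiers are hypothetical placeholders, and the helper functions are introduced here for illustration only:

```python
def data_asset_kwargs(folder, name, version="1"):
    """Collect the arguments for the Data entity (no Azure call is made here)."""
    return {
        "path": folder,  # local folder holding the data file and its mltable file
        "name": name,
        "version": version,
        "description": f"Data asset created from {folder}",  # illustrative text
    }


def register_data_asset(subscription_id, resource_group, workspace, kwargs):
    """Register the folder as an mltable data asset in the given workspace."""
    # Imported lazily so data_asset_kwargs can be used without the SDK installed.
    from azure.ai.ml import MLClient
    from azure.ai.ml.constants import AssetTypes
    from azure.ai.ml.entities import Data
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )
    # type=AssetTypes.MLTABLE tells Azure ML to read the schema from the mltable file.
    data_asset = Data(type=AssetTypes.MLTABLE, **kwargs)
    return ml_client.data.create_or_update(data_asset)


if __name__ == "__main__":
    kwargs = data_asset_kwargs("./my_data", name="titanic-mltable")  # hypothetical name
    # Replace the placeholders below with your own workspace details before running:
    # register_data_asset("<subscription-id>", "<resource-group>", "<workspace>", kwargs)
    print(kwargs)
```

Separating argument collection from registration keeps the part that needs workspace credentials in one place, so the rest can be checked locally.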
Inside the my_data folder, there are two files:
- The actual data file, which in this case is titanic.csv
- The mltable file, which is a YAML file specifying the data’s schema so that the mltable engine can materialize the data into an in-memory object such as a pandas or Dask DataFrame
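The folder layout described above can be sketched as follows. This is an illustration under stated assumptions rather than the book's own files: the YAML written here is a minimal assumed definition for a comma-delimited file with a header row, the definition file is saved under the name MLTable (the name the mltable package looks for), and the helper functions are hypothetical. Loading requires the separate mltable package.

```python
from pathlib import Path

# Assumed minimal definition: one comma-delimited file with a header row.
MLTABLE_TEMPLATE = """\
paths:
  - file: ./{data_file}
transformations:
  - read_delimited:
      delimiter: ','
      encoding: utf8
      header: all_files_same_headers
"""


def write_mltable(folder, data_file="titanic.csv"):
    """Write an MLTable definition next to the data file and return its path."""
    target = Path(folder) / "MLTable"  # file name expected by the mltable package
    target.write_text(MLTABLE_TEMPLATE.format(data_file=data_file), encoding="utf-8")
    return target


def missing_files(folder, data_file="titanic.csv"):
    """Return the expected files that are absent from the folder."""
    expected = (data_file, "MLTable")
    return [name for name in expected if not (Path(folder) / name).is_file()]


def load_as_pandas(folder):
    """Materialize the table as a pandas DataFrame (requires `pip install mltable`)."""
    import mltable  # imported lazily so the helpers above work without it

    return mltable.load(str(folder)).to_pandas_dataframe()
```

Calling missing_files("my_data") before registering the asset is a cheap way to confirm that both files are in place.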
Figure 2.20 shows the mltable YAML file for this example:
...