Learning the basics of data storage
As stated earlier, the data storage step in the model pipeline is usually the responsibility of machine learning or data engineers. However, a data scientist still benefits from a basic understanding of this step.
Data storage is simply about housing the data that we gather from different sources. There are a variety of approaches to this, depending on the data’s requirements, such as its structure, schema, size, ingestion type, and privacy.
The following are some examples of data storage options within MLOps:
- Binary Large Object (BLOB) storage: BLOB storage is designed to store and manage large binary data, such as images, videos, documents, and other types of files. BLOBs can range from small to very large, and they typically hold unstructured data, meaning they lack a specific schema or organization. In modern data architectures, the cloud services offered by Azure Blob...