Dataset versioning – beyond Weights & Biases, MLflow, and DVC
Throughout this chapter, we have seen how DL project-tracking tools can manage datasets. With W&B, we can use artifacts; with MLflow, we paired it with DVC, which runs on top of a Git repository to track different versions of datasets, thereby overcoming Git's limitations in handling large data files.
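To make the idea behind DVC more concrete: large files stay out of Git, and only a small description of them, keyed by content hashes, is committed. The following is a minimal sketch of that content-addressed approach, assuming the dataset is a local directory of files. It is not DVC's actual file format or API. The function names `hash_file` and `snapshot_dataset` and the manifest layout are hypothetical, chosen only for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def snapshot_dataset(data_dir: Path) -> dict:
    """Build a small, Git-friendly manifest describing every file in data_dir.

    Hypothetical layout: only this manifest would be committed to Git,
    while the (possibly large) files would live in a separate cache or
    remote storage.
    """
    files = {
        str(p.relative_to(data_dir)): hash_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    # One version ID derived from all file hashes: changing any file,
    # or adding/removing one, yields a different ID.
    combined = hashlib.md5(json.dumps(files, sort_keys=True).encode()).hexdigest()
    return {"version": combined, "files": files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "train.csv").write_text("x,y\n1,2\n")
        v1 = snapshot_dataset(data_dir)

        # Modify the data and take another snapshot.
        (data_dir / "train.csv").write_text("x,y\n1,2\n3,4\n")
        v2 = snapshot_dataset(data_dir)

        print(v1["version"] != v2["version"])  # True: the edit yields a new version
        print(json.dumps(v2, indent=2))
```

Because the manifest is only a few lines of text, Git can diff and version it cheaply, and checking out an older commit tells you exactly which file contents that version of the dataset contained.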
Are there other methods or tools that are useful for dataset versioning? The short answer is yes, but again, the more precise answer depends on the context. To make the right choice, you must weigh various aspects, including cost, ease of use, and integration difficulty. In this section, we mention a few tools that we believe are worth exploring if dataset versioning is a critical component of your project:
- Neptune (https://docs.neptune.ai) is a metadata store for MLOps. Neptune artifacts allow you to version datasets that are stored locally or in the cloud.
- Delta Lake...