Experimentation lies at the core of data science. Data scientists run many experiments to find the best approach to the task at hand. Experiments usually come in sets, and each set is tied to a step of the data processing pipeline.
For example, your project may comprise the following experiment sets:
- Feature engineering experiments
- Experiments with different machine learning algorithms
- Hyperparameter optimization experiments
Each experiment can affect the results of other experiments, so it is crucial to be able to reproduce each experiment in isolation. It is also important to track all results so your team can compare pipeline variants and choose the best one for your project according to the metric values.
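As a rough illustration of what tracking results for comparison could look like, the sketch below keeps one record per experiment and picks the best one by a single metric. The `ExperimentRecord` fields, the `best_experiment` helper, and all names, paths, versions, and metric values are hypothetical placeholders, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentRecord:
    """One tracked experiment (all fields are illustrative assumptions)."""
    name: str            # label for the pipeline variant
    experiment_set: str  # which set the experiment belongs to
    code_version: str    # e.g. a commit hash of the code that was run
    data_path: str       # link to the data file that was used
    metric: float        # value of the metric chosen for comparison


def best_experiment(records, higher_is_better=True):
    """Return the record with the best metric value."""
    if not records:
        raise ValueError("no experiments recorded")
    pick = max if higher_is_better else min
    return pick(records, key=lambda r: r.metric)


# Made-up records, for illustration only.
records = [
    ExperimentRecord("baseline", "algorithms", "a1b2c3", "data/train_v1.csv", 0.81),
    ExperimentRecord("engineered_features", "feature engineering", "d4e5f6", "data/train_v2.csv", 0.84),
    ExperimentRecord("tuned_gbm", "hyperparameter optimization", "0a9b8c", "data/train_v2.csv", 0.87),
]

best = best_experiment(records)
print(f"{best.name}: {best.metric}")
```

Storing the code version and data path next to each metric value is what makes a result traceable back to the exact inputs that produced it; for an error metric, where lower is better, pass `higher_is_better=False`.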
A simple spreadsheet file with links to data files and code versions can be used to track all experiments, but reproducing experiments will require...