Chapter 1: The Problem of Data Reproducibility
Today, machine learning algorithms are used everywhere. They are integrated into our day-to-day lives, and we use them without noticing. While we are rushing to work, planning a vacation, or visiting a doctor's office, the models are at work, at times making important decisions about us. If we are unsure what the model is doing and how it makes decisions, how can we be sure that its decisions are fair and just? Pachyderm profoundly cares about the reproducibility of data science experiments and puts data lineage, reproducibility, and version control at its core. But before we proceed, let's discuss why reproducibility is so important.
This chapter explains the concepts of reproducibility, ethical Artificial Intelligence (AI), and Machine Learning Operations (MLOps), as well as providing an overview of the existing data science platforms and how they compare to Pachyderm.
In this chapter, we're going to cover the following main topics:
- Why is reproducibility important?
- The reproducibility crisis in science
- Demystifying MLOps
- Types of data science platforms
- Explaining ethical AI