Summary
In this chapter, we learned how to keep data science code reproducible in the Jupyter notebook by following structured standards and practices that avoid duplicate work.
We started by looking at what reproducibility is and how it affects research and data science work. We then looked at areas where code reproducibility can be improved, focusing on coding standards that support data reproducibility. Following that, we covered coding standards and practices that avoid duplicate work: managing code effectively by segmenting workflows, developing functions for all key tasks, and generalizing code into reusable libraries and packages.
In the next chapter, we will learn how to use all the functionalities we have learned about so far to generate a full analysis report. We will also learn how to use various PySpark functionalities for...