The data science pipeline we just went through has two main sections: data cleaning (where we remove inconsistent data, fill in missing data, and appropriately encode the attributes) and data analysis (where we generate visualizations and insights from our cleaned dataset).
The data cleaning process was implemented as a Python script, while the data analysis process was done in a Jupyter notebook. In general, deciding whether a Python program should be written as a script or in a notebook is an important, yet often overlooked, aspect of working on a data science project.
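To make the script side of this split concrete, here is a minimal sketch of what a cleaning step could look like when packaged as a standalone Python file. It is not the script used earlier in the chapter: the `clean` function, the `age` and `city` columns, and the rule that a negative age counts as inconsistent are all assumptions made purely for illustration.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop inconsistent rows, fill gaps, encode."""
    # Remove inconsistent data: assume a negative age is invalid.
    # Rows with a missing age are kept so they can be filled in next.
    valid = df["age"].isna() | (df["age"] >= 0)
    df = df[valid].copy()

    # Fill in missing data with the median of the remaining ages.
    df["age"] = df["age"].fillna(df["age"].median())

    # Encode the categorical attribute as one indicator column per value.
    df = pd.get_dummies(df, columns=["city"])

    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Tiny made-up dataset, only to show the function being called.
    raw = pd.DataFrame(
        {
            "age": [25.0, -1.0, None, 40.0],
            "city": ["Paris", "Rome", "Rome", "Paris"],
        }
    )
    print(clean(raw))
```

Because every step lives in one function with no intermediate state to inspect, the same file can be rerun unchanged whenever new raw data arrives, which is the kind of repeatable work a script suits well.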
As we have discussed in the previous chapter, Jupyter notebooks are perfect for iterative development processes, where we can transform and manipulate our data as we go. A Python script, on the other hand, offers no such dynamism—...