Scripts versus notebooks in data science
The preceding data science pipeline has two main sections: data cleaning, where we remove inconsistent data, fill in missing data, and appropriately encode the attributes; and data analysis, where we generate visualizations and insights from the cleaned dataset.
The data cleaning process was implemented as a Python script, while the data analysis process was carried out in a Jupyter notebook. In general, deciding whether to write a Python program as a script or as a notebook is an important, yet often overlooked, decision when working on a data science project.
As we discussed in the previous chapter, Jupyter notebooks are ideal for iterative development, where we can transform and manipulate our data as we go. A Python script, on the other hand, offers no such dynamism: we need to write all of the necessary code in the script and run it as a complete program.
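To make the contrast concrete, here is a minimal sketch of what a data cleaning script might look like. The file name clean_data.py, the age and size columns, and the SIZE_CODES mapping are all hypothetical and are not taken from the chapter's dataset. The sketch uses only the standard library csv module to stay self-contained; the chapter's own script may well use pandas instead. What matters is the shape of the program: every step, from reading the raw file to writing the cleaned one, lives in the file and runs top to bottom in a single invocation.

```python
"""clean_data.py: a hypothetical data cleaning script.

Run it end to end with: python clean_data.py raw.csv clean.csv
"""
import csv
import sys

# Hypothetical encoding for a categorical "size" attribute.
SIZE_CODES = {"small": 0, "medium": 1, "large": 2}


def clean_rows(rows):
    """Apply the three cleaning steps to a list of CSV row dicts."""
    # 1. Remove inconsistent data: rows whose size is not a known category.
    valid = [r for r in rows if r["size"] in SIZE_CODES]
    # 2. Fill in missing data: replace empty ages with the mean observed age.
    observed = [float(r["age"]) for r in valid if r["age"]]
    mean_age = sum(observed) / len(observed) if observed else 0.0
    # 3. Encode the categorical attribute as an integer code.
    return [
        {
            "age": float(r["age"]) if r["age"] else mean_age,
            "size": SIZE_CODES[r["size"]],
        }
        for r in valid
    ]


def main(in_path, out_path):
    """Read the raw CSV, clean it, and write the result in one pass."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    cleaned = clean_rows(rows)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["age", "size"])
        writer.writeheader()
        writer.writerows(cleaned)


if __name__ == "__main__":
    # The whole pipeline runs as one complete program from the command line.
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python clean_data.py RAW_CSV CLEAN_CSV")
```

Each run repeats every step from scratch on the raw file. Unlike a notebook, there is no intermediate state left in memory between cells to inspect or tweak before continuing.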
However, as illustrated in the Data cleansing and...