In this chapter, we walked through the hands-on process of building a data science pipeline. First, we discussed the importance of applying version control not just to our code and project-related files but also to our datasets; we then learned how to use Git LFS to version large files and datasets.
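As a quick reminder, the Git LFS workflow covered earlier can be sketched as follows; the repository name and the `data/*.csv` pattern are illustrative, not taken from the example project:

```shell
# Initialize a repository and enable Git LFS for it
git init demo-project && cd demo-project
git lfs install --local

# Tell LFS to manage all CSV files under data/
git lfs track "data/*.csv"

# The tracking rules live in .gitattributes, which must itself be committed
git add .gitattributes
git commit -m "Track CSV datasets with Git LFS"
```

From this point on, adding a matching file (say, `data/sales.csv`) stores only a small pointer in Git history, while the actual contents go to LFS storage.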
Next, we looked at various data cleaning and pre-processing techniques specific to the example dataset. Using the SciView panel in PyCharm, we could dynamically inspect the current state of our data and variables and see how they changed after each command.
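The exact cleaning steps vary by dataset, but the general pattern can be sketched in a few lines of pandas; the column names here (`city`, `price`) are invented stand-ins for the example dataset's fields:

```python
import numpy as np
import pandas as pd

# Toy data exhibiting common problems: inconsistent casing,
# missing values, and duplicate rows
df = pd.DataFrame({
    "city": ["Oslo", "oslo", "Bergen", "Bergen", None],
    "price": [120.0, 120.0, np.nan, 95.0, 80.0],
})

# Normalize text so "Oslo" and "oslo" compare equal
df["city"] = df["city"].str.strip().str.title()

# Fill missing prices with the column median, then drop
# rows with no city and any exact duplicates
df["price"] = df["price"].fillna(df["price"].median())
df = df.dropna(subset=["city"]).drop_duplicates().reset_index(drop=True)
```

Running each of these statements one at a time and watching `df` in the SciView panel makes it easy to confirm that every transformation did what you expected before moving on.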
Finally, we considered several techniques for generating visualizations and extracting insights from our dataset. Using the Jupyter editor in PyCharm, we could work on our notebook entirely within the IDE without manually managing a Jupyter server. Having walked through this process, you are now ready...