Summary
In this chapter, we walked through the hands-on process of working on a data science pipeline. First, we discussed the importance of having version control not just for our code and project-related files but also for our datasets; we then learned how to use Git LFS to apply version control to large files and datasets.
Next, we looked at various data cleaning and preprocessing techniques that are specific to the example dataset. Using the SciView panel in PyCharm, we dynamically inspected the current state of our data and variables and saw how they changed after each command.
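A minimal cleaning sketch in the same spirit is shown below. The DataFrame and its column names (`age`, `income`) are invented stand-ins, not the chapter's dataset; each intermediate step is exactly the kind of state change you would watch in SciView:

```python
import pandas as pd
import numpy as np

# Illustrative stand-in data with common defects: a missing value,
# a text placeholder in a numeric column, and a duplicate row
df = pd.DataFrame({
    "age": [25, np.nan, 40, 40],
    "income": ["50000", "62000", "n/a", "75000"],
})

# Coerce the text placeholder ("n/a") to NaN while converting the dtype
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# Fill missing numeric values with each column's median
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].median())

# Drop exact duplicate rows left over after imputation
df = df.drop_duplicates().reset_index(drop=True)
print(df)
```

Running each statement separately (for example, in the PyCharm console) lets you confirm in SciView that the missing values and duplicates disappear step by step.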
Finally, we considered several techniques to generate visualizations and extract insights from our dataset. Using the Jupyter editor in PyCharm, we were able to work on our notebook entirely within PyCharm, without manually managing a Jupyter server. Having walked through this process, you are now ready to tackle real-life data science problems and projects using the same tools and functionalities that we have covered in this chapter.
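To close, a visualization cell like the one below is typical of what you would write in the Jupyter editor. The data here is synthetic (a random normal sample), and the non-interactive `Agg` backend is selected only so the sketch also runs as a plain script:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; unnecessary inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic sample standing in for a column of the chapter's dataset
rng = np.random.default_rng(42)
values = rng.normal(loc=50, scale=10, size=500)

# A histogram gives a quick view of a variable's distribution
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(values, bins=30, edgecolor="black")
ax.set_xlabel("value")
ax.set_ylabel("frequency")
ax.set_title("Distribution of a sample variable")
fig.savefig("histogram.png")
```

In a notebook cell you would simply end with `fig` (or call `plt.show()`) and PyCharm renders the plot inline below the cell.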