Creating a data extraction pipeline using Python
With a bit more familiarity around where to source data, let’s put that knowledge to work in the import step of a data pipeline workflow. We’re going to use a Jupyter notebook to prototype the methodology we will eventually deploy as a Python script. The reasoning is simple: Jupyter notebooks make visualization easy but are clunky to deploy; Python scripts offer less convenient visualization (it can be done, just not as effortlessly as in Jupyter) but are easy to deploy across environments. In our case, we want to properly test and “sanity-check” the format of the imported source data. Later in the book, we’ll show how, when we transcribe our code into a Python script, we gain access to PyCharm’s powerful environment for testing, logging, and encrypting Python scripts.
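As a minimal sketch of what such a sanity check might look like in a notebook cell, the snippet below reads a small CSV and asserts on its shape and contents. The column names and sample values here are hypothetical stand-ins, not the chapter’s actual dataset; in practice you would open the real source file from disk.

```python
import csv
import io

# Hypothetical in-memory sample standing in for a downloaded source file.
raw = io.StringIO(
    "id,date,value\n"
    "1,2023-01-05,10.5\n"
    "2,2023-01-06,11.2\n"
)

rows = list(csv.DictReader(raw))

# Basic sanity checks on the imported data's shape and types.
assert rows, "no rows were imported"
assert set(rows[0]) == {"id", "date", "value"}, "unexpected columns"
assert all(float(r["value"]) >= 0 for r in rows), "negative values found"

print(f"Imported {len(rows)} rows with columns {sorted(rows[0])}")
```

Running checks like these interactively in Jupyter lets you inspect failures immediately; the same assertions carry over unchanged when the cell is later transcribed into a script.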
Data extraction
Within your PyCharm environment for Chapter 4, verify...