11.4 Summary
In this chapter, we looked at two important parts of the data acquisition pipeline:
File formats and data persistence
The architecture of applications
There are many file formats available for persisting Python data. Newline-delimited (ND) JSON is perhaps the best way to handle large files of complex records. It fits well with Pydantic’s capabilities, and the resulting files can be processed readily by Jupyter Notebook applications.
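As a rough illustration of how ND JSON and Pydantic can work together, the following sketch writes one JSON document per line and validates each line on the way back in. It assumes Pydantic v2; the SomeRecord model, its fields, and the file handling are illustrative rather than taken from the chapter.

from pathlib import Path
from pydantic import BaseModel


class SomeRecord(BaseModel):
    # Hypothetical record structure used only for this sketch.
    name: str
    value: float


def write_ndjson(path: Path, records: list[SomeRecord]) -> None:
    # One JSON document per line; no enclosing array is needed.
    with path.open("w", encoding="utf-8") as target:
        for record in records:
            target.write(record.model_dump_json())
            target.write("\n")


def read_ndjson(path: Path) -> list[SomeRecord]:
    # Each line is parsed and validated independently.
    with path.open("r", encoding="utf-8") as source:
        return [
            SomeRecord.model_validate_json(line)
            for line in source
            if line.strip()
        ]

Because each record occupies its own line, a reader can consume the file incrementally, which is what makes the format practical for very large extractions.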
The capability to retry a failed operation without losing existing data is valuable when working with large data extractions and slow processing. Being able to re-run the data acquisition without re-processing data acquired in earlier runs can save a great deal of time.
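One way to make a re-run cheap is to skip records that already appear in the output file. The sketch below is a minimal illustration of that idea; the acquire() stub, the "key" field, and the ND JSON layout are hypothetical, not the chapter's specific design.

import json
from pathlib import Path


def acquire(key: str) -> dict:
    # Placeholder for the real (slow or failure-prone) acquisition step.
    return {"key": key, "value": len(key)}


def already_done(path: Path) -> set[str]:
    # Collect the keys of records written by a previous run, if any.
    if not path.exists():
        return set()
    with path.open("r", encoding="utf-8") as source:
        return {json.loads(line)["key"] for line in source if line.strip()}


def acquire_remaining(path: Path, keys_to_fetch: list[str]) -> None:
    done = already_done(path)
    # Append mode preserves previously acquired records on a re-run.
    with path.open("a", encoding="utf-8") as target:
        for key in keys_to_fetch:
            if key in done:
                continue  # Skip work completed in an earlier run.
            record = acquire(key)
            target.write(json.dumps(record) + "\n")

If a run fails partway through, the records written so far remain in the file, and the next invocation resumes with the keys that are still missing.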