Chapter 5. Retrieving, Processing, and Storing Data
Data can be found everywhere, in all shapes and forms. We can get it from the Web, by e-mail and FTP, or create it ourselves in a lab experiment or marketing poll. An exhaustive overview of how to acquire data in various formats would require many more pages than we have available. Sometimes we need to store data before we can analyze it, or after we are done with our analysis, so this chapter also discusses storing data. Chapter 8, Working with Databases, gives information about various databases (relational and NoSQL) and related APIs. The following is a list of the topics that we are going to cover in this chapter:
- Writing CSV files with NumPy and pandas
- The binary .npy and pickle formats
- Reading and writing to Excel with pandas
- JSON
- REST web services
- Parsing RSS feeds
- Scraping the Web
- Parsing HTML
- Storing data with PyTables
- HDF5 pandas I/O