Physical format considerations
The Python standard library offers a number of modules to help process common physical file formats. In the Python Standard Library documentation, Chapter 13, Data Compression and Archiving, describes modules that handle files compressed using zip or bzip2. Chapter 14, File Formats, describes modules that handle file formats such as CSV, configuration files, and PLIST files. Chapter 19, Internet Data Handling, includes the JSON file format. Chapter 20, Structured Markup Processing Tools, describes modules that handle markup languages such as HTML and XML. For formats not covered by the standard library, the Python Package Index (PyPI) may have a package that handles the file format. See http://pypi.python.org.
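As a rough illustration of how a few of these standard library modules fit together, the following sketch compresses some CSV text with bz2, reads it back, parses it with csv, and re-serializes it with json. The sample data and column names are invented for this example; everything happens in memory, so no files are assumed to exist.

```python
import bz2
import csv
import io
import json

# Hypothetical sample data; the column names are made up for illustration.
raw_csv = "city,population\nOslo,700000\nBergen,290000\n"

# Compress and decompress the text in memory with the bz2 module.
compressed = bz2.compress(raw_csv.encode("utf-8"))
text = bz2.decompress(compressed).decode("utf-8")

# Parse the CSV text into one dictionary per row, keyed by the header row.
# Note that csv yields every value as a string; no type conversion is done.
rows = list(csv.DictReader(io.StringIO(text)))

# Serialize the same rows as a JSON document.
as_json = json.dumps(rows)
```

The physical format (compressed bytes, CSV text, JSON text) changes at each step, while the logical content, a sequence of rows, stays the same.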
We'll look briefly at the csv module because it is often used when working on "big data" problems. For example, the Apache Hadoop software library, a framework that allows for the distributed...