Comparing the NumPy .npy binary format and pickling pandas DataFrames
Saving data in the CSV format is fine most of the time. CSV files are easy to exchange, since most programming languages and applications can handle the format. However, CSV is not very efficient: like other plaintext formats, it takes up a lot of space. Numerous file formats that offer a high level of compression, such as zip, bzip, and gzip, have been invented.
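To get a feel for the space cost of plaintext, here is a minimal sketch that is not part of the book's code bundle. It assumes Python 3 and writes to in-memory buffers rather than files; the variable names `csv_bytes` and `gz_bytes` are illustrative only. It compares the size of an array's raw binary data with its CSV text and with the same text after gzip compression:

```python
import gzip
import io

import numpy as np

np.random.seed(42)
a = np.random.randn(365, 4)

# Write the array as CSV text into an in-memory buffer (no file on disk).
buf = io.StringIO()
np.savetxt(buf, a, delimiter=',')
csv_bytes = buf.getvalue().encode('ascii')

# Compress the same CSV text with gzip.
gz_bytes = gzip.compress(csv_bytes)

print("Raw array bytes:", a.nbytes)
print("CSV bytes:", len(csv_bytes))
print("Gzipped CSV bytes:", len(gz_bytes))
```

The exact numbers depend on the data and the text format used by `np.savetxt`, but the CSV text should come out several times larger than the raw 8-byte floats, and gzip recovers only part of that overhead.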
The following is the complete code for this storage comparison exercise, which can also be found in the binary_formats.py file of this book's code bundle:
import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile
from os.path import getsize

np.random.seed(42)
a = np.random.randn(365, 4)

tmpf = NamedTemporaryFile()
np.savetxt(tmpf, a, delimiter=',')
print "Size CSV file", getsize(tmpf.name)

tmpf = NamedTemporaryFile()
np.save(tmpf, a)
tmpf.seek(0)
loaded = np.load(tmpf)
print "Shape", loaded.shape
print "Size .npy file", getsize(tmpf.name...