The binary .npy and pickle formats
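Saving data in the CSV format is fine most of the time. CSV files are easy to exchange, since most programming languages and applications can handle this format. However, the format is not very efficient: CSV and other plaintext formats take up a lot of space. For example, np.savetxt writes each float64 value with its default %.18e format, which takes 24 or 25 characters plus a delimiter, whereas the same value occupies only 8 bytes in binary form. Numerous file formats have been invented that offer a high level of compression, such as ZIP (.zip), bzip2 (.bz2), and gzip (.gz).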
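To make "a lot of space" concrete, here is a minimal sketch that serializes a 365 x 4 float64 array of the same shape used in this section. For brevity it writes to in-memory io.BytesIO buffers instead of files, and it uses np.random.default_rng, so the values differ from the book's listing. It compares the raw CSV text, the same text compressed with the standard library's gzip module, and the raw binary bytes of the array. The csv_bytes helper is hypothetical, introduced only for this illustration.

```python
import gzip
import io

import numpy as np


def csv_bytes(a: np.ndarray) -> bytes:
    """Hypothetical helper: serialize an array with np.savetxt's default '%.18e' format."""
    buf = io.BytesIO()
    np.savetxt(buf, a, delimiter=",")
    return buf.getvalue()


rng = np.random.default_rng(42)
a = rng.standard_normal((365, 4))

raw_csv = csv_bytes(a)            # plaintext CSV
gz_csv = gzip.compress(raw_csv)   # the same CSV text, gzip-compressed
binary = a.tobytes()              # raw float64 bytes, 8 per value

print("CSV bytes          ", len(raw_csv))
print("gzipped CSV bytes  ", len(gz_csv))
print("raw binary bytes   ", len(binary))
print("CSV bytes per value", len(raw_csv) / a.size)
```

Compression shrinks the CSV text noticeably, but the text still carries far more characters than there are bytes of information in the array, which is why binary formats are attractive for numerical data.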
The following code compares how much space the same random array takes up when stored in different formats. The complete code for this storage comparison exercise can be found in the ch-05.ipynb file of this book's code bundle:
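import numpy as np
import pandas as pd
from tempfile import NamedTemporaryFile
from os.path import getsize

np.random.seed(42)
a = np.random.randn(365, 4)

# Save the array as comma-separated text and report the file size
tmpf = NamedTemporaryFile()
np.savetxt(tmpf, a, delimiter=',')
tmpf.flush()  # push buffered writes to disk so getsize sees the whole file
print("Size CSV file", getsize(tmpf.name))

# Save the same array in NumPy's binary .npy format and load it back
tmpf = NamedTemporaryFile()
np.save(tmpf, a)
tmpf.seek(0)  # rewind to the start of the file before reading
loaded = np.load(tmpf)
print("Shape...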
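The listing above writes to temporary files. As a hedged, self-contained sketch of the same idea, the code below does an in-memory round trip through io.BytesIO, first with np.save and np.load and then with the standard library's pickle module. Using stdlib pickle on the bare array is an assumption made for illustration and is not necessarily how the book's notebook pickles its data. The sketch checks that both formats return exactly the original values and that a .npy file is the raw array bytes plus a small header describing dtype, shape, and memory order.

```python
import io
import pickle

import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal((365, 4))

# .npy: a short header (dtype, shape, memory order) followed by the raw bytes
npy_buf = io.BytesIO()
np.save(npy_buf, a)
npy_size = npy_buf.tell()   # number of bytes written
npy_buf.seek(0)             # rewind before reading, like tmpf.seek(0) in the listing
loaded = np.load(npy_buf)

# pickle: Python's general-purpose object serialization
pkl = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
unpickled = pickle.loads(pkl)

print("Shape", loaded.shape)
print(".npy bytes  ", npy_size, "(header:", npy_size - a.nbytes, "bytes)")
print("pickle bytes", len(pkl))
```

Both binary formats preserve every bit of the float64 values, unlike CSV text formatted with a fixed precision. Keep in mind that unpickling data from an untrusted source can execute arbitrary code; this is why np.load refuses pickled object arrays unless allow_pickle=True is passed.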