Exploring file operations with pandas.DataFrames
pandas supports the persistence of DataFrames in both plain-text and binary formats. The common text formats are CSV and JSON; the most widely used binary formats are Excel XLSX, HDF5, and pickle.
In this book, we focus on plain-text persistence.
CSV files
CSV (comma-separated values) files are a widely used plain-text format for data exchange.
Writing CSV files
Writing a pandas DataFrame to a CSV file is straightforward with the pandas.DataFrame.to_csv(...) method. The header= parameter controls whether a header row of column names is written at the top of the file, and the index= parameter controls whether the Index axis values are written to the file:
df.to_csv('df.csv', sep=',', header=True, index=True)
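To see what header= and index= change, the following minimal sketch may help. It assumes a small, hypothetical DataFrame named df_demo (the DataFrame used in the text is not shown here) and relies on to_csv(...) returning the CSV text as a string when no file path is passed, so nothing is written to disk:

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df_demo = pd.DataFrame(
    {"A": [1, 2], "B": [3.5, 4.5]},
    index=pd.Index(["x", "y"], name="key"),
)

# With no path argument, to_csv() returns the CSV text as a string.
with_both = df_demo.to_csv(sep=",", header=True, index=True)
without_index = df_demo.to_csv(sep=",", header=True, index=False)
without_header = df_demo.to_csv(sep=",", header=False, index=True)

print(with_both)       # header row and index column
print(without_index)   # header row, no index column
print(without_header)  # index column, no header row
```

With index=True, the name of the Index (here key) becomes the first field of the header row; with index=False, both the index values and that field are dropped.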
We can inspect the file written to disk using the following Linux command typed into the notebook. The ! character instructs the notebook to run a shell command:
!head -n 4 df.csv
The file contains the following...