Reading delimited files with the CSV module
One commonly used data format is CSV. We can easily generalize this to think of the comma as simply one of many candidate separator characters. We might have a CSV file that uses the | character as the separator between columns of data. This generalization makes CSV files particularly powerful.
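As a minimal sketch of this generalization, the standard library's csv.reader() accepts a delimiter argument, so the same code can read comma-separated or pipe-separated text. The sample data and the read_delimited() helper name are assumptions made up for illustration; they are not part of the recipe itself.

```python
import csv
import io

# Hypothetical sample data: columns separated by '|' and rows by '\r\n'.
sample = "name|qty|price\r\nwidget|3|2.50\r\ngadget|10|0.75\r\n"


def read_delimited(source, delimiter=","):
    """Return every row of a text source as a list of strings.

    The column separator defaults to a comma but can be any
    single character.
    """
    reader = csv.reader(source, delimiter=delimiter)
    return list(reader)


# newline="" leaves row endings untouched so the csv module can
# handle them itself; the same convention applies to open().
rows = read_delimited(io.StringIO(sample, newline=""), delimiter="|")
for row in rows:
    print(row)
```

Every value comes back as a string; converting the qty and price columns to numbers is a separate step that depends on what the columns mean.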
How can we process data in any of the wide variety of CSV formats?
Getting ready
A summary of a file's content is called a schema. It's essential to distinguish between two aspects of the schema:
- The Physical Format of the file: For CSV, this means the file contains text. The text is organized into rows and columns. There will be a row separator character (or characters); there will also be a column separator character. Many spreadsheet products will use , as the column separator and the \r\n sequence of characters as the row separator. Other formats are possible, though, and it's easy to change the punctuation that separates columns and rows. The specific...