Importing data from fixed-width data files
Log files from events and time series data files are common sources for data visualizations. Sometimes we can read them using a CSV dialect for tab-separated data, but sometimes the fields are not separated by any specific character. Instead, each field has a fixed width, and we can infer the format from those widths to match and extract the data.
One way to approach this is to read a file line by line and then use string manipulation functions to split a string into separate parts. This approach seems straightforward, and if performance is not an issue, it should be tried first.
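As a minimal sketch of this string-manipulation approach, the snippet below cuts each line at fixed character offsets. The layout (a 10-character date, an 8-character name, and a 6-character value) and the names FIELD_SLICES and parse_line_slices are assumptions made for illustration only; a real file will have its own widths.

```python
# Hypothetical layout: 10-char date, 8-char name, 6-char value.
# Each slice marks where one field starts and ends within a line.
FIELD_SLICES = [slice(0, 10), slice(10, 18), slice(18, 24)]


def parse_line_slices(line):
    """Cut one fixed-width line into fields and trim the padding."""
    return [line[s].strip() for s in FIELD_SLICES]


# Build a sample line with explicit padding so the widths are exact.
sample = "2024-01-15" + "sensorA".ljust(8) + "23.5".rjust(6)

fields = parse_line_slices(sample)
print(fields)
```

Slicing never raises an error on a short line; it simply returns shorter or empty fields, so a length check may be worth adding when the input is not trusted.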
If performance matters more, or the file to parse is large (hundreds of megabytes), using the Python module struct (http://docs.python.org/library/struct.html) can speed things up, as the module is implemented in C rather than in Python.
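To illustrate the idea, here is a small sketch of how struct can describe a fixed-width record. The format string "10s8s6s" (three byte-string fields of 10, 8, and 6 characters), the ASCII encoding, and the helper name parse_line_struct are assumptions chosen for this example, not the layout of any particular file.

```python
import struct

# Hypothetical layout: 10-char date, 8-char name, 6-char value.
# "Ns" means a byte string of exactly N bytes.
FIELD_FORMAT = "10s8s6s"

# Compiling the format once avoids re-parsing it for every line.
record = struct.Struct(FIELD_FORMAT)


def parse_line_struct(line):
    """Unpack one fixed-width line into stripped text fields."""
    raw = line.encode("ascii")
    # unpack_from reads only the first record.size bytes, so a
    # trailing newline is ignored; shorter input raises struct.error.
    return [field.decode("ascii").strip() for field in record.unpack_from(raw)]


sample = "2024-01-15" + "sensorA".ljust(8) + "23.5".rjust(6) + "\n"

print(record.size)
print(parse_line_struct(sample))
```

Because struct works on bytes, opening the file in binary mode and passing each line straight to unpack_from would avoid the encode step; the text-based version above is kept only to make the sketch easy to follow.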
Getting ready
As the module struct is part of the Python Standard Library, we don't need to install any additional software to implement this recipe.
How to do it...
We...