Data structuring
We now come to the part where we restructure the data from its raw format into a usable one. In our use case, we extracted the data in JSON format, and working with that raw format served us well for exploratory analysis. As we move further into the data-wrangling pipeline, however, other file formats and structures will be more efficient.
Different file formats and when to use them
There are different file formats that are commonly used in data pipelines:
- Readable file formats: CSV, JSON, and Extensible Markup Language (XML) are examples of human-readable file formats:
- CSV files are used mostly in the data extraction phase, when the data needs to be shared with analysts who will read it and act on it further. The advantage is that you don’t need a programming language to read these files: they can be opened in most commonly available text editors. These file formats were widely popular earlier in the data analytics community...
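To make the move from raw JSON to a shareable CSV concrete, here is a minimal sketch that uses only Python's standard library. The sample records and the helper name `json_records_to_csv` are hypothetical, chosen for illustration, and the sketch assumes the JSON is an array of flat (non-nested) objects; nested data would need flattening first.

```python
import csv
import io
import json

# Hypothetical raw JSON records, standing in for the extracted data.
raw_json = (
    '[{"id": 1, "name": "Ana", "city": "Lima"},'
    ' {"id": 2, "name": "Bo", "city": "Oslo"}]'
)


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)

    # Collect column names in first-seen order, so records that
    # lack some keys still fit under a single header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # Write to an in-memory buffer; missing values become empty cells.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_records_to_csv(raw_json)
print(csv_text)
```

The resulting text has one header row followed by one row per record, so an analyst can open it in a text editor or a spreadsheet without writing any code. In a real pipeline you would likely write to a file instead of an in-memory buffer.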