Reading CSV and JSON Files and Solving Problems
When working with data, we come across several different types, such as structured, semi-structured, and unstructured data, along with formats specific to the outputs of other systems. Two file types, however, are ingested especially often: comma-separated values (CSV) and JavaScript Object Notation (JSON). Both formats have many applications and are widely used for data ingestion because of their versatility.
In this chapter, you will learn more about these file formats, how to ingest them using Python and PySpark, how to apply best practices, and how to solve ingestion- and transformation-related problems.
We will cover the following recipes:
- Reading a CSV file
- Reading a JSON file
- Creating a SparkSession for PySpark
- Using PySpark to read CSV files
- Using PySpark to read JSON files