Inserting formatted SparkSession logs to facilitate your work
A commonly underestimated best practice is creating valuable logs. Applications that log their events, even in small code files, can save a significant amount of debugging time. This is just as true when ingesting or processing data.
This recipe covers the best practice of logging events in our PySpark scripts. The examples here give a generic overview that can be applied to any other piece of code, and they will be used again later in this book.
Getting ready
We will use the `listings.csv` file to execute the `read` method from Spark. You can find this dataset inside the GitHub repository for this book. Make sure your `SparkSession` is up and running.
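If you need a starting point, the following is a minimal sketch, assuming the dataset sits in your current working directory; the application name and the relative path are placeholders, so adjust them to match your local copy of the repository:

```python
from pyspark.sql import SparkSession

# Create (or retrieve) a SparkSession; the application name is arbitrary.
spark = (
    SparkSession.builder
    .appName("logging-recipe")
    .getOrCreate()
)

# Read the listings.csv dataset; the relative path is an assumption,
# so point it at your local clone of the book's repository if needed.
df = spark.read.csv("listings.csv", header=True, inferSchema=True)
df.show(5)
```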
How to do it…
Here are the steps to perform this recipe:
- Setting the log level: Now, using `sparkContext`, we will assign the log level:
  `spark.sparkContext.setLogLevel("INFO")`
  Valid levels are ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, and WARN.
- Instantiating the log4j logger: The next step is to create a logger instance through the JVM gateway that Spark exposes, so that our messages flow through the same log4j machinery Spark itself uses, as sketched below.
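The following is a minimal sketch of that step, assuming the running `spark` session from the Getting ready section; the logger name `ingestion_recipe` is a hypothetical placeholder, and accessing log4j through `spark.sparkContext._jvm` is a common PySpark pattern rather than necessarily this book's exact code:

```python
# A sketch of instantiating a log4j logger via the JVM gateway exposed
# by the SparkContext. The logger name is an illustrative assumption.
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("ingestion_recipe")

# Emit messages at different severities; with the level set to INFO,
# all three lines below will appear in the driver's console output.
logger.info("Starting the ingestion of listings.csv")
logger.warn("This is a warning-level message")
logger.error("This is an error-level message")
```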