To better understand the life cycle of a Spark application, let's create a sample application and walk through its execution step by step. The following example shows the content of the data file that we will use in our application. The sale.csv file stores information such as the PRODUCT_CODE, COUNTRY_CODE, and order AMOUNT for each ORDER_ID:
$ cat sale.csv
ORDER_ID,PRODUCT_CODE,COUNTRY_CODE,AMOUNT
1,PC_01,USA,200.00
2,PC_01,USA,46.34
3,PC_04,USA,123.54
4,PC_02,IND,99.76
5,PC_44,IND,245.00
6,PC_02,AUS,654.21
7,PC_03,USA,75.00
8,PC_01,SPN,355.00
9,PC_03,USA,34.02
10,PC_03,USA,567.07
We shall now create a sample application using the Python API to compute the total sales amount per country and sort the results in descending order of that amount. The following example shows the code of our sample application:
from pyspark.sql import SparkSession
from pyspark.sql.functions...
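The import line above is truncated in the source, so the full PySpark listing is not reproduced here. As a cross-check on what the application is meant to produce, the same aggregation (sum of AMOUNT per COUNTRY_CODE, sorted in descending order of the total) can be sketched in plain Python over the sample rows; the function name below is illustrative, not part of the book's code:

```python
import csv
import io
from collections import defaultdict

# The sample rows from sale.csv, header included.
SALE_CSV = """ORDER_ID,PRODUCT_CODE,COUNTRY_CODE,AMOUNT
1,PC_01,USA,200.00
2,PC_01,USA,46.34
3,PC_04,USA,123.54
4,PC_02,IND,99.76
5,PC_44,IND,245.00
6,PC_02,AUS,654.21
7,PC_03,USA,75.00
8,PC_01,SPN,355.00
9,PC_03,USA,34.02
10,PC_03,USA,567.07
"""

def total_sales_by_country(csv_text):
    """Sum AMOUNT per COUNTRY_CODE and sort descending by total,
    mirroring a groupBy/sum/orderBy pipeline in Spark."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["COUNTRY_CODE"]] += float(row["AMOUNT"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

result = total_sales_by_country(SALE_CSV)
for country, amount in result:
    print(f"{country}\t{amount:.2f}")
```

For this data set, USA has the largest total and IND the smallest, which is the ordering the Spark job should also produce.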