Solution
The solution to Problem 1 is as follows:
The overall flow is to drop any residual database, create a fresh one, define a DataFrame, and write the DataFrame as a table.
- First, we import the data types we will need to define the table schema:
from pyspark.sql.types import StructField, DateType, StringType, FloatType, StructType
- Next, we define the database name and drop any residual database left over from earlier runs (CASCADE also removes any tables it contains):
database_name = "chapter_2_lab"
spark.sql(f"DROP DATABASE IF EXISTS {database_name} CASCADE;")
- Now we create the database. We define its location explicitly; all of its tables will be stored in that location:
spark.sql(f"CREATE DATABASE IF NOT EXISTS {database_name} LOCATION 'dbfs:/tmp/accounting_alpha';")
- Now we can write our table. First, we define the table's name and its schema:
table_name = "returns_bronze"
schema = StructType([StructField("Date", DateType(), True),
StructField...