7. Introduction to Analytics Engine (Spark) for Big Data
Activity 7.01: Exploring and Processing a Movie Locations Database by Using Spark's Transformations and Actions
Solution
- The first step involves logging in to the Community Edition of Databricks.
- Upload the file you have downloaded, Film_Locations_in_San_Francisco.csv, into Databricks.
- Read the CSV file to a DataFrame:
from pyspark.sql.functions import desc

# File location and type
file_location = "/FileStore/tables/Film_Locations_in_San_Francisco.csv"
file_type = "csv"

# The applied options are for CSV files. For other file types, these will be ignored.
dataTable = spark.read.format(file_type) \
    .option("inferSchema", "true") \
    .option("header", "true") \
    .option("sep", ",") \
    .load(file_location)

display(dataTable)
dataTable.printSchema(...
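To make the three reader options in the step above more concrete, the following is a rough plain-Python sketch of what `header`, `sep`, and `inferSchema` do conceptually. It is not Spark's implementation and it does not need a cluster: the function names `read_csv` and `infer_type`, the sample rows, and the column names are all hypothetical, and the type inference is deliberately simplified to `int`, `float`, and `str` (Spark recognizes more types).

```python
import csv
import io

# Hypothetical sample data for illustration only;
# these are NOT rows from Film_Locations_in_San_Francisco.csv.
SAMPLE_CSV = """title,release_year,rating
Example Film A,1995,7.5
Example Film B,2004,6.0
"""


def infer_type(values):
    """Return the narrowest of int, float, str that fits every value (simplified)."""
    for caster in (int, float):
        try:
            for value in values:
                caster(value)
            return caster
        except ValueError:
            continue
    return str


def read_csv(text, header=True, sep=",", infer_schema=True):
    """Parse CSV text, loosely mimicking the header / sep / inferSchema options."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if header:
        # header=true: the first row supplies the column names.
        names, data = rows[0], rows[1:]
    else:
        # Without a header, Spark assigns default names of the form _c0, _c1, ...
        names = [f"_c{i}" for i in range(len(rows[0]))]
        data = rows
    if infer_schema:
        # inferSchema=true: scan each column to pick a type.
        types = [infer_type([row[i] for row in data]) for i in range(len(names))]
    else:
        # Without inference, every column stays a string.
        types = [str] * len(names)
    schema = dict(zip(names, (t.__name__ for t in types)))
    records = [
        {name: cast(value) for name, cast, value in zip(names, types, row)}
        for row in data
    ]
    return schema, records


schema, records = read_csv(SAMPLE_CSV)
print(schema)
print(records[0])
```

Calling `read_csv(SAMPLE_CSV, header=False, infer_schema=False)` instead shows the other side of the trade-off: the header row is treated as data, columns get positional names, and every value remains a string. Inference costs an extra pass over the data, which is why it is an explicit option in the Spark reader.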