Creating a DataFrame from CSV
In this recipe, we'll look at how to create a new DataFrame from a delimiter-separated values file.
Note
The code for this recipe can be found at https://github.com/arunma/ScalaDataAnalysisCookbook/blob/master/chapter1-spark-csv/src/main/scala/com/packt/scaladata/spark/csv/DataFrameCSV.scala.
How to do it...
This recipe involves four steps:
- Add the `spark-csv` support to our project.
- Create a Spark configuration object that holds information about the environment in which we are running Spark.
- Create a Spark context that serves as an entry point into Spark. Then, we proceed to create an `SQLContext` from the Spark context.
- Load the CSV using the `SQLContext`.

CSV support isn't first-class in Spark, but it is available through an external library from Databricks. So, let's go ahead and add that to our `build.sbt`. After adding the `spark-csv` dependency, our complete `build.sbt` looks like this:

```
organization := "com.packt"

name := "chapter1-spark-csv"

scalaVersion...
```
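The four steps above can be sketched in Scala roughly as follows. This is a minimal sketch, not the recipe's full listing from the repository: the application name, master URL, and input path are illustrative choices, and the `sqlContext.read` API assumes Spark 1.4 or later with the Databricks `spark-csv` package on the classpath (added in `build.sbt` via something like `libraryDependencies += "com.databricks" %% "spark-csv" % "1.0.3"`; check the version against your Spark release).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DataFrameCSV extends App {

  // Step 2: a Spark configuration describing the environment we run in.
  // "local[2]" runs Spark locally with two threads; both values here
  // are illustrative.
  val conf = new SparkConf()
    .setAppName("csvDataFrame")
    .setMaster("local[2]")

  // Step 3: the Spark context is the entry point into Spark,
  // and the SQLContext is created from it.
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)

  // Step 4: load the CSV through the Databricks spark-csv data source.
  // "data/sample.csv" is a hypothetical path; the "header" option tells
  // the reader to treat the first line as column names.
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("data/sample.csv")

  df.printSchema()
}
```

Running this prints the inferred schema of the loaded file; from here the `DataFrame` can be queried like any other Spark SQL source.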