Getting Started in PySpark
In the previous chapters, we discussed that Spark supports four primary languages: Scala, Python, R, and SQL. Whichever of these languages is used, the underlying execution engine is the same. This provides the unification we talked about in Chapter 2: developers can work in the language of their choice and can even switch between language APIs within a single application.
In this book, we focus on Python as the primary language. Spark's Python API is called PySpark.
Let’s get started with the installation of Spark.
Installing Spark
To get started with Spark, you first need to install it on your computer. There are several ways to install Spark; in this section, we will focus on just one.
PySpark is published on PyPI, so you can install it with pip:
pip install pyspark
Once Spark is installed, you will need to create a Spark session.