"I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts."
- Sir Arthur Conan Doyle
In this chapter, we will begin to perform some exploratory data analysis on the Spark dataframe we created in the previous chapter. We will learn about some specific Spark commands that will assist you in your analysis, and will discuss several ways to perform graphing and plotting.
As you go through these examples, remember that data that resides in Spark may be much larger than you are used to, and that it may be impractical to apply some quick analytic techniques without first considering how the data is organized, and how performance will be affecting using standard techniques.
If you are picking up where you left off in the previous chapter...