When data scientists work with data in an Apache Spark environment, they typically work with either RDDs or DataFrames. In our examples so far, the data is stored in the RDD format, and the predictive model is built by feeding those RDDs into it.
These exercises use spark.mllib, the original machine learning library that ships with Spark (MLlib), which operates on RDDs. The newer library, Spark ML (the spark.ml package), operates on DataFrames.