In this recipe, we examine several ways to create a SparseVector. As the length of a vector grows into the millions while its density stays low (only a few non-zero members), the sparse representation becomes increasingly advantageous over the DenseVector, because it stores only the non-zero values and their indices rather than every element.
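To make the trade-off concrete, the following is a minimal sketch, assuming Spark 2.x with spark-mllib on the classpath. The object name, the vector length, and the chosen indices and values are hypothetical and used only for illustration. These are local vectors, so no SparkSession is needed to run it.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Illustrative sketch only: names and values here are assumptions, not part of the recipe.
object SparseVsDenseSketch {
  // Hypothetical vector length: one million elements.
  val size: Int = 1000000

  // Sparse form built from parallel arrays of indices and values.
  // Only the three non-zero positions are stored.
  val sparse: Vector =
    Vectors.sparse(size, Array(3, 42, 999999), Array(1.0, 2.5, -7.0))

  // The same vector built from (index, value) pairs.
  val sparseFromPairs: Vector =
    Vectors.sparse(size, Seq((3, 1.0), (42, 2.5), (999999, -7.0)))

  // Dense form of the same data: allocates all one million doubles.
  val dense: Vector = sparse.toDense

  def main(args: Array[String]): Unit = {
    println(s"size = ${sparse.size}, non-zeros = ${sparse.numNonzeros}")
    println(s"sparse(42) = ${sparse(42)}, dense(42) = ${dense(42)}")
    println(s"dense backing array length = ${dense.toArray.length}")
  }
}
```

Both representations answer the same element lookups, but the dense copy carries an array of one million doubles while the sparse one keeps just three index/value pairs.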
Creating SparseVector and setup with Spark
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Import the necessary packages for vector and matrix manipulation:
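// SparkSession is the entry point for Spark 2.x applications
import org.apache.spark.sql.SparkSession
// MLlib local linear algebra types: Vectors, DenseVector, SparseVector, matrices
import org.apache.spark.mllib.linalg._
// Breeze's DenseVector, aliased so it does not clash with MLlib's DenseVector
import breeze.linalg.{DenseVector => BreezeVector}
// Helpers from the Array companion object
import Array._
// Already covered by the wildcard import above; kept explicit for clarity
import org.apache.spark.mllib.linalg.SparseVector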
- Set up the Spark context and application...