Creating SparseVector and setup with Spark
In this recipe, we look at several ways of creating a SparseVector. As the length of the vector increases (into the millions) while the density remains low (few non-zero members), the sparse representation becomes more and more advantageous over the DenseVector.
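To make the storage argument concrete, the following is a minimal sketch, assuming the org.apache.spark.mllib.linalg API and a Spark JAR on the classpath. The vector size, indices, and values are made up for illustration and are not taken from the recipe. It can be pasted into spark-shell or into the body of a main method.

```scala
import org.apache.spark.mllib.linalg.{SparseVector, Vector, Vectors}

// Illustrative values only: one million slots, three non-zero members.
val size = 1000000

// Form 1: parallel arrays of indices (strictly increasing) and their values.
val fromArrays: Vector =
  Vectors.sparse(size, Array(3, 4096, 999999), Array(1.5, -2.0, 7.25))

// Form 2: a sequence of (index, value) pairs describing the same vector.
val fromPairs: Vector =
  Vectors.sparse(size, Seq((3, 1.5), (4096, -2.0), (999999, 7.25)))

// The sparse form keeps only the non-zero entries: 3 Ints plus 3 Doubles.
val sv = fromArrays.asInstanceOf[SparseVector]
val storedEntries = sv.indices.length + sv.values.length

// The dense form materializes every slot, zeros included.
val denseEntries = fromArrays.toDense.values.length

println(s"sparse stores $storedEntries numbers, dense stores $denseEntries")
```

Both forms describe the same logical vector, so element lookups such as fromArrays(4096) behave the same regardless of which factory call was used.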
How to do it...
- Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
- Import the necessary packages for vector and matrix manipulation:
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg._
import breeze.linalg.{DenseVector => BreezeVector}
import Array._
import org.apache.spark.mllib.linalg.SparseVector
- Set up the Spark context and application parameters so Spark can run. See the first recipe in this chapter for more details and variations:
val spark = SparkSession
  .builder
  .master("local[*]")
  .appName("myVectorMatrix")
  .config("spark.sql.warehouse.dir", ".")
  .getOrCreate()
- Here we look at creating a ML SparseVector that corresponds...