In this section, we present a machine learning example for textual analysis. Refer to Chapter 6, Using Spark SQL in Machine Learning Applications, for more details about the machine learning code used here.
The dataset used in the following example contains 1,080 documents of free-text business descriptions of Brazilian companies, each assigned to one of nine categories drawn from the National Classification of Economic Activities (CNAE). You can download this dataset from https://archive.ics.uci.edu/ml/datasets/CNAE-9.
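Before loading the file with Spark, it may help to see how a single line of the file could be turned into numbers. The following is a minimal sketch in plain Scala, not code from the original example. It assumes that each line is a comma-separated list of numeric values whose first field is the class label and whose remaining fields are word-frequency features. The object name `Cnae9Parsing`, the helper `parseLine`, and the sample lines are hypothetical and exist only for illustration.

```scala
// Hypothetical helper for illustration only; not part of the original example.
object Cnae9Parsing {

  // Assumption: the first field is the class label and every remaining
  // field is a numeric feature. A non-numeric or empty field makes
  // toDouble throw a NumberFormatException.
  def parseLine(line: String): (Double, Array[Double]) = {
    val values = line.split(",").map(_.trim.toDouble)
    (values.head, values.tail)
  }

  def main(args: Array[String]): Unit = {
    // Made-up, shortened sample lines; lines in the real file are much longer.
    val sample = Seq("1,0,2,0,1", "9,0,0,3,0")

    sample.map(parseLine).foreach { case (label, features) =>
      println(s"label=$label, number of features=${features.length}")
    }
  }
}
```

In the Spark shell, the same idea offers an alternative to listing every index by hand: `Row.fromSeq(attributes.map(_.toDouble).toSeq)` builds a `Row` from all the parsed values in one call. This is a possible shortcut rather than the approach taken in the listing that follows.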
scala> import org.apache.spark.sql.Row
scala> val inRDD = spark.sparkContext.textFile("file:///Users/aurobindosarkar/Downloads/CNAE-9.data")
scala> val rowRDD = inRDD.map(_.split(",")).map(attributes => Row(attributes(0).toDouble, attributes(1).toDouble, attributes(2).toDouble, attributes(3).toDouble, attributes(4).toDouble, attributes(5).toDouble,
.
.
.
attributes...