Classification
The classification workflow is very similar to that of linear regression. The algorithms take vectors, and each algorithm object exposes parameters that can be tuned to fit the needs of an application. The returned model can be used to predict the class by invoking the transform method. We will use the Titanic dataset and predict who will survive. The dataset has 15 fields, including age, gender, whether the passenger had siblings or a spouse aboard, whether parents were sailing with them, the class they traveled in, and so forth.
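The flow described above (vectors in, a parameterized algorithm object, a returned model whose transform method predicts the class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own code: the rows are made up rather than taken from the Titanic file, the column names Pclass, Age, Gender, and Survived are hypothetical, and DecisionTreeClassifier is simply one classifier picked for the example.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object ClassificationSketch {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; an existing `spark` session works as well
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("ClassificationSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows (NOT Titanic data); column names are assumptions
    val toy = Seq(
      (1.0, 29.0, 1.0, 1.0),
      (3.0, 40.0, 0.0, 0.0),
      (2.0, 8.0, 1.0, 1.0),
      (3.0, 25.0, 0.0, 0.0)
    ).toDF("Pclass", "Age", "Gender", "Survived")

    // The algorithms take vectors: pack the feature columns into one vector column
    val assembler = new VectorAssembler()
      .setInputCols(Array("Pclass", "Age", "Gender"))
      .setOutputCol("features")
    val assembled = assembler.transform(toy)

    // The algorithm object carries the tunable parameters
    val algo = new DecisionTreeClassifier()
      .setLabelCol("Survived")
      .setFeaturesCol("features")
      .setMaxDepth(3)

    // fit returns a model; its transform method appends a "prediction" column
    val model = algo.fit(assembled)
    model.transform(assembled).select("Survived", "prediction").show()

    spark.stop()
  }
}
```

Predicting on the training rows, as done here, only demonstrates the API; a real evaluation would hold out a separate test set.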
Loading data
Similar to regression, we load the CSV data using the read.csv() method. The code file is ML02v2.scala. We load the code and run the ML02v2 object. The CSV data is loaded and we print the schema to verify:
val filePath = "/Users/ksankar/fdps-v3/"
val passengers = spark.read.option("header","true").
  option("inferSchema","true").
  csv(filePath + "data/titanic3_02.csv")
println("Passengers has "+passengers.count()+" rows")
passengers.show...