Apache SystemML in action
Let's look at a very simple example. We'll write a script in the Apache SystemML DSL, which has an R-like syntax, to multiply two matrices:
import org.apache.sysml.api.MLOutput
import org.apache.spark.sql.SQLContext
import org.apache.spark.mllib.util.LinearDataGenerator
import org.apache.sysml.api.MLContext
import org.apache.sysml.runtime.instructions.spark.utils.{RDDConverterUtilsExt => RDDConverterUtils}
import org.apache.sysml.runtime.matrix.MatrixCharacteristics;

val sqlContext = new SQLContext(sc)

val simpleScript =
"""
fileX = "";
fileY = "";
fileZ = "";
X = read (fileX);
Y = read (fileY);
Z = X %*% Y
write (Z,fileZ);
"""
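To make clear what the `Z = X %*% Y` line in the script computes, here is a minimal plain-Scala sketch of a dense matrix product. It is an illustration only and not how SystemML executes the operation: `MatMulSketch`, `Matrix`, and `multiply` are hypothetical names introduced for this example, and the tiny input matrices are made up. The point it shows is the shape rule: an n x k matrix times a k x m matrix gives an n x m matrix, so the number of columns of X must equal the number of rows of Y.

```scala
// Plain-Scala sketch of what X %*% Y computes; no Spark or SystemML required.
// All names here (MatMulSketch, Matrix, multiply) are hypothetical.
object MatMulSketch {
  type Matrix = Array[Array[Double]]

  // Multiply an (n x k) matrix by a (k x m) matrix, giving an (n x m) result.
  def multiply(x: Matrix, y: Matrix): Matrix = {
    require(x.nonEmpty && y.nonEmpty, "matrices must be non-empty")
    require(x.forall(_.length == y.length), "columns of X must equal rows of Y")
    val n = x.length       // rows of X
    val k = y.length       // shared inner dimension
    val m = y.head.length  // columns of Y
    // Entry (i, j) is the dot product of row i of X and column j of Y.
    Array.tabulate(n, m) { (i, j) =>
      (0 until k).map(p => x(i)(p) * y(p)(j)).sum
    }
  }

  def main(args: Array[String]): Unit = {
    val x: Matrix = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0)) // 3 x 2
    val y: Matrix = Array(Array(1.0, 0.0, 2.0), Array(0.0, 1.0, 3.0))        // 2 x 3
    val z = multiply(x, y)                                                    // 3 x 3
    z.foreach(row => println(row.mkString(" ")))
  }
}
```

In the SystemML script the same operation is a single expression, which leaves the choice of execution strategy to the engine rather than to hand-written loops like the ones above.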
Then, we generate some test data:
// Generate data
val rawDataX = sqlContext.createDataFrame(LinearDataGenerator.generateLinearRDD(sc, 100, 10, 1))
val rawDataY = sqlContext.createDataFrame(LinearDataGenerator.generateLinearRDD(sc, 10, 100, 1))

// Repartition into a more parallelism-friendly number of partitions
val dataX = rawDataX...