The Microsoft Machine Learning Library for Apache Spark (MMLSpark) helps build scalable machine learning models for large datasets, particularly for deep learning problems. MMLSpark integrates SparkML pipelines with Microsoft CNTK and the OpenCV library, providing end-to-end support for ingesting and processing image data, classifying images, and performing text analytics with pre-trained deep learning models. It can also train classification and regression models, and retrieve scores from them, by applying featurization.
An overview of the Microsoft Machine Learning Library for Apache Spark (MMLSpark)
Environment setup for MMLSpark
The following prerequisites are mandatory for setting up MMLSpark...