Introducing Hivemall
Hivemall is a scalable machine learning library built on top of Apache Hive and Hadoop. It is a collection of machine learning algorithms implemented as Hive User-Defined Functions (UDFs) and User-Defined Table-Generating Functions (UDTFs). Hivemall offers the following benefits:
Easy to use: Existing Hive users can apply machine learning algorithms using the familiar HiveQL language. There is no need to compile programs and build executable JARs, as with MLlib or H2O. Just register the UDFs and UDTFs and run Hive queries.
Scalability: It inherits the scalability of Hadoop and Hive, and adds features intended to let it scale with the number of training and testing instances as well as with the number of features.
Variety of algorithms: It offers algorithms for classification, regression, k-means, recommendation, anomaly detection, and feature engineering.
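To make the "just run Hive queries" point concrete, here is a minimal sketch of what training a model can look like. It uses logress, Hivemall's logistic regression UDTF. The table name training and its columns features and label are assumptions made for illustration, and the sketch presumes the Hivemall functions have already been registered in the session.

```sql
-- Sketch only: assumes the Hivemall functions are already registered in the
-- current Hive session, and that a hypothetical table `training` has columns
-- `features` (an array of "feature:value" strings) and `label` (numeric,
-- for example 0.0 or 1.0).

-- The UDTF emits (feature, weight) rows from each task that runs it;
-- averaging the weights per feature combines them into one model table.
CREATE TABLE model AS
SELECT
  feature,
  AVG(weight) AS weight
FROM (
  SELECT logress(features, label) AS (feature, weight)
  FROM training
) t
GROUP BY feature;
```

The result is an ordinary Hive table of feature weights, so it can be inspected or joined against test data with plain HiveQL.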
Follow this procedure to get started:
Download the compatible JAR and functions from https://github.com/myui/hivemall/releases...