H2O
Before we dive into the examples, let's spend some time justifying our decision to use H2O as our deep learning framework for anomaly detection.
H2O is not just a library or package to install. It is an open source, rich analytics platform that provides both machine learning algorithms and high-performance parallel computing abstractions.
H2O's core technology is built around a Java Virtual Machine optimized for in-memory processing of distributed data collections.
The platform can be used through a web-based UI or programmatically from many languages, such as Python, R, Java, and Scala, or with JSON through a REST API.
Data can be loaded from many common data sources, such as HDFS, S3, most of the popular RDBMSes, and a few NoSQL databases.
After loading, data is represented in an H2OFrame, making it familiar to people used to working with R, Spark, and Python pandas data frames.
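To make the H2OFrame idea concrete, here is a minimal Python sketch of how a small in-memory table might be uploaded to a local H2O instance. It assumes the h2o Python package and a Java runtime are installed. The SAMPLE data, its column names, and the helpers count_rows and load_into_h2o are hypothetical and exist only for illustration; only h2o.init and h2o.H2OFrame are calls from the library itself.

```python
# Hypothetical toy data: the column names and values are illustrative only.
SAMPLE = {
    "sensor_id": [1, 2, 3, 4],
    "reading": [0.51, 0.49, 0.50, 9.75],
}


def count_rows(columns):
    """Return the number of rows, checking that every column has the same length.

    A data frame is rectangular, so this check runs before anything is uploaded.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) != 1:
        raise ValueError("all columns must have the same, non-zero count of values")
    return lengths.pop()


def load_into_h2o(columns):
    """Start (or connect to) a local H2O instance and upload the columns.

    Assumption: the h2o package and a Java runtime are available locally.
    """
    import h2o  # imported lazily so the helpers above work without H2O installed

    count_rows(columns)           # fail early on ragged input
    h2o.init()                    # starts a local instance if none is running
    return h2o.H2OFrame(columns)  # dictionary keys become column names
```

Calling load_into_h2o(SAMPLE) should return an H2OFrame whose shape attribute reports the number of rows and columns. Calling as_data_frame() on the result converts it back to a pandas data frame, provided pandas is installed.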
The backend can then be switched among different engines. It can run locally on your machine or it can be deployed in a...