Summary
You've learned that there is room for additional machine learning frameworks and libraries on top of Apache Spark, and that a cost-based optimizer, similar to the one we already use in Catalyst, can speed things up tremendously. In addition, separating the performance optimization code from the code for the algorithm facilitates further improvements on the algorithm side without having to care about performance at all.
Additionally, the execution plans produced by the optimizer are highly adaptable to the size of the data and to the available hardware configuration, based on main memory size and potential accelerators such as GPUs. Apache SystemML dramatically improves the life cycle of machine learning applications, especially when machine learning algorithms are not used out of the box and an experienced data scientist instead works on their low-level details in a mathematical or statistical programming language.
In Apache SystemML, this low-level mathematical code can be used out of the box, without any manual...