Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon

PyTorch-based HyperLearn Statsmodels aims to implement a faster and leaner GPU Sklearn

Save for later
  • 3 min read
  • 04 Sep 2018

article-image
HyperLearn is a Statsmodel, a result of the collaboration of languages such as PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, and has similarities to Scikit Learn.

This project started last month by Daniel Hanchen and still has some unstable packages. He aims to make Linear Regression, Ridge, PCA, LDA/QDA faster, which then flows onto other algorithms being faster.

This Statsmodels combo incorporates novel algorithms to make it 50% more faster and enables it to use 50% lesser RAM along with a leaner GPU Sklearn.

HyperLearn also has an embedded statistical inference measures, and can be called similar to a Scikit Learn's syntax (model.confidence_interval_)

HyperLearn’s Speed/ Memory comparison


There is a  50%+ improvement on Quadratic Discriminant Analysis (similar improvements for other models) as can be seen below:

pytorch-based-hyperlearn-statsmodels-aims-to-implement-a-faster-and-leaner-gpu-sklearn-img-0Source: GitHub


Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )

Key Methodologies and Aims of the HyperLearn project

#1 Parallel For Loops

  • Hyperlearn for loops will include Memory Sharing and Memory Management
  • CUDA Parallelism will be made possible through PyTorch & Numba

#2 50%+ faster and leaner

  • Matrix operations that have been improved include  Matrix Multiplication Ordering, Element Wise Matrix Multiplication reducing complexity to O(n^2) from O(n^3), reducing Matrix Operations to Einstein Notation and Evaluating one-time Matrix Operations in succession to reduce RAM overhead.
  • Applying QR Decomposition and then SVD(Singular Value decomposition) might be faster in some cases.
  • Utilise the structure of the matrix to compute faster inverse
  • Computing SVD(X) and then getting pinv(X) is sometimes faster than pure pinv(X)

#3 Statsmodels is sometimes slow

  • Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
  • Using Einstein Notation & Hadamard Products where possible.
  • Computing only what is necessary to compute (Diagonal of matrix only)
  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
  • Unlock access to the largest independent learning library in Tech for FREE!
    Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
    Renews at $19.99/month. Cancel anytime

#4 Deep Learning Drop In Modules with PyTorch

  • Using PyTorch to create Scikit-Learn like drop in replacements.

#5 20%+ Less Code along with Cleaner Clearer Code

  • Using Decorators & Functions wherever possible.
  • Intuitive Middle Level Function names like (isTensor, isIterable).
  • Handles Parallelism easily through hyperlearn.multiprocessing

#6 Accessing Old and Exciting New Algorithms

  • Matrix Completion algorithms - Non Negative Least Squares, NNMF
  • Batch Similarity Latent Dirichelt Allocation (BS-LDA)
  • Correlation Regression and many more!


       
Daniel further went on to publish some prelim algorithm timing results on a range of algos from MKL Scipy, PyTorch, MKL Numpy, HyperLearn's methods + Numba JIT compiled algorithms

Here are his key findings on the HyperLearn statsmodel:

  1. HyperLearn's Pseudoinverse has no speed improvement
  2. HyperLearn's PCA will have over 200% improvement in speed boost.
  3. HyperLearn's Linear Solvers will be over 1 times faster i.e  it will show a 100% improvement in speed


You can find all the details of the test on reddit.com

For more insights on HyperLearn, check out the release notes on Github.

A new geometric deep learning extension library for PyTorch releases!

NVIDIA leads the AI hardware race. But which of its GPUs should you use for deep learning?

Introduction to Sklearn