
PyTorch-based HyperLearn Statsmodels aims to implement a faster and leaner GPU Sklearn

  • 3 min read
  • 04 Sep 2018

HyperLearn is a Statsmodels-style package built by combining libraries such as PyTorch, NoGil Numba, NumPy, Pandas, SciPy and LAPACK, and it closely resembles Scikit-Learn.

Daniel Hanchen started the project last month, and some of its packages are still unstable. He aims to make Linear Regression, Ridge, PCA and LDA/QDA faster, which in turn makes the other algorithms built on them faster.

This Statsmodels combination incorporates novel algorithms that make it 50% faster and let it use 50% less RAM, along with a leaner GPU Sklearn.

HyperLearn also has embedded statistical inference measures, which can be called with Scikit-Learn-like syntax (model.confidence_interval_).

HyperLearn’s Speed/Memory comparison


There is a 50%+ improvement on Quadratic Discriminant Analysis (with similar improvements for other models), as can be seen below:

[Figure: Speed and RAM comparison for Quadratic Discriminant Analysis. Source: GitHub]


Time (s) is Fit + Predict time. RAM (MB) = max(RAM(Fit), RAM(Predict)).
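To make the metric concrete, here is a minimal sketch of how Time = Fit + Predict and RAM = max(RAM(Fit), RAM(Predict)) could be measured for any model with `fit`/`predict` methods. This is an assumption, not the benchmark's actual harness: `benchmark`, `_measure` and the toy `LeastSquares` model are hypothetical, and `tracemalloc` only sees memory allocated through Python and NumPy (not BLAS/LAPACK workspaces), so its figures will be lower than process-level RAM.

```python
import time
import tracemalloc

import numpy as np


def _measure(fn, *args):
    """Call fn(*args) and return (result, seconds, peak traced memory in MB)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak / 1e6


def benchmark(model, X, y):
    """Time(s) = Fit + Predict, RAM(MB) = max(RAM(Fit), RAM(Predict))."""
    _, t_fit, ram_fit = _measure(model.fit, X, y)
    _, t_pred, ram_pred = _measure(model.predict, X)
    return t_fit + t_pred, max(ram_fit, ram_pred)


class LeastSquares:
    """Toy model (hypothetical) used only to exercise benchmark()."""

    def fit(self, X, y):
        self.coef_ = np.linalg.lstsq(X, y, rcond=None)[0]
        return self

    def predict(self, X):
        return X @ self.coef_


# Usage: time and measure one fit + predict cycle on random data.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 50))
y = X @ rng.standard_normal(50)
model = LeastSquares()
seconds, ram_mb = benchmark(model, X, y)
print(f"Time: {seconds:.4f} s, RAM: {ram_mb:.3f} MB")
```

Taking the maximum rather than the sum for RAM reflects that fit and predict run one after the other, so their peaks do not overlap.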

Key Methodologies and Aims of the HyperLearn project

#1 Parallel For Loops

  • HyperLearn's for loops will include Memory Sharing and Memory Management
  • CUDA Parallelism will be made possible through PyTorch & Numba
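HyperLearn does this with PyTorch and Numba. As a stand-in, the sketch below uses only the standard library's `ThreadPoolExecutor` with NumPy to illustrate the shared-memory pattern: every worker reads a view of the same input and writes its own slice of one preallocated output, so no data is copied between workers. `parallel_row_norms` is a hypothetical helper, and any real speedup depends on the NumPy kernels releasing the GIL, which many of them do.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_row_norms(X, n_jobs=None):
    """Squared L2 norm of every row of X, computed in parallel chunks.

    All threads share X and one output array: each worker reads a view
    of its rows and writes into its own slice of `out`.
    """
    n_jobs = n_jobs or os.cpu_count() or 1
    out = np.empty(X.shape[0], dtype=X.dtype)
    # Split the row range into n_jobs contiguous chunks.
    bounds = np.linspace(0, X.shape[0], n_jobs + 1).astype(int)

    def work(start, stop):
        block = X[start:stop]  # a view, not a copy
        out[start:stop] = np.einsum("ij,ij->i", block, block)

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(work, a, b) for a, b in zip(bounds[:-1], bounds[1:])]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return out


X = np.random.default_rng(0).standard_normal((10_000, 64))
norms = parallel_row_norms(X, n_jobs=4)
```

Because the output is preallocated and the chunks do not overlap, the workers need no locks.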

#2 50%+ faster and leaner

  • Improved matrix operations include Matrix Multiplication Ordering, Element-Wise Matrix Multiplication (reducing complexity from O(n^3) to O(n^2)), reducing Matrix Operations to Einstein Notation, and evaluating one-time Matrix Operations in succession to reduce RAM overhead.
  • Applying QR Decomposition and then SVD (Singular Value Decomposition) might be faster in some cases.
  • Utilising the structure of the matrix to compute the inverse faster.
  • Computing SVD(X) and then getting pinv(X) is sometimes faster than computing pinv(X) directly.
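Four of these ideas can be checked in a few lines of NumPy. This is a minimal sketch with arbitrary matrix sizes, not HyperLearn's code. Whether each rewrite is actually faster depends on the shapes and on the BLAS/LAPACK build, which is why the list above says "sometimes".

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 20
A = rng.standard_normal((n, k))
B = rng.standard_normal((k, n))
v = rng.standard_normal(n)

# 1. Multiplication ordering: (A @ B) @ v builds an n x n matrix first,
#    while A @ (B @ v) only ever builds vectors of length k and n.
slow = (A @ B) @ v
fast = A @ (B @ v)

# 2. Diagonal only: diag(A @ B) needs n row-column dot products, O(n*k)
#    work instead of O(n^2*k); for square inputs, O(n^2) instead of O(n^3).
diag_full = np.diag(A @ B)
diag_fast = np.einsum("ij,ji->i", A, B)  # Einstein notation
diag_hadamard = (A * B.T).sum(axis=1)    # same thing as an element-wise product

# 3. pinv(X) from one SVD: pinv(X) = V diag(1/s) U^T, zeroing tiny s.
X = rng.standard_normal((300, 40))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
tol = s.max() * max(X.shape) * np.finfo(X.dtype).eps
s_inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol)
pinv_svd = (Vt.T * s_inv) @ U.T

# 4. QR first, then the SVD of the small 40 x 40 factor R (useful for tall X).
Q, R = np.linalg.qr(X)        # X = Q R
Ur, s_qr, Vt_qr = np.linalg.svd(R)
U_qr = Q @ Ur                 # X = (Q Ur) diag(s_qr) Vt_qr
```

Each pair computes the same quantity; only the amount of intermediate work and memory differs.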

#3 Statsmodels is sometimes slow

  • Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
  • Using Einstein Notation & Hadamard Products where possible.
  • Computing only what is necessary (e.g. only the diagonal of a matrix).
  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
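Confidence intervals and the "diagonal only" point fit together: the standard errors of ordinary least squares coefficients need only the diagonal of (X^T X)^-1. The sketch below applies the textbook OLS formulas, using a Cholesky factor and a Hadamard-product sum to get that diagonal without multiplying out the full inverse. It assumes X has full column rank and uses SciPy for the triangular solves and the t quantile. `ols_confidence_interval` is a hypothetical function, not HyperLearn's `confidence_interval_` implementation.

```python
import numpy as np
from scipy import linalg, stats


def ols_confidence_interval(X, y, alpha=0.05):
    """OLS coefficients and their (1 - alpha) confidence intervals.

    Uses X^T X = L L^T, so diag((X^T X)^-1) = column sums of squares of L^-1;
    the full inverse L^-T L^-1 is never formed.
    """
    n, p = X.shape
    L = np.linalg.cholesky(X.T @ X)               # lower-triangular factor
    coef = linalg.cho_solve((L, True), X.T @ y)   # solves (X^T X) b = X^T y
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)              # unbiased residual variance
    L_inv = linalg.solve_triangular(L, np.eye(p), lower=True)
    diag_inv = np.einsum("ij,ij->j", L_inv, L_inv)  # Hadamard product, summed
    se = np.sqrt(sigma2 * diag_inv)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - p)
    return coef, np.column_stack([coef - t_crit * se, coef + t_crit * se])


rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.standard_normal((200, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(200)
coef, ci = ols_confidence_interval(X, y)
```

Each row of `ci` holds the lower and upper bound for the matching coefficient.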

#4 Deep Learning Drop In Modules with PyTorch

  • Using PyTorch to create Scikit-Learn-like drop-in replacements.

#5 20%+ Less Code, Cleaner and Clearer Code

  • Using Decorators & Functions wherever possible.
  • Intuitive mid-level function names such as isTensor and isIterable.
  • Handles Parallelism easily through hyperlearn.multiprocessing
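A sketch of how decorators and small helpers can cut repeated code, under stated assumptions: `isIterable` reuses the name from the list above but its body is a guess, and `as_float_arrays` and `column_means` are hypothetical. (`isTensor` is left out because it would depend on PyTorch.) One decorator handles input conversion so each function body stays short.

```python
import functools
from collections.abc import Iterable

import numpy as np


def isIterable(x):
    """True for lists, tuples, arrays, generators, ... but not for strings."""
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))


def _to_array(a):
    if not hasattr(a, "__array__"):
        a = list(a)  # materialise generators, tuples, nested lists
    return np.asarray(a, dtype=np.float64)


def as_float_arrays(func):
    """Decorator: convert every iterable positional argument to a float64 array."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [_to_array(a) if isIterable(a) else a for a in args]
        return func(*args, **kwargs)
    return wrapper


@as_float_arrays
def column_means(X):
    # X is guaranteed to be a float64 ndarray here.
    return X.mean(axis=0)


means = column_means([[1, 2], [3, 4]])
```

Any other function can opt into the same conversion with one line, instead of repeating the checks in each body.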

#6 Accessing Old and Exciting New Algorithms

  • Matrix Completion algorithms: Non-Negative Least Squares, NNMF
  • Batch Similarity Latent Dirichlet Allocation (BS-LDA)
  • Correlation Regression and many more!
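Of the matrix completion methods listed, NNMF is the easiest to sketch. The code below uses the classic Lee and Seung multiplicative updates for the Frobenius loss. This is a standard technique and not necessarily the one HyperLearn uses, and `nnmf` and its parameters are hypothetical. (For non-negative least squares, SciPy already ships `scipy.optimize.nnls`.)

```python
import numpy as np


def nnmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative matrix V (m x n) as W @ H with W, H >= 0.

    Each update multiplies by a non-negative ratio, so W and H can never
    turn negative; eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Parenthesised so only small rank x rank products are formed.
        H *= (W.T @ V) / ((W.T @ W) @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


rng = np.random.default_rng(42)
V = rng.random((30, 2)) @ rng.random((2, 20))  # exactly rank-2, non-negative
W, H = nnmf(V, rank=2, n_iter=300)
```

The parentheses in the updates apply the multiplication-ordering idea from #2: they avoid forming the full m x n product W @ H inside the loop.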


Daniel also published some preliminary timing results for a range of algorithms from MKL SciPy, PyTorch, MKL NumPy, HyperLearn's own methods and Numba JIT-compiled algorithms.

Here are his key findings on HyperLearn:

  1. HyperLearn's Pseudoinverse shows no speed improvement.
  2. HyperLearn's PCA will see a speed boost of over 200%.
  3. HyperLearn's Linear Solvers will be over twice as fast, i.e. they will show a speed improvement of over 100%.


You can find all the details of the tests on Reddit.

For more insights on HyperLearn, check out the release notes on Github.
