GitHub Octoverse: top machine learning packages, languages, and projects of 2018

  • 28 Jan 2019

GitHub's The State of the Octoverse: Machine Learning report revealed the top tools and languages used in machine learning in 2018. TensorFlow was among the projects with the most contributions, which is not surprising considering its age and popularity. Python was the third most popular language on GitHub overall, after JavaScript and Java.

The report draws on contribution data for the whole of 2018. A contribution is a code push, a pull request, an opened issue, a comment, or any other related activity. The data covers all public repositories and any private repositories that have opted in to the dependency graph.

Top languages used for machine learning on GitHub


Languages are ranked by the primary language of repositories tagged with machine-learning. Python is at the top, followed by C++. JavaScript and Java also make the top five. What's interesting is the growth of Julia, a relatively new language, which took the sixth spot. R, popular for data analytics tasks, also shows up thanks to its wide range of libraries.

  1. Python
  2. C++
  3. JavaScript
  4. Java
  5. C#
  6. Julia
  7. Shell
  8. R
  9. TypeScript
  10. Scala
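
The ranking method described above can be illustrated with a minimal sketch. It assumes each repository is a record with a list of topics and one primary language. The sample records and the `rank_languages` helper are hypothetical and are not taken from the report or from GitHub's own tooling.

```python
from collections import Counter

# Hypothetical sample records for illustration only; not data from the report.
repos = [
    {"name": "a/one", "topics": ["machine-learning"], "language": "Python"},
    {"name": "b/two", "topics": ["machine-learning", "nlp"], "language": "Python"},
    {"name": "c/three", "topics": ["machine-learning"], "language": "C++"},
    {"name": "d/four", "topics": ["web"], "language": "JavaScript"},
    {"name": "e/five", "topics": ["machine-learning"], "language": None},
]


def rank_languages(repos, topic="machine-learning"):
    """Rank languages by how many repos tagged with `topic` use them as primary language."""
    counts = Counter(
        repo["language"]
        for repo in repos
        if topic in repo["topics"] and repo["language"]  # skip repos with no detected language
    )
    # most_common() sorts by count, highest first
    return [language for language, _ in counts.most_common()]


ranking = rank_languages(repos)
print(ranking)
```

Only the primary language of each repository counts here, so a project that mixes Python and C++ contributes to one language only.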

Top machine learning and data science packages


The package ranking considers projects tagged with data science or machine learning that import Python packages. NumPy, which is used for mathematical operations, appears in 74% of these projects. This is not surprising, as it is a supporting package for scikit-learn, among others. SciPy, pandas, and matplotlib are each used in 40% or more of the projects. scikit-learn, a collection of many algorithms, is used in 38% of the projects. TensorFlow is used in 24% of the projects; although it is popular, its use cases are narrower.

  1. numpy (74%)
  2. scipy (47%)
  3. pandas (41%)
  4. matplotlib (40%)
  5. scikit-learn (38%)
  6. six (31%)
  7. tensorflow (24%)
  8. requests (23%)
  9. python-dateutil (22%)
  10. pytz (21%)
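
As a rough illustration of how such percentages could be derived, the sketch below counts the share of projects that import each package at least once. It assumes the source files of each project are available as strings. The `imported_packages` and `package_share` helpers and the toy projects are hypothetical, and the report does not describe its actual pipeline at this level of detail.

```python
import ast
from collections import Counter


def imported_packages(source):
    """Return the top-level package names imported by a Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def package_share(projects):
    """Map each package to the percentage of projects importing it at least once."""
    counts = Counter()
    for sources in projects.values():
        used = set()
        for source in sources:
            used |= imported_packages(source)
        counts.update(used)  # each project counts once per package
    total = len(projects)
    return {package: 100 * n / total for package, n in counts.items()}


# Hypothetical toy projects for illustration only; not data from the report.
projects = {
    "proj_a": ["import numpy as np\nfrom scipy import stats\n"],
    "proj_b": ["import numpy\nimport pandas as pd\n", "from numpy.linalg import norm\n"],
    "proj_c": ["import requests\n"],
    "proj_d": ["import numpy as np\n"],
}

share = package_share(projects)
for package, percent in sorted(share.items(), key=lambda item: -item[1]):
    print(f"{package}: {percent:.0f}%")
```

Import names do not always match distribution names (scikit-learn is imported as `sklearn`, python-dateutil as `dateutil`), so a real analysis would need a mapping between the two.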

Machine learning projects with most contributions


TensorFlow had the most contributions, followed by scikit-learn. Julia again appears to be garnering interest, ranking fourth in this list.

  1. tensorflow/tensorflow
  2. scikit-learn/scikit-learn
  3. explosion/spaCy
  4. JuliaLang/julia
  5. CMU-Perceptual-Computing-Lab/openpose
  6. tensorflow/serving
  7. thtrieu/darkflow
  8. ageitgey/face-recognition
  9. RasaHQ/rasa_nlu
  10. tesseract-ocr/tesseract


