Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning Julia

You're reading from   Learning Julia Build high-performance applications for scientific computing

Arrow left icon
Product type Paperback
Published in Nov 2017
Publisher Packt
ISBN-13 9781785883279
Length 316 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (2):
Arrow left icon
Rahul Lakhanpal Rahul Lakhanpal
Author Profile Icon Rahul Lakhanpal
Rahul Lakhanpal
Anshul Joshi Anshul Joshi
Author Profile Icon Anshul Joshi
Anshul Joshi
Arrow right icon
View More author details
Toc

Table of Contents (11) Chapters Close

Preface 1. Understanding Julia's Ecosystem FREE CHAPTER 2. Programming Concepts with Julia 3. Functions in Julia 4. Understanding Types and Dispatch 5. Working with Control Flow 6. Interoperability and Metaprogramming 7. Numerical and Scientific Computation with Julia 8. Data Visualization and Graphics 9. Connecting with Databases 10. Julia’s Internals

Julia's importance in data science

In the last decade, data science has become a buzzword, with Harvard Business Review naming it the sexiest job of the 21st century. What is a data scientist? The answer was published in The Guardian (https://www.theguardian.com/careers/2015/jun/30/whats-a-data-scientist-and-how-do-i-become-one):

A data scientist takes raw data and marries it with analysis to make it accessible and more valuable for an organization. To do this, they need a unique blend of skills—a solid grounding in maths and algorithms and a good understanding of human behaviors, as well as knowledge of the industry they're working in, to put their findings into context. From here, they can unlock insights from the datasets and start to identify trends. 

The technical skills of a data scientist are varied but, generally, they are good at programming, and have a very strong background in mathematics—especially statistics, skills in machine learning, and knowledge of big data. A data scientist is required to have in-depth understanding of the domain he/she is working in. Julia was designed for scientific and numerical computation. And with the advent of big data, there is a requirement to have a language that can work on huge amounts of data. Although we have Spark and MapReduce (Hadoop) as processing engines that are generally used with Python, Scala, and Java, Julia with Intel's High Performance Analytics Toolkit can provide an alternative option. It may also be worth noting that Julia excels at parallel computing but is much easier to write and prototype than Spark/Hadoop.

One great feature of Julia is that it solves the 2-language problem. Generally, with Python and R, code that is doing most of the heavy workload is written in C/C++ and it is then called. This is not required with Julia, as it can perform comparably to C/C++. Therefore, complete code—including code that does heavy computations—can be written in Julia itself.

Benchmarks

We mentioned the speed of Julia above, and that's what sets this language apart from traditional dynamically typed languages. Speed is its specialty. So, how fast can Julia be? The following micro-benchmark results were obtained on a single core (serial execution) on an Intel(R) Xeon(R) CPU E7-8850 2.00 GHz CPU with 1 TB of 1067 MHz DDR3 RAM running Linux:

Julia 0.4.0 Python 3.4.3 R 3.2.2 MATLAB R2015b Go go1.5 Java 1.8.0_45
fib 2.11 77.76 533.52 26.89 1.86 1.21
parse_int 1.45 17.02 45.73 802.52 1.20 3.35
quicksort 1.15 32.89 264.54 4.92 1.29 2.60
mandel 0.79 15.32 53.16 7.58 1.11 1.35
pi_sum 1.00 21.99 9.56 1.00 1.00 1.00
rand_mat_stat 1.66 17.93 14.56 14.52 2.96 3.92
rand_mat_mul 1.02 1.14 1.57 1.12 1.42 2.36

These benchmark times are relative to C (smaller is better, C performance = 1.0). Benchmarks can be misleading and are not always true. Good coding practices need to be followed and exactly identical conditions are required to measure them side by side. Julia has been quite open about how it measured these benchmarks and the code is available at https://github.com/JuliaLang/julia/tree/master/test/perf/micro.

  • Julia is significantly faster than Python. There is a huge difference in performance in the benchmarks. However, some libraries for numerical computation available to Python are written in C, and here it performs nearly equivalent to Julia.
  • R was specifically designed for statisticians. It has a huge set of libraries for statistics and numerical computation and is available for free. It used to be the language of choice for data scientists (now Python is preferred). R is single-threaded and is a lot slower than Julia.
  • MATLAB is not a free product. It comes with a paid license (students may get discounts). It is used by statisticians and academicians for some specific use cases. The above benchmarks run a lot slower on MATLAB.
  • Go is designed from scratch for system programming. It was created by Google and the source code is available on GitHub, where it is actively developed. Go performs really well on these benchmarks, but it is not designed for numerical and scientific computing.
  • Java performs well. It beats Julia in some benchmarks and Julia beats it in others. But we need to consider the development time associated with it. Julia is designed in a way that it can be used even for rapid prototyping. That makes it unique.

Julia is therefore well-suited to data science problems. Its ecosystem may not be as comprehensive as other languages right now, but it is growing at a great pace.

You have been reading a chapter from
Learning Julia
Published in: Nov 2017
Publisher: Packt
ISBN-13: 9781785883279
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at €18.99/month. Cancel anytime