Mastering Numerical Computing with NumPy

You're reading from Mastering Numerical Computing with NumPy: Master scientific computing and perform complex operations with ease

Product type Paperback
Published in Jun 2018
Publisher Packt
ISBN-13 9781788993357
Length 248 pages
Edition 1st Edition
Authors (3): Tiago Antao, Mert Cuhadaroglu, Umit Mert Cakmak
Table of Contents (11 chapters)

Preface
1. Working with NumPy Arrays
2. Linear Algebra with NumPy
3. Exploratory Data Analysis of Boston Housing Data with NumPy Statistics
4. Predicting Housing Prices Using Linear Regression
5. Clustering Clients of a Wholesale Distributor Using NumPy
6. NumPy, SciPy, Pandas, and Scikit-Learn
7. Advanced Numpy
8. Overview of High-Performance Numerical Computing Libraries
9. Performance Benchmarks
10. Other Books You May Enjoy

NumPy array operations

This section will guide you through the creation and manipulation of numerical data with NumPy. Let's start by creating a NumPy array from a list:

In [17]: my_list = [2, 14, 6, 8]
my_array = np.asarray(my_list)
type(my_array)
Out[17]: numpy.ndarray

Let's do some addition, subtraction, multiplication, and division with scalar values:

In [18]: my_array + 2
Out[18]: array([ 4, 16, 8, 10])
In [19]: my_array - 1
Out[19]: array([ 1, 13, 5, 7])
In [20]: my_array * 2
Out[20]: array([ 4, 28, 12, 16])
In [21]: my_array / 2
Out[21]: array([ 1. , 7. , 3. , 4. ])
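
For comparison, here is a minimal sketch of the same scalar arithmetic on the plain Python list and on the array. The variable names other than my_list and my_array are chosen here for illustration only:

```python
import numpy as np

my_list = [2, 14, 6, 8]
my_array = np.asarray(my_list)

# A plain list needs an explicit loop (here, a list comprehension).
list_plus_two = [value + 2 for value in my_list]

# The array applies the operation to every element at once.
array_plus_two = my_array + 2

# Note that * on a list repeats the list instead of multiplying its elements.
repeated_list = my_list * 2      # [2, 14, 6, 8, 2, 14, 6, 8]
doubled_array = my_array * 2     # array([ 4, 28, 12, 16])
```

The last two lines show why the distinction matters: the same operator means repetition for a list and element-wise multiplication for an array.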

Doing the same with a plain Python list takes more work because lists do not support vectorized operations, so you have to iterate over the elements yourself. There are many ways to create NumPy arrays. Next, you will create an array full of zeros, add a scalar to it, and then perform some arithmetic to see how NumPy handles element-wise operations between two arrays:

In [22]: second_array = np.zeros(4) + 3
second_array
Out[22]: array([ 3., 3., 3., 3.])
In [23]: my_array - second_array
Out[23]: array([ -1., 11., 3., 5.])
In [24]: second_array / my_array
Out[24]: array([ 1.5 , 0.21428571, 0.5 , 0.375 ])

As in the previous code, you can create an array full of ones with np.ones or an identity matrix with np.identity and apply the same arithmetic operations:

In [25]: second_array = np.ones(4) + 3
second_array
Out[25]: array([ 4., 4., 4., 4.])
In [26]: my_array - second_array
Out[26]: array([ -2., 10., 2., 4.])
In [27]: second_array / my_array
Out[27]: array([ 2. , 0.28571429, 0.66666667, 0.5 ])

It works as expected with the np.ones method, but when you use the identity matrix, the calculation returns a (4,4) array as follows:

In [28]: second_array = np.identity(4)
second_array
Out[28]: array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
In [29]: second_array = np.identity(4) + 3
second_array
Out[29]: array([[ 4., 3., 3., 3.],
[ 3., 4., 3., 3.],
[ 3., 3., 4., 3.],
[ 3., 3., 3., 4.]])
In [30]: my_array - second_array
Out[30]: array([[ -2., 11., 3., 5.],
[ -1., 10., 3., 5.],
[ -1., 11., 2., 5.],
[ -1., 11., 3., 4.]])

Here, my_array is treated as a row and paired with every row of second_array: each element of the first column of second_array is subtracted from the first element of my_array, each element of the second column from the second element, and so on. The same rule applies to division. Keep in mind that array operations can succeed even when the operands do not have exactly the same shape, as long as the shapes are compatible. Later in this chapter, you will learn about the broadcasting errors that occur when a computation cannot be done between two arrays because their shapes are incompatible. For now, here is the division:

In [31]: second_array / my_array
Out[31]: array([[ 2. , 0.21428571, 0.5 , 0.375 ],
[ 1.5 , 0.28571429, 0.5 , 0.375 ],
[ 1.5 , 0.21428571, 0.66666667, 0.375 ],
[ 1.5 , 0.21428571, 0.5 , 0.5 ]])
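
The following is a minimal sketch of what happens with the shapes in the two preceding operations. The use of np.tile to spell out the repetition, and the length-3 array used to trigger a failure, are illustrative choices made here and not part of the original example:

```python
import numpy as np

my_array = np.asarray([2, 14, 6, 8])     # shape (4,)
second_array = np.identity(4) + 3        # shape (4, 4)

# The 1-D array is treated as a row and reused for every row of the 2-D array.
result = my_array - second_array

# Writing the repetition out by hand gives the same result.
manual = np.tile(my_array, (4, 1)) - second_array
same = np.array_equal(result, manual)

# Shapes that cannot be aligned raise a ValueError.
try:
    my_array - np.ones(3)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```

NumPy does not actually copy my_array four times; np.tile is used here only to make the pairing of elements visible.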

One of the most useful functions for creating NumPy arrays is arange. It returns evenly spaced values within a given interval. The first argument is the start value, the second is the stop value (which is excluded from the result), and the third is the step between consecutive values. Optionally, you can pass a dtype as the fourth argument. The default step is 1:

In [32]: x = np.arange(3,7,0.5)
x
Out[32]: array([ 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5])

When you know how many values you need but not what the step should be, you can use linspace instead. It returns the requested number of evenly spaced values between the start and stop points and, by default, includes both endpoints:

In [33]: x = np.linspace(1.2, 40.5, num=20)
x
Out[33]: array([ 1.2 , 3.26842105, 5.33684211, 7.40526316, 9.47368421,
11.54210526, 13.61052632, 15.67894737, 17.74736842, 19.81578947,
21.88421053, 23.95263158, 26.02105263, 28.08947368, 30.15789474,
32.22631579, 34.29473684, 36.36315789, 38.43157895, 40.5 ])
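
As a short sketch of the difference between the two functions, the following code reuses the arguments from the examples above. The retstep=True option asks linspace to also return the spacing it computed; the variable names are chosen here for illustration:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
by_step = np.arange(3, 7, 0.5)

# linspace: you choose the number of values; the stop value is included by default.
# retstep=True also returns the spacing between consecutive values.
by_count, step = np.linspace(1.2, 40.5, num=20, retstep=True)

print(by_step[-1])                 # last value stays below the stop of 7
print(by_count[0], by_count[-1])   # first and last values are the endpoints
print(step)                        # spacing derived from the endpoints and num
```

With a non-integer step, the official documentation generally recommends linspace over arange, because rounding can make the length of the arange result hard to predict.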

There are two more functions that are used in a similar way but return values that are evenly spaced on a logarithmic scale rather than a linear one, so the numbers are distributed differently. The first is geomspace, which returns a geometric progression between the start and stop values:

In [34]: np.geomspace(1, 625, num=5)
Out[34]: array([ 1., 5., 25., 125., 625.])

The other is logspace, which also returns values that are evenly spaced on a log scale, but interprets the start and stop arguments as exponents of the base:

In [35]: np.logspace(3, 4, num=5)
Out[35]: array([ 1000. , 1778.27941004, 3162.27766017, 5623.4132519 ,
10000. ])

What do these arguments mean? With a start of 3 and a stop of 4, the function returns numbers that are far larger than those arguments. That is because the sequence actually starts at base**start and ends at base**stop, and the base defaults to 10. In this example, the starting point is therefore 10**3 and the ending point is 10**4. If you want the sequence to start and end at the values you pass in, the trick is to pass their base-10 logarithms:

In [36]: np.logspace(np.log10(3) , np.log10(4) , num=5)
Out[36]: array([ 3. , 3.2237098 , 3.46410162, 3.72241944, 4. ])
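
The relationship between the three functions can be checked with a small sketch. The variable names are illustrative, and np.allclose is used because the results agree only up to floating-point rounding:

```python
import numpy as np

# logspace treats start and stop as exponents of the base (10 by default).
from_exponents = np.logspace(3, 4, num=5)
same_by_hand = 10 ** np.linspace(3, 4, num=5)

# geomspace takes the endpoints themselves.
from_endpoints = np.geomspace(3, 4, num=5)
via_logspace = np.logspace(np.log10(3), np.log10(4), num=5)

# In a geometric progression, the ratio between neighbours is constant.
ratios = from_endpoints[1:] / from_endpoints[:-1]

print(np.allclose(from_exponents, same_by_hand))
print(np.allclose(from_endpoints, via_logspace))
print(np.allclose(ratios, ratios[0]))
```

In other words, the log10 trick shown above reproduces what geomspace does when it is given the endpoints directly.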

By now, you should be familiar with different ways of creating arrays with different distributions. You have also learned how to do some basic operations with them. Let's continue with other useful functions that you will use in your day-to-day work. Most of the time, you will work with multiple arrays and will need to compare them quickly. NumPy makes this easy; you can compare arrays as you would compare two integers:

In [37]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
x == y
Out[37]: array([ True, False, False, True], dtype=bool)

The comparison is done element-wise and returns a Boolean array that indicates, position by position, whether the elements of the two arrays match. This works well for small arrays and gives you the most detail: every False in the output marks an index at which the two arrays differ. For a large array, you may prefer a single answer to the question of whether the two arrays are equal:

In [38]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
np.array_equal(x,y)
Out[38]: False

Here, you get a single Boolean output. You know only that the arrays are not equal, not which elements differ. Comparison is not limited to checking for equality. You can also do element-wise greater-than and less-than comparisons between two arrays:

In [39]: x = np.array([1,2,3,4])
y = np.array([1,3,4,4])
x < y
Out[39]: array([False, True, True, False], dtype=bool)
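
If you want both the detail of the element-wise comparison and a compact summary, one possible approach is sketched below. It reuses x and y from the examples above; the other variable names are chosen here for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([1, 3, 4, 4])

mismatch = x != y                             # element-wise "not equal"
mismatch_indices = np.flatnonzero(mismatch)   # positions where the arrays differ
n_mismatches = np.count_nonzero(mismatch)     # how many elements differ

# Reducing the element-wise result with all() gives a single answer,
# which agrees with np.array_equal for arrays of the same shape.
all_equal = bool((x == y).all())
```

This is useful for large arrays, where printing the full Boolean array is impractical but the positions of the differences still matter.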

When you need logical operations (AND, OR, XOR), you can apply them to your arrays as follows:

In [40]: x = np.array([0, 1, 0, 0], dtype=bool)
y = np.array([1, 1, 0, 1], dtype=bool)
np.logical_or(x,y)
Out[40]: array([ True, True, False, True], dtype=bool)
In [41]: np.logical_and(x,y)
Out[41]: array([False, True, False, False], dtype=bool)
In [42]: x = np.array([12,16,57,11])
np.logical_or(x < 13, x > 50)
Out[42]: array([ True, False, True, True], dtype=bool)
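
XOR is mentioned above but not shown. The following sketch uses np.logical_xor on the same Boolean arrays and also shows the operator forms, which give the same results on Boolean arrays. The name values stands in for the integer array from In [42]:

```python
import numpy as np

x = np.array([0, 1, 0, 0], dtype=bool)
y = np.array([1, 1, 0, 1], dtype=bool)

# XOR is True where exactly one of the two inputs is True.
xor_result = np.logical_xor(x, y)

# On Boolean arrays, the operators &, | and ^ match logical_and,
# logical_or and logical_xor. Comparisons need parentheses because
# these operators bind more tightly than < and >.
values = np.array([12, 16, 57, 11])
with_function = np.logical_or(values < 13, values > 50)
with_operator = (values < 13) | (values > 50)
```

Python's own and, or, and not keywords do not work element-wise on arrays, which is why the NumPy functions or the operator forms are needed.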

So far, algebraic operations such as addition and multiplication have been covered. How can we use these operations with transcendental functions such as the exponential function, logarithms, or trigonometric functions?

In [43]: x = np.array([1, 2, 3,4 ])
np.exp(x)
Out[43]: array([ 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
In [44]: np.log(x)
Out[44]: array([ 0. , 0.69314718, 1.09861229, 1.38629436])
In [45]: np.sin(x)
Out[45]: array([ 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
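
Two details of these functions are easy to overlook, and the sketch below illustrates them: np.log is the natural logarithm (the inverse of np.exp), and the trigonometric functions expect angles in radians. The degrees array is an example chosen here:

```python
import numpy as np

x = np.array([1, 2, 3, 4])

# exp and log are inverses of each other, up to floating-point rounding.
round_trip = np.log(np.exp(x))

# The trigonometric functions expect radians, so convert degrees first.
degrees = np.array([0, 30, 90])
sines = np.sin(np.deg2rad(degrees))
```

Here, round_trip is approximately equal to x, and sines is approximately 0, 0.5, and 1.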

What about the transpose of a matrix? First, you will use the reshape function with arange to set the desired shape of the matrix:

In [46]: x = np.arange(9)
x
Out[46]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [47]: x = np.arange(9).reshape((3, 3))
x
Out[47]: array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [48]: x.T
Out[48]: array([[0, 3, 6],
[1, 4, 7],
[2, 5, 8]])

Transposing the 3 x 3 array leaves its shape unchanged because both dimensions are 3. Let's see what happens when you don't have a square array:

In [49]: x = np.arange(6).reshape(2,3)
x
Out[49]: array([[0, 1, 2],
[3, 4, 5]])
In [50]: x.T
Out[50]: array([[0, 3],
[1, 4],
[2, 5]])
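
A small sketch, assuming the same 2 x 3 array, makes the shape change explicit and shows one property worth knowing: for an array like this one, x.T is a view on the same data rather than a copy. The name xt is chosen here for illustration:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
xt = x.T

print(x.shape, xt.shape)   # the two dimensions swap places

# The transpose shares its data with the original array,
# so a write through the transpose is visible in the original.
shares = np.shares_memory(x, xt)
xt[0, 1] = 99              # row 0, column 1 of xt is row 1, column 0 of x
print(x)
```

If you need an independent transposed array, call x.T.copy() instead.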

The transpose works as expected and the dimensions are switched as well. You can also get summary statistics from arrays such as mean, median, and standard deviation. Let's start with methods that NumPy offers for calculating basic statistics:

Method              Description
np.sum              Returns the sum of all array values or along the specified axis
np.amin             Returns the minimum of all array values or along the specified axis
np.amax             Returns the maximum of all array values or along the specified axis
np.percentile       Returns the given qth percentile of all array values or along the specified axis
np.nanmin           The same as np.amin, but ignores NaN values in an array
np.nanmax           The same as np.amax, but ignores NaN values in an array
np.nanpercentile    The same as np.percentile, but ignores NaN values in an array

The following code block gives examples of the preceding statistical functions. They are very useful because you can apply them to the whole array or along a single axis, as needed. Note that SciPy, which uses NumPy multidimensional arrays as its data structure, offers more fully featured implementations of such statistics:

In [51]: x = np.arange(9).reshape((3,3))
x
Out[51]: array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [52]: np.sum(x)
Out[52]: 36
In [53]: np.amin(x)
Out[53]: 0
In [54]: np.amax(x)
Out[54]: 8
In [55]: np.amin(x, axis=0)
Out[55]: array([0, 1, 2])
In [56]: np.amin(x, axis=1)
Out[56]: array([0, 3, 6])
In [57]: np.percentile(x, 80)
Out[57]: 6.4000000000000004

The axis argument determines the dimension along which the function operates. For this 2-D array, axis=0 runs down the rows, so the function returns one result per column, and axis=1 runs across the columns, so it returns one result per row. A plain amin(x) returns a single value because it computes the minimum over all elements of the array, but when you specify an axis, the function is evaluated along that axis and returns an array of results. Imagine you have a large array; you can find the maximum value with amax, but what if you need to pass the index of that value to another function? In such cases, argmin and argmax come to the rescue, as shown in the following snippet:

In [58]: x = np.array([1,-21,3,-3])
np.argmax(x)
Out[58]: 2
In [59]: np.argmin(x)
Out[59]: 1
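
The following sketch ties the axis argument and argmax together on the 3 x 3 array used earlier. One point it illustrates is that, without an axis, argmax on a multi-dimensional array returns an index into the flattened array; np.unravel_index is one way to turn that into row and column coordinates. The variable names are chosen here for illustration:

```python
import numpy as np

x = np.arange(9).reshape((3, 3))

col_min = np.amin(x, axis=0)   # one minimum per column
row_min = np.amin(x, axis=1)   # one minimum per row

# Without an axis, argmax returns an index into the flattened array.
flat_index = np.argmax(x)

# unravel_index converts the flat index back into (row, column) coordinates.
row, col = np.unravel_index(flat_index, x.shape)
largest = x[row, col]
```

Here, largest is the same value that np.amax(x) returns, and (row, col) is where it sits in the array.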

Let's continue with more statistical functions:

Method          Description
np.mean         Returns the mean of all array values or along the specified axis
np.median       Returns the median of all array values or along the specified axis
np.std          Returns the standard deviation of all array values or along the specified axis
np.nanmean      The same as np.mean, but ignores NaN values in an array
np.nanmedian    The same as np.median, but ignores NaN values in an array
np.nanstd       The same as np.std, but ignores NaN values in an array

The following code gives more examples of the preceding statistical methods of NumPy. These methods are heavily used in data discovery phases, where you analyze your data features and distribution:

In [60]: x = np.array([[2, 3, 5], [20, 12, 4]])
x
Out[60]: array([[ 2, 3, 5],
[20, 12, 4]])
In [61]: np.mean(x)
Out[61]: 7.666666666666667
In [62]: np.mean(x, axis=0)
Out[62]: array([ 11. , 7.5, 4.5])
In [63]: np.mean(x, axis=1)
Out[63]: array([ 3.33333333, 12. ])
In [64]: np.median(x)
Out[64]: 4.5
In [65]: np.std(x)
Out[65]: 6.3944420310836261
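
The NaN-aware variants from the table are not exercised above. As a minimal sketch, the following code replaces one value of the same array with NaN (an assumption made here purely for illustration) and compares the regular and NaN-aware functions:

```python
import numpy as np

# Same data as above, with one value replaced by NaN.
# NaN is a floating-point value, so the array becomes a float array.
x = np.array([[2, 3, 5], [20, np.nan, 4]])

plain_mean = np.mean(x)              # NaN propagates through the regular function
nan_mean = np.nanmean(x)             # mean of the five remaining values
nan_median = np.nanmedian(x)         # median of the five remaining values
nan_std = np.nanstd(x)               # standard deviation of the remaining values
col_means = np.nanmean(x, axis=0)    # the axis argument works the same way
```

The regular mean returns NaN as soon as a single NaN is present, while the NaN-aware functions compute their result from the remaining values.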