Mastering Numerical Computing with NumPy

Master scientific computing and perform complex operations with ease

Product type: Paperback
Published in: Jun 2018
Publisher: Packt
ISBN-13: 9781788993357
Length: 248 pages
Edition: 1st Edition
Authors (3): Tiago Antao, Mert Cuhadaroglu, Umit Mert Cakmak
Table of Contents

Preface
1. Working with NumPy Arrays
2. Linear Algebra with NumPy
3. Exploratory Data Analysis of Boston Housing Data with NumPy Statistics
4. Predicting Housing Prices Using Linear Regression
5. Clustering Clients of a Wholesale Distributor Using NumPy
6. NumPy, SciPy, Pandas, and Scikit-Learn
7. Advanced Numpy
8. Overview of High-Performance Numerical Computing Libraries
9. Performance Benchmarks
10. Other Books You May Enjoy

Basics of NumPy array objects

As mentioned in the preceding section, what makes NumPy special is its multidimensional array type, the ndarray. All items in an ndarray are homogeneous: they share one data type, and each occupies the same amount of memory. Let's start by importing NumPy and creating an array so that we can inspect its structure. You can import the library by typing the following statement into your console. Any alias can be used in place of np, but this book uses np because it is the standard convention. We will then create a simple array and look at the attributes that NumPy stores behind the scenes as the array's metadata:

In [2]: import numpy as np
x = np.array([[1,2,3],[4,5,6]])
x
Out[2]: array([[1, 2, 3],
[4, 5, 6]])
In [3]: print("We just created a", type(x))
Out[3]: We just created a <class 'numpy.ndarray'>
In [4]: print("Our array has shape", x.shape)
Out[4]: Our array has shape (2, 3)
In [5]: print("Total size is", x.size)
Out[5]: Total size is 6
In [6]: print("The number of dimensions of our array is", x.ndim)
Out[6]: The number of dimensions of our array is 2
In [7]: print("The data type of the elements is", x.dtype)
Out[7]: The data type of the elements is int32
In [8]: print("It consumes", x.nbytes, "bytes")
Out[8]: It consumes 24 bytes
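
The attributes printed above are tied together by two simple identities, which the following minimal sketch checks. The second array, y, and the use of math.prod are illustrative choices that do not come from the book:

```python
import math
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the length of the shape tuple.
assert x.ndim == len(x.shape)
# size is the product of the lengths along every dimension.
assert x.size == math.prod(x.shape)

# The same identities hold for a hypothetical three-dimensional array.
y = np.zeros((4, 3, 2))
print(y.shape, y.ndim, y.size)  # (4, 3, 2) 3 24
```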

As you can see, our object is a NumPy ndarray. x.shape returns a tuple that gives the length of the array along each dimension, such as (n, m) for a two-dimensional array. x.size returns the total number of elements; in our example, there are six. x.ndim returns the number of dimensions. Knowing these attributes matters: if you don't know your array's shape and number of dimensions, it is unwise to start computing with it.

Other attributes, such as dtype and nbytes, are useful for checking memory consumption and deciding which data type the array should use. In our example, each element is an int32 that takes 4 bytes, so the six elements consume 24 bytes in total. (The default integer type depends on your platform and NumPy version; if it is int64 on your system, nbytes will be 48.) You can also set some of these attributes, such as dtype, when you create the array. Previously, the data type was an integer. Let's switch it to float, complex, or uint (unsigned integer) and look at the byte consumption to see what changing the data type does:

In [9]: x = np.array([[1,2,3],[4,5,6]], dtype = np.float64)
print(x)
print(x.nbytes)
Out[9]: [[ 1. 2. 3.]
[ 4. 5. 6.]]
48
In [10]: x = np.array([[1,2,3],[4,5,6]], dtype = np.complex128)
print(x)
print(x.nbytes)
Out[10]: [[ 1.+0.j 2.+0.j 3.+0.j]
[ 4.+0.j 5.+0.j 6.+0.j]]
96
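In [11]: # Newer NumPy versions reject a negative Python integer passed directly with
# dtype = np.uint32, so build a signed array first and cast it; -5 wraps around.
x = np.array([[1,2,3],[4,-5,6]]).astype(np.uint32)
print(x)
print(x.nbytes)
Out[11]: [[ 1 2 3]
[ 4 4294967291 6]]
24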
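
The byte counts above follow from one rule: nbytes equals the number of elements multiplied by the size of one element. The sketch below checks this for several dtypes using the itemsize attribute, which the text does not mention but which is a standard ndarray attribute. The names values and nbytes_by_dtype are illustrative only:

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

# Map each dtype name to the total number of bytes its array occupies.
nbytes_by_dtype = {}
for dtype in (np.int32, np.uint32, np.int64, np.float64, np.complex128):
    arr = np.array(values, dtype=dtype)
    # nbytes is the element count times the per-element size.
    assert arr.nbytes == arr.size * arr.itemsize
    nbytes_by_dtype[arr.dtype.name] = arr.nbytes
    print(f"{arr.dtype.name:>10}: {arr.itemsize:2d} bytes per element, "
          f"{arr.nbytes:3d} bytes in total")
```

Because every element has the same size, the dtype alone determines how memory use scales with the number of elements.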

As you can see, each type consumes a different number of bytes. Imagine you have a matrix as follows and that you are using int64 or int32 as the data type:

In [12]: x = np.array([[1,2,3],[4,5,6]], dtype = np.int64)
print("int64 consumes",x.nbytes, "bytes")
x = np.array([[1,2,3],[4,5,6]], dtype = np.int32)
print("int32 consumes",x.nbytes, "bytes")
Out[12]: int64 consumes 48 bytes
int32 consumes 24 bytes

The memory requirement doubles if you use int64. Ask yourself which data type would suffice. As long as your numbers stay between -2,147,483,648 and 2,147,483,647, int32 is enough. Imagine you have a huge array with a size over 100 MB. In such cases, this conversion can make a real difference to memory use and performance.
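
Rather than memorizing the limits, you can ask NumPy for them with np.iinfo. The following is a minimal sketch of a range check before downcasting; the helper fits_in_int32 and the sample arrays are hypothetical names introduced for illustration, and the helper assumes a non-empty integer array:

```python
import numpy as np

def fits_in_int32(arr):
    """Return True if every value in a non-empty integer array fits in int32."""
    info = np.iinfo(np.int32)
    return bool(arr.min() >= info.min and arr.max() <= info.max)

info = np.iinfo(np.int32)
print(info.min, info.max)  # -2147483648 2147483647

small = np.array([1, 2, 3], dtype=np.int64)
large = np.array([1, 2, 2**31], dtype=np.int64)  # 2**31 is one past the int32 maximum

print(fits_in_int32(small))  # True
print(fits_in_int32(large))  # False

# Downcast only when the check passes; otherwise keep the wider type.
small32 = small.astype(np.int32) if fits_in_int32(small) else small
print(small.nbytes, small32.nbytes)  # 24 12
```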

As you may have noticed in the previous examples, each time you changed the data type, you created a new array. Strictly speaking, you cannot convert the elements of an existing array to another dtype in place. What you can do is either create the array again or copy the existing one with a new dtype, using np.array or the astype method. Let's first create a copy of the array with a new dtype, and then convert it again with the astype method:

In [13]: x_copy = np.array(x, dtype = np.float64)
x_copy
Out[13]: array([[ 1., 2., 3.],
[ 4., 5., 6.]])
In [14]: x_copy_int = x_copy.astype(np.int64)
x_copy_int
Out[14]: array([[1, 2, 3],
[4, 5, 6]])

Keep in mind that astype does not change the dtype of x_copy, even though you called it on x_copy. It leaves x_copy as it is and returns a new array, which we stored in x_copy_int:

In [15]: x_copy
Out[15]: array([[ 1., 2., 3.],
[ 4., 5., 6.]])
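
One way to confirm that astype returned an independent array is to check whether the two arrays share memory and whether a write to one affects the other. This is a minimal sketch; np.shares_memory and the variables shares and first_value are not part of the book's example:

```python
import numpy as np

x_copy = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
x_copy_int = x_copy.astype(np.int64)

# astype allocated a new buffer here, so the arrays do not share memory.
shares = np.shares_memory(x_copy, x_copy_int)
print(shares)  # False

# Writing to the converted array leaves the original untouched.
x_copy_int[0, 0] = 99
first_value = x_copy[0, 0]
print(first_value, x_copy.dtype)  # 1.0 float64

# To "change" the dtype behind a name, rebind the name to the converted array.
x_copy = x_copy.astype(np.float32)
print(x_copy.dtype)  # float32
```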

Let's imagine a case where you are working in a research group that tries to identify and calculate the risks of an individual patient who has cancer. You have 100,000 records (rows), where each row represents a single patient, and each patient has 100 features (results of some of the tests). As a result, you have an array of shape (100000, 100):

In [16]: Data_Cancer= np.random.rand(100000,100)
print(type(Data_Cancer))
print(Data_Cancer.dtype)
print(Data_Cancer.nbytes)
Data_Cancer_New = np.array(Data_Cancer, dtype = np.float32)
print(Data_Cancer_New.nbytes)
Out[16]: <class 'numpy.ndarray'>
float64
80000000
40000000

As you can see from the preceding code, the array's size decreases from 80 MB to 40 MB just by changing the dtype. What you give up in return is precision. Instead of roughly 15 to 16 significant decimal digits, you will have only about 7. In some machine learning algorithms, this loss of precision is negligible. In such cases, you should feel free to adjust your dtype so that it minimizes your memory usage.
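
To see how much precision the conversion costs, you can query np.finfo and measure the rounding error directly. The sketch below uses a smaller, hypothetical array than the one above so that it runs quickly; the seed and the variable names are illustrative. Note that np.finfo reports a conservative count of reliable decimal digits (15 for float64 and 6 for float32):

```python
import numpy as np

# Digits of precision and machine epsilon for each floating-point type.
for dtype in (np.float64, np.float32):
    info = np.finfo(dtype)
    print(dtype.__name__, "precision:", info.precision, "eps:", info.eps)

# Random values in [0, 1), stored first as float64 and then as float32.
rng = np.random.default_rng(0)
data64 = rng.random((1000, 100))
data32 = data64.astype(np.float32)

# Largest absolute difference introduced by rounding to float32.
max_error = np.abs(data64 - data32).max()
print(data64.nbytes, data32.nbytes)  # 800000 400000
print(max_error < 1e-7)              # True
```

Whether an error of this size matters depends on the algorithm, so a check like this on a sample of your own data is a reasonable step before committing to float32.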
