Using NumPy for statistics and algebra
NumPy is a Python library for working with arrays. It also provides functions for working with matrices, Fourier transforms, and linear algebra. Its core feature is support for large, multi-dimensional arrays and matrices, together with a wide range of high-level mathematical functions that operate on them efficiently. This makes NumPy useful for everything from basic scientific computing to machine learning. At its heart is the n-dimensional array (ndarray), a simple yet effective data structure. Learning NumPy is usually the first step on a Python data scientist's path, because it is the foundation on which much of the Python data science toolkit is built.
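As a quick taste of the linear algebra and Fourier transform support mentioned above, here is a minimal sketch that solves a small linear system with np.linalg.solve(), computes a matrix product and determinant, and takes the discrete Fourier transform of a short signal with np.fft.fft(). The matrix, right-hand side, and signal values are arbitrary examples chosen only for illustration.

```python
import numpy as np

# Solve the linear system A @ x = b for x (arbitrary example values)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)                       # the solution vector
print(np.allclose(A @ x, b))   # check that x satisfies the system

# Matrix product and determinant
print(A @ A)
print(np.linalg.det(A))

# Discrete Fourier transform of a short, illustrative signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)
```

np.linalg.solve() is generally preferred over computing an inverse with np.linalg.inv() and multiplying, since it is both faster and numerically more stable.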
The fundamental building block of NumPy is the array: a grid of values, all of the same type, indexed by a tuple of nonnegative integers. The number of dimensions of an array is called its rank (available through its ndim attribute); note that this is not the same as the rank of a matrix in linear algebra. The shape of an array is a tuple of integers giving the size of the array along each dimension. The following code creates a one-dimensional array and prints it along with its type:
```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr)        # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>
```
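To see the rank and shape described above, the sketch below inspects the ndim, shape, and dtype attributes of the 1D array and of a small 2D array. The 2D array is a hypothetical example added for illustration; the exact integer dtype printed can vary by platform.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # illustrative 2D array

# ndim is the number of dimensions (the array's rank)
print(arr.ndim, matrix.ndim)
# shape is a tuple giving the size along each dimension
print(arr.shape, matrix.shape)
# every element shares a single data type
print(matrix.dtype)
# a 2D array is indexed with a tuple of nonnegative integers: (row, column)
print(matrix[1, 2])
```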
As noted above, a NumPy array is a container holding a fixed number of elements, all of the same type. Arrays are also the building block on which many other data structures are implemented. Just as you can slice a Python list, you can slice a NumPy array, but in more than one dimension. There is one important difference: slicing a list returns a new list (a copy), whereas basic slicing of a NumPy array returns a view of the original array's data, so changes made through the slice are visible in the original.
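The sketch below illustrates both points from the paragraph above: slicing along two dimensions, and the fact that a basic slice is a view, so writing through it changes the original array. It also contrasts this with a list slice and shows copy() as the way to obtain an independent array. The 3×3 array is an arbitrary example.

```python
import numpy as np

grid = np.arange(9).reshape(3, 3)   # [[0 1 2] [3 4 5] [6 7 8]]

# Slice in two dimensions: first two rows, last two columns
sub = grid[:2, 1:]
print(sub)

# A basic slice is a view: modifying it modifies grid as well
sub[0, 0] = 100
print(grid[0, 1])                   # the change is visible in the original
print(np.shares_memory(grid, sub))

# A list slice, by contrast, is a copy
lst = [0, 1, 2, 3]
lst_part = lst[1:3]
lst_part[0] = 100
print(lst[1])                       # the original list is unchanged

# Use .copy() when an independent array is needed
independent = grid[:2, 1:].copy()
independent[0, 0] = -1
print(grid[0, 1])                   # still 100
```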
Slicing in Python means taking the elements from one index up to another. We select a range of elements with [start:end], where start is the index of the first element to include and end is the index at which to stop; the element at end itself is not included. We can also specify a step with [start:end:step]. Omitting start defaults to the beginning of the array, and omitting end defaults to its end:
```python
print('select element by index:', arr[0])              # 1
print('slice elements 1 to 4:', arr[1:5])               # [2 3 4 5]
print('from index 4 to the end:', arr[4:])             # [5]
print('from the start up to index 4 (excluded):', arr[:4])  # [1 2 3 4]
```
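The [start:end:step] form mentioned above, and negative indices that count from the end of the array, are not exercised by the previous example. A short sketch using the same five-element array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print('every second element:', arr[::2])       # start and end omitted, step of 2
print('indices 1 to 3, step 2:', arr[1:4:2])
print('last element:', arr[-1])
print('last three elements:', arr[-3:])
print('reversed:', arr[::-1])                   # a negative step walks backwards
```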
NumPy supports three kinds of indexing: field access (for structured arrays), basic slicing, and advanced indexing. Basic slicing extends Python's slicing concept to n dimensions. A Python slice object is created by passing start, stop, and step arguments to the built-in slice() function, and the notation arr[start:stop:step] is shorthand for indexing with such an object. Slicing makes it possible to write clear and concise code. To recap the terminology: indexing refers to a single element of an iterable by its position, while slicing retrieves a subset of elements based on their indices.
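A minimal sketch of the points above: a slice object built with the built-in slice() gives the same result as the colon syntax, while advanced indexing (with an integer list or a boolean mask) selects arbitrary elements and, unlike basic slicing, returns a copy rather than a view. The array values are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# slice(start, stop, step) is equivalent to arr[1:5:2]
s = slice(1, 5, 2)
print(arr[s], arr[1:5:2])

# Advanced indexing with a list of integers picks elements by position
picked = arr[[0, 2, 4]]
print(picked)

# Advanced indexing with a boolean mask keeps elements where the mask is True
mask = arr > 25
print(mask)
print(arr[mask])

# Advanced indexing returns a copy, so changing the result leaves arr untouched
picked[0] = -1
print(arr[0])
```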
To join (concatenate) two arrays, pass them as a tuple to the np.concatenate() function, which copies the elements of both into a new array:
```python
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))
print(arr)  # [1 2 3 4 5 6]
```
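np.concatenate() also accepts an axis argument (0 by default) when joining multi-dimensional arrays; the arrays must match in every dimension except the one being joined. A small sketch with two illustrative 2×2 arrays:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# axis=0 appends rows: the result has shape (4, 2)
rows = np.concatenate((a, b), axis=0)
print(rows)

# axis=1 appends columns: the result has shape (2, 4)
cols = np.concatenate((a, b), axis=1)
print(cols)

# The result is a new array; changing it does not affect a or b
rows[0, 0] = 0
print(a[0, 0])
```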
Arrays can also be joined with NumPy's stack() function. Unlike concatenation, stacking joins arrays along a new axis, so stacking two 1D arrays produces a 2D array. The stack() function receives a sequence of arrays of the same shape, along with the axis on which to stack them. With axis=1, the two arrays become the columns of the result:
```python
arr = np.stack((arr1, arr2), axis=1)
print(arr)
# [[1 4]
#  [2 5]
#  [3 6]]
```
The axis parameter controls where the new axis is inserted. With axis=0 (the default), the arrays become the rows of the result, stacked on top of one another:
```python
arr = np.stack((arr1, arr2), axis=0)
print(arr)
# [[1 2 3]
#  [4 5 6]]
```
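To make the difference between stacking and concatenation concrete, the sketch below compares result shapes for the same two 1D arrays: np.stack() adds a new dimension, whereas np.concatenate() lengthens an existing one. It also checks that np.vstack() and np.column_stack() give the same results as stack() with axis=0 and axis=1 for 1D inputs.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.concatenate((arr1, arr2)).shape)    # (6,): joins along the existing axis
print(np.stack((arr1, arr2), axis=0).shape)  # (2, 3): arrays become rows
print(np.stack((arr1, arr2), axis=1).shape)  # (3, 2): arrays become columns

# Convenience equivalents for 1D inputs
print(np.array_equal(np.vstack((arr1, arr2)), np.stack((arr1, arr2), axis=0)))
print(np.array_equal(np.column_stack((arr1, arr2)), np.stack((arr1, arr2), axis=1)))
```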
The NumPy mean() function computes the arithmetic mean along the specified axis. Applied to the two-row array built in the previous example, axis=1 averages across each row:
```python
print(np.mean(arr, axis=1))  # [2. 5.]
```
In other words, use mean() with axis=1 to compute the average of each row, and with axis=0 to compute the average of each column:
```python
print(np.mean(arr, axis=0))  # [2.5 3.5 4.5]
```
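mean() is one of several statistical functions that accept the same axis argument. The brief sketch below applies np.median(), np.std(), np.var(), np.min(), and np.max() to the same 2×3 array; omitting axis reduces over all elements. Note that np.std() and np.var() compute the population statistic by default (ddof=0); pass ddof=1 for the sample version.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(np.mean(arr))                   # no axis: mean of all six elements
print(np.median(arr, axis=0))         # per-column median
print(np.std(arr, axis=1))            # per-row population standard deviation
print(np.std(arr, axis=1, ddof=1))    # per-row sample standard deviation
print(np.var(arr, axis=1))            # per-row population variance
print(np.min(arr, axis=0), np.max(arr, axis=1))
```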
In the next section, we will introduce pandas, a library for data analysis and manipulation. Like NumPy, pandas is one of the most widely used Python libraries in data science, offering high-performance, easy-to-use data analysis tools. Whereas NumPy provides multi-dimensional array objects, pandas offers an in-memory 2D table object called a DataFrame.