Manipulating arrays with NumPy and pandas
As we said in the introduction, numerous Python libraries have been developed to help with common data science tasks. The most fundamental ones are probably NumPy and pandas. Their goal is to provide a set of tools to manipulate large sets of data far more efficiently than we could with standard Python alone, and we’ll show how and why in this section. NumPy and pandas are at the heart of most data science applications in Python; knowing them is therefore the first step on your journey into Python for data science.
Before we start using them, let’s explain why such libraries are needed. In Chapter 2, Python Programming Specificities, we stated that Python is a dynamically typed language. This means that the interpreter determines the type of a variable at runtime, from the value it currently holds, and this type can even change throughout the program. For example, you can do something like this in Python:
$ python...
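As a minimal sketch of dynamic typing, the snippet below rebinds one name to values of two different types and records the type each time. The names `x`, `type_before`, and `type_after` are purely illustrative and do not come from the original text.

```python
# Illustrative sketch of dynamic typing; all names here are hypothetical.
x = 1
type_before = type(x)  # the interpreter sees an int at this point

x = "hello"            # the same name is rebound to a string, with no error
type_after = type(x)   # the interpreter now sees a str

print(type_before.__name__, type_after.__name__)
```

Because a name can point to a value of any type at any time, the interpreter has to check types as the program runs. That flexibility has a cost when processing large amounts of data.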