What this book covers
Chapter 1, Introducing Data Analysis and Libraries, describes the typical steps involved in a data analysis task. It also describes a couple of existing data analysis software packages.
Chapter 2, NumPy Arrays and Vectorized Computation, dives right into the core of the PyData ecosystem by introducing the NumPy package for high-performance computing. Its basic data structure is a typed multidimensional array that supports various functions, among them typical linear algebra tasks. The data structure and functions are explained along with examples.
Chapter 3, Data Analysis with Pandas, introduces a prominent and popular data analysis library for Python called Pandas. It is built on NumPy, but makes a lot of real-world tasks simpler. Pandas comes with its own core data structures, which are explained in detail.
Chapter 4, Data Visualization, focuses on another important aspect of data analysis: the understanding of data through graphical representations. The Matplotlib library is introduced in this chapter. It is one of the most popular 2D plotting libraries for Python, and it integrates well with Pandas.
Chapter 5, Time Series, shows how to work with time-oriented data in Pandas. Date and time handling can quickly become a difficult, error-prone task when implemented from scratch. We show how Pandas can be of great help here by looking in detail at some of its functions for date parsing and date sequence generation.
Chapter 6, Interacting with Databases, deals with some typical scenarios. Your data does not live in a vacuum, and it might not always be available as CSV files either. MongoDB is a NoSQL database and Redis is a data structure server, although many people think of it as a key-value store first. Both storage systems are introduced to help you interact with data from real-world systems.
Chapter 7, Data Analysis Application Examples, applies many of the things covered in the previous chapters to deepen your understanding of typical data analysis workflows. How to clean, inspect, reshape, merge, or group data: these are the concerns of this chapter. The library of choice is again Pandas.
Chapter 8, Machine Learning Models with scikit-learn, introduces a popular machine learning package for Python. While it supports dozens of models, we only look at four: two supervised and two unsupervised. Although this is not mentioned explicitly, the chapter brings together many of the tools already introduced. Pandas is often used for machine learning data preparation, and Matplotlib is used to create plots that facilitate understanding.