Interactive Data Visualization with Python: Present your data as an effective and compelling story

Product type Paperback

Published in Apr 2020

ISBN-13 9781800200944

Length 362 pages

Edition 2nd Edition

Languages

Python

Tools

Matplotlib

Concepts

Data Visualization

Authors (4):

Shubhangi Hora

Abha Belorkar

Anshu Kumar

Sharath Chandra Guntuku

Table of Contents (9 Chapters)

Preface

About the Book

1. Introduction to Visualization with Python – Basic and Customized Plotting

2. Static Visualization – Global Patterns and Summary Statistics

3. From Static to Interactive Visualization

4. Interactive Visualization of Data across Strata

5. Interactive Visualization of Data across Time

6. Interactive Visualization of Geographical Data

7. Avoiding Common Pitfalls to Create Interactive Visualizations

Appendix

About the Book

With so much data being continuously generated, developers who can present data as impactful and interesting visualizations are always in demand. Interactive Data Visualization with Python, Second Edition, sharpens your data exploration skills and gives you a strong start in creating interactive data visualizations with Python.

You'll begin by learning how to draw various plots with Matplotlib and Seaborn, the non-interactive data visualization libraries. You'll study different types of visualizations, compare them, and learn how to select the type of visualization that suits your requirements. After you get the hang of the various non-interactive visualization libraries, you'll learn the principles of intuitive and persuasive data visualization, and use Altair, Bokeh, and Plotly to transform your visuals into strong stories.

By the end of the book, you'll have a new skill set that'll make you the go-to person for transforming data visualizations into engaging and interesting stories.

About the Authors

Abha Belorkar is an educator and researcher in computer science. She received her bachelor's degree in computer science from Birla Institute of Technology and Science Pilani, India and her Ph.D. from the National University of Singapore. Her current research work involves the development of methods powered by statistics, machine learning, and data visualization techniques to derive insights from heterogeneous genomics data on neurodegenerative diseases.

Sharath Chandra Guntuku is a researcher in natural language processing and multimedia computing. He received his bachelor's degree in computer science from Birla Institute of Technology and Science, Pilani, India and his Ph.D. from Nanyang Technological University, Singapore. His research aims to leverage large-scale social media image and text data to model social health outcomes and psychological traits. He uses machine learning, statistical analysis, natural language processing, and computer vision to answer questions pertaining to health and psychology in individuals and communities.

Shubhangi Hora is a Python developer, artificial intelligence enthusiast, data scientist, and writer. With a background in computer science and psychology, she is particularly passionate about mental health-related AI. Apart from this, she is interested in the performing arts and is a trained musician.

Anshu Kumar is a data scientist with over 5 years of experience in solving complex problems in natural language processing and recommendation systems. He has an M.Tech. from Indian Institute of Technology, Madras in computer science. He is also a mentor at SpringBoard. His current interests are building semantic search, text summarization, and content recommendations for large-scale multilingual datasets.

Learning Objectives

By the end of this book, you will be able to:

Explore and apply different static and interactive data visualization techniques
Make effective use of plot types and features from the Matplotlib, Seaborn, Altair, Bokeh, and Plotly libraries
Master the art of selecting appropriate plotting parameters and styles to create attractive plots
Choose meaningful and informative ways to present your stories through data
Customize data visualization for specific scenarios, contexts, and audiences
Avoid common errors and slip-ups in visualizing data

Audience

This book intends to provide a solid training ground for Python developers, data analysts, and data scientists to enable them to present critical data insights in a way that best captures the user's attention and imagination. It serves as a simple step-by-step guide that demonstrates the different types and components of visualization, the principles and techniques of effective interactivity, as well as common pitfalls to avoid when creating interactive data visualizations.

Students should have an intermediate level of competency in writing Python code, as well as some familiarity with using libraries such as pandas.

Approach

Resources for learning interactive data visualization are scarce. Moreover, the materials that are available either deal with tools other than Python (for example, Tableau) or focus on a single Python library for visualization. This book is the first of its kind to present a variety of options for building interactive data visualizations with Python. The method of presentation is also simple and accessible for anyone who is well versed in Python.

The book follows an engaging syllabus as the reader is systematically led through the various steps and aspects of interactive visualization with a series of realistic case studies. The book is packed with actionable information throughout, and programming activities are supplemented with helpful tips and advice on the capabilities and limitations of the tools being used.

Hardware Requirements

For an optimal experience, we recommend the following hardware configuration:

Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59 GHz (1 socket, 2 cores, 2 threads per core) and 8 GB of DRAM
Intel® Xeon® processor E5-2698 v3 at 2.30 GHz (2 sockets, 16 cores each, 1 thread per core) and 64 GB of DRAM
Intel® Xeon Phi™ processor 7210 at 1.30 GHz (1 socket, 64 cores, 4 threads per core), 32 GB of DRAM, and 16 GB of MCDRAM (flat mode enabled)
Disk space: 2 to 3 GB
Operating systems: Windows® 10, macOS, and Linux

Minimum System Requirements:

Processors: Intel Atom® processor or Intel® Core™ i3 processor
Disk space: 1 GB
Operating systems: Windows 7 or later, macOS, and Linux

Software Requirements

We also recommend that you have the following software installed in advance:

Browser: Google Chrome or Mozilla Firefox
The latest version of Git
Anaconda 3.7 Python distribution
Python 3.7
The following Python libraries installed: numpy, pandas, matplotlib, seaborn, plotly, bokeh, altair, and geopandas

Conventions

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:

"Python performs advanced numerical and scientific computations with libraries such as numpy and scipy, hosts a wide array of machine learning methods owing to the availability of the scikit-learn package, provides a great interface for big data manipulation due to the availability of the pandas package and its compatibility with Apache Spark, and generates aesthetically pleasing plots and figures with libraries such as seaborn, plotly, and more."

A block of code is set as follows:

#import the python modules
import seaborn as sns
#load the dataset
diamonds_df = sns.load_dataset('diamonds')
#Plot a histogram
diamonds_df.hist(column='carat')

New terms and important words are shown in bold:

"The kernel density estimation is a non-parametric way to estimate the probability density function of a random variable."

Installation and Setup

Before we begin this journey of visualizing various types of data through different graphs and interactive features, we need to set up a productive environment. Follow these notes to do that:

Installing the Anaconda Python Distribution

Find the Anaconda version for your operating system on the official installation page at https://www.anaconda.com/distribution/.

After the download is complete, double-click on the file to open the installer and follow the prompts displayed on your screen.

Installing pip

To install pip, go to the following link and download the get-pip.py file: https://pip.pypa.io/en/stable/installing/.
Then, use the following command to install it: python get-pip.py.

You might need to use the python3 get-pip.py command instead, because an earlier version of Python on your computer may already be using the python command.
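If you are unsure which interpreter a command starts, a quick check from inside Python can help. The following is a minimal sketch, not part of the book's code: the meets_requirement helper and the REQUIRED constant are hypothetical names, and the 3.7 threshold is taken from the Software Requirements list above.

```python
import sys

# Assumption: the minimum version comes from the Software Requirements list (Python 3.7).
REQUIRED = (3, 7)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


# Report which interpreter is actually running this script.
print("Running Python", sys.version.split()[0])
print("Meets the 3.7 requirement:", meets_requirement())
```

Running this file once with python and once with python3 shows which of the two commands points at a suitable interpreter.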

Installing the Python Libraries

Use the following command in your Anaconda terminal to install Seaborn:

pip install seaborn

Use the following command in your Anaconda terminal to install Bokeh:

pip install bokeh

Use the following command in your Anaconda terminal to install Plotly:

pip install plotly==4.1.0
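
After installing, it can be useful to confirm which of the required libraries Python can actually find. The sketch below is one possible way to do this using only the standard library; the check_installed helper is a hypothetical name, and the library list is copied from the Software Requirements section.

```python
from importlib.util import find_spec

# Library names as listed under Software Requirements.
REQUIRED_LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn",
                      "plotly", "bokeh", "altair", "geopandas"]


def check_installed(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


status = check_installed(REQUIRED_LIBRARIES)
for name, ok in status.items():
    # Print one line per library so missing ones are easy to spot.
    print(f"{name:<12}{'installed' if ok else 'MISSING'}")
```

Any library reported as MISSING can then be installed with pip in the same way as the commands above.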

Working with JupyterLab and Jupyter Notebook

You'll be working on different exercises and activities in JupyterLab or Jupyter Notebook. These exercises and activities can be downloaded from the related GitHub repository.

You can download the repository here: https://github.com/TrainingByPackt/Interactive-Data-Visualization-with-Python.

You can either clone it with Git or download it as a zipped folder by clicking the green Clone or download button in the top-right corner. To open the Jupyter notebooks, navigate to the repository directory in your terminal. To do that, type the following:

cd Interactive-Data-Visualization-with-Python/<your current chapter>

For example:

cd Interactive-Data-Visualization-with-Python/Chapter01/

To complete the process, perform the following steps:

To reach each activity and exercise, you have to use cd once more to go into each folder, like so:
```
cd Activity01
```
Once you are in the folder of your choice, call jupyter-lab to start JupyterLab. Similarly, for Jupyter Notebook, call jupyter notebook.
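
The folder layout described above can also be sketched in Python, which may help when scripting access to several activities. This is only an illustration: the activity_path helper is a hypothetical name, and the Chapter01 and Activity01 folder names are taken from the examples in this section rather than checked against the repository.

```python
from pathlib import PurePosixPath

# Repository folder name, as it appears in the cd commands above.
REPO = "Interactive-Data-Visualization-with-Python"


def activity_path(chapter, activity, repo=REPO):
    """Build the relative folder path for one exercise or activity."""
    return PurePosixPath(repo) / chapter / activity


target = activity_path("Chapter01", "Activity01")
# Print the two terminal commands described in the steps above.
print(f"cd {target}")
print("jupyter-lab")
```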

Importing the Python Libraries

Every exercise and activity in this book will make use of various libraries. Importing libraries into Python is very simple. Here's how we do it:

To import libraries, such as seaborn and pandas, we have to run the following code:
```
#import the python modules
import seaborn
import pandas 
```
This will import the whole seaborn and pandas libraries into our current file.
In the first cells of the exercises and activities in this book, you will see the following code, which assigns the alias sns to seaborn so that we can write sns instead of seaborn when calling its functions:
```
# import seaborn and assign alias sns
import seaborn as sns 
```
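
To see that an alias is just a second name for the same module, consider the following minimal sketch. It substitutes the standard-library statistics module for seaborn so that it runs without any third-party packages; the alias stats and the sample values are chosen purely for illustration.

```python
# Plain import: the module is reachable under its full name.
import statistics
# Aliased import: the same module is bound to a shorter name.
import statistics as stats

values = [1, 2, 3, 4]

# Both names call the same function on the same module object.
print(statistics.mean(values))
print(stats.mean(values))
print(stats is statistics)
```

The same holds for import seaborn as sns: sns and seaborn would refer to one and the same module, so the alias only saves typing.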

Installing Git

To install Git, go to https://git-scm.com/downloads and follow the instructions that are specific to your platform.