About the Book
With so much data being continuously generated, developers who can present data as impactful and interesting visualizations are always in demand. Interactive Data Visualization with Python, Second Edition, sharpens your data exploration skills and gives you an excellent start on your journey of creating interactive data visualizations with Python.
You'll begin by learning how to draw various plots with Matplotlib and Seaborn, the non-interactive data visualization libraries. You'll study different types of visualizations, compare them, and learn how to select a particular type of visualization to suit your requirements. After you get the hang of the various non-interactive visualization libraries, you'll learn the principles of intuitive and persuasive data visualization, and use Altair, Bokeh, and Plotly to transform your visuals into strong stories.
By the end of the book, you'll have a new skill set that'll make you the go-to person for transforming data visualizations into engaging and interesting stories.
About the Authors
Abha Belorkar is an educator and researcher in computer science. She received her bachelor's degree in computer science from Birla Institute of Technology and Science Pilani, India and her Ph.D. from the National University of Singapore. Her current research work involves the development of methods powered by statistics, machine learning, and data visualization techniques to derive insights from heterogeneous genomics data on neurodegenerative diseases.
Sharath Chandra Guntuku is a researcher in natural language processing and multimedia computing. He received his bachelor's degree in computer science from Birla Institute of Technology and Science, Pilani, India and his Ph.D. from Nanyang Technological University, Singapore. His research aims to leverage large-scale social media image and text data to model social health outcomes and psychological traits. He uses machine learning, statistical analysis, natural language processing, and computer vision to answer questions pertaining to health and psychology in individuals and communities.
Shubhangi Hora is a Python developer, artificial intelligence enthusiast, data scientist, and writer. With a background in computer science and psychology, she is particularly passionate about mental health-related AI. Apart from this, she is interested in the performing arts and is a trained musician.
Anshu Kumar is a data scientist with over 5 years of experience in solving complex problems in natural language processing and recommendation systems. He has an M.Tech. from Indian Institute of Technology, Madras in computer science. He is also a mentor at SpringBoard. His current interests are building semantic search, text summarization, and content recommendations for large-scale multilingual datasets.
Learning Objectives
By the end of this book, you will be able to:
- Explore and apply different static and interactive data visualization techniques
- Make effective use of plot types and features from the Matplotlib, Seaborn, Altair, Bokeh, and Plotly libraries
- Master the art of selecting appropriate plotting parameters and styles to create attractive plots
- Choose meaningful and informative ways to present your stories through data
- Customize data visualization for specific scenarios, contexts, and audiences
- Avoid common errors and slip-ups in visualizing data
Audience
This book intends to provide a solid training ground for Python developers, data analysts, and data scientists to enable them to present critical data insights in a way that best captures the user's attention and imagination. It serves as a simple step-by-step guide that demonstrates the different types and components of visualization, the principles and techniques of effective interactivity, as well as common pitfalls to avoid when creating interactive data visualizations.
Students should have an intermediate level of competency in writing Python code, as well as some familiarity with using libraries such as pandas.
Approach
Resources for learning interactive data visualization are scarce. Moreover, the materials that are available either deal with tools other than Python (for example, Tableau), or focus on a single Python library for visualization. This book is the first of its kind to present a variety of options for building interactive data visualizations with Python. The method of presentation is also simple and accessible for anyone who is well versed in Python.
The book follows an engaging syllabus as the reader is systematically led through the various steps and aspects of interactive visualization with a series of realistic case studies. The book is packed with actionable information throughout, and programming activities are supplemented with helpful tips and advice on the capabilities and limitations of the tools being used.
Hardware Requirements
For an optimal experience, we recommend the following hardware configuration:
- Intel® Core™ i5 processor 4300M at 2.60 GHz or 2.59 GHz (1 socket, 2 cores, 2 threads per core) and 8 GB of DRAM
- Intel® Xeon® processor E5-2698 v3 at 2.30 GHz (2 sockets, 16 cores each, 1 thread per core) and 64 GB of DRAM
- Intel® Xeon Phi™ processor 7210 at 1.30 GHz (1 socket, 64 cores, 4 threads per core), 32 GB of DRAM, and 16 GB of MCDRAM (flat mode enabled)
- Disk space: 2 to 3 GB
- Operating systems: Windows® 10, macOS, and Linux
Minimum System Requirements:
- Processors: Intel Atom® processor or Intel® Core™ i3 processor
- Disk space: 1 GB
- Operating systems: Windows 7 or later, macOS, and Linux
Software Requirements
We also recommend that you have the following software installed in advance:
- Browser: Google Chrome or Mozilla Firefox
- The latest version of Git
- Anaconda 3.7 Python distribution
- Python 3.7
- The following Python libraries installed: numpy, pandas, matplotlib, seaborn, plotly, bokeh, altair, and geopandas
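As a quick way to confirm that the libraries listed above are available before starting, a short check along the following lines may help. This is a minimal sketch rather than part of the book's code bundle: the `find_missing` helper and the `REQUIRED_LIBRARIES` list are hypothetical names introduced here, and the check only tests whether each package can be located by the import system, not which version is installed.

```python
# Minimal sketch (hypothetical helper, not from the book's code bundle):
# report which of the recommended libraries cannot be found.
import importlib.util

# Names taken from the software requirements list above.
REQUIRED_LIBRARIES = [
    "numpy", "pandas", "matplotlib", "seaborn",
    "plotly", "bokeh", "altair", "geopandas",
]


def find_missing(libraries):
    """Return the names in `libraries` that the import system cannot locate."""
    return [name for name in libraries
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_LIBRARIES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All recommended libraries are available.")
```

Any name reported as missing can then be installed with pip or conda from the Anaconda terminal.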
Conventions
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows:
"Python performs advanced numerical and scientific computations with libraries such as numpy and scipy, hosts a wide array of machine learning methods owing to the availability of the scikit-learn package, provides a great interface for big data manipulation due to the availability of the pandas package and its compatibility with Apache Spark, and generates aesthetically pleasing plots and figures with libraries such as seaborn, plotly, and more."
A block of code is set as follows:
# import the Python modules
import seaborn as sns
# load the dataset
diamonds_df = sns.load_dataset('diamonds')
# plot a histogram
diamonds_df.hist(column='carat')
New terms and important words are shown in bold:
"The kernel density estimation is a non-parametric way to estimate the probability density function of a random variable."
Installation and Setup
Before we begin this journey of visualizing various types of data through different graphs and interactive features, we need to be prepared with the most productive environment. Follow these notes to learn how to do that:
Installing the Anaconda Python Distribution
Find the Anaconda version for your operating system on the official installation page at https://www.anaconda.com/distribution/.
After the download is complete, double-click on the file to open the installer and follow the prompts displayed on your screen.
Installing pip
- To install pip, go to the following link and download the get-pip.py file: https://pip.pypa.io/en/stable/installing/.
- Then, use the following command to install it: python get-pip.py
You might need to use the python3 get-pip.py command instead if an earlier version of Python on your computer already uses the python command.
Installing the Python Libraries
Use the following command in your Anaconda terminal to install Seaborn:
pip install seaborn
Use the following command in your Anaconda terminal to install Bokeh:
pip install bokeh
Use the following command in your Anaconda terminal to install Plotly:
pip install plotly==4.1.0
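When several Python installations coexist, a plain pip command may target a different interpreter than the one running your notebooks. One common way around this is to invoke pip as a module of the current interpreter. The sketch below only builds and prints the equivalent commands. The `pip_install_command` helper is a hypothetical name introduced for illustration, and nothing is installed unless you pass the resulting list to something like `subprocess.check_call` yourself.

```python
# Minimal sketch (hypothetical helper): build pip commands that target
# the same Python interpreter that is running this code.
import sys


def pip_install_command(package, version=None):
    """Return the argument list for installing `package`, optionally pinned."""
    requirement = f"{package}=={version}" if version else package
    return [sys.executable, "-m", "pip", "install", requirement]


# The three libraries and the Plotly pin come from the commands above.
for package, version in [("seaborn", None), ("bokeh", None), ("plotly", "4.1.0")]:
    print(" ".join(pip_install_command(package, version)))
```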
Working with JupyterLab and Jupyter Notebook
You'll be working on different exercises and activities in JupyterLab or Jupyter Notebook. These exercises and activities can be downloaded from the related GitHub repository.
You can download the repository here: https://github.com/TrainingByPackt/Interactive-Data-Visualization-with-Python.
You can either clone it with Git or download it as a zipped folder by clicking on the green Clone or download button in the top-right corner. In order to open Jupyter Notebooks, you have to navigate to the directory with your terminal. To do that, type the following:
cd Interactive-Data-Visualization-with-Python/<your current chapter>
For example:
cd Interactive-Data-Visualization-with-Python/Chapter01/
To complete the process, perform the following steps:
- To reach each activity and exercise, you have to use cd once more to go into each folder, like so: cd Activity01
- Once you are in the folder of your choice, simply call jupyter-lab to start up JupyterLab. Similarly, for Jupyter Notebook, call jupyter notebook.
Importing the Python Libraries
Every exercise and activity in this book will make use of various libraries. Importing libraries into Python is very simple. Here's how we do it:
- To import libraries, such as seaborn and pandas, we have to run the following code:
# import the Python modules
import seaborn
import pandas
This will import the whole seaborn and pandas libraries into our current file.
- In the first cells of the exercises and activities of this book, you will see the following code. We can use sns instead of seaborn in our code to call methods from seaborn:
# import seaborn and assign alias sns
import seaborn as sns
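To see what an import alias actually does, the following minimal sketch may help. It uses the standard library's statistics module in place of seaborn purely so that it runs without any installation (the alias name `stats` is an arbitrary choice for this illustration). The same mechanism applies to `import seaborn as sns`.

```python
# Minimal sketch: an alias is just another name bound to the same module.
# The standard-library `statistics` module stands in for seaborn here.
import statistics
import statistics as stats

# Both names refer to the very same module object.
print(stats is statistics)

# Functions can therefore be called through the shorter alias.
print(stats.mean([1, 2, 3, 4]))
```

The alias saves typing and follows a widely used convention, which is why sns for seaborn appears throughout the exercises.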
Installing Git
To install Git, go to https://git-scm.com/downloads and follow the instructions that are specific to your platform.
Additional Resources
The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Interactive-Data-Visualization-with-Python.
The high-quality color images used in this book can be found at: https://github.com/TrainingByPackt/Interactive-Data-Visualization-with-Python/tree/master/Graphics.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!