Working with large data sources
Most of the data that users feed into matplotlib when generating plots comes in the form of NumPy arrays. NumPy is one of the fastest ways of processing numerical and array-based data in Python (if not the fastest), so this makes sense. However, by default NumPy works with in-memory data. If the dataset that you want to plot is larger than the total RAM available on your system, performance will plummet.
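To get a feel for the problem before we dive in, here is a minimal sketch (not part of the example that follows) that uses psutil, one of the modules we import below, to compare the memory footprint of a NumPy array against the RAM actually available; the array size chosen here is arbitrary:

In [ ]: import numpy as np
        import psutil

        # A 100-million-element float64 array needs ~800 MB,
        # since NumPy allocates the full array in RAM up front.
        n = 100_000_000
        needed = n * np.dtype(np.float64).itemsize
        available = psutil.virtual_memory().available

        print("array needs {:.1f} GB, {:.1f} GB RAM available".format(
            needed / 1e9, available / 1e9))

        # Only allocate if it comfortably fits in memory
        # (the 0.5 safety factor is a conservative guess).
        if needed < 0.5 * available:
            data = np.random.random(n)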
In the following section, we're going to take a look at an example that illustrates this limitation. But first, let's get our notebook set up, as follows:
In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline
Here are the modules that we are going to use:
In [2]: import glob, io, math, os
        import psutil
        import numpy as np
        import pandas as pd
        import tables as tb
        from scipy import interpolate
        from scipy.stats import burr, norm
        import matplotlib as mpl
        import matplotlib.pyplot as plt