Visualizing large data
The majority of this notebook has been dedicated to processing large datasets and plotting histograms. This was intentional: with such an approach, the number of artists on the matplotlib canvas is limited to something on the order of hundreds, which is far more manageable than attempting to plot millions of artists. In this section, we will address the problem of displaying the actual elements of large datasets. We will then return to the last HDF5 table in the remainder of the chapter.
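To make the point about artist counts concrete, here is a minimal sketch, not taken from the notebook itself. It assumes a synthetic array of one million normally distributed samples (the name `samples` and the bin count of 100 are illustrative choices) and shows that binning first leaves matplotlib with only one rectangle per bin to draw.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for a large dataset (an assumption for illustration only).
rng = np.random.default_rng(42)
samples = rng.normal(size=1_000_000)

# Reduce the raw samples to 100 bin counts before anything is drawn.
counts, edges = np.histogram(samples, bins=100)

# Draw one bar per bin; each bar is a single Rectangle artist.
fig, ax = plt.subplots()
bars = ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")

n_bars = len(bars)
print(f"{samples.size} samples drawn as {n_bars} bar artists")
```

However many samples go in, the number of bar artists is fixed by the number of bins, which is why the histogram approach scales to datasets that could not be plotted point by point.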
As a refresher on the volume that we're looking at, the number of data points in our dataset can be calculated in the following way:
In [45]: data_len = len(tab)
         data_len
Out[45]: 288000000
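To put that count in perspective, the following back-of-the-envelope sketch estimates the memory needed just to hold the coordinates. It assumes each point is stored as one x and one y value in 64-bit floats; the actual layout of the table may differ, so treat the result as a rough lower bound rather than a measurement.

```python
n_points = 288_000_000   # the row count reported above
coords_per_point = 2     # assumption: one x and one y value per point
bytes_per_float = 8      # size of a float64

# Raw coordinate storage only; any per-artist overhead would come on top.
total_bytes = n_points * coords_per_point * bytes_per_float
print(f"{total_bytes / 1e9:.3f} GB")
```

Under these assumptions the coordinates alone occupy about 4.6 GB before any rendering work begins.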
Again, our dataset has nearly one third of a billion points. That is almost certainly more than matplotlib can handle. In fact, one often sees comments online that warn users not to attempt plotting more than ten thousand or one hundred thousand points.
However, is this good advice...