Appreciating data and visualization
In the not-too-distant past, most of us consumed data pretty much solely via a daily newspaper—the financial pages, the sports section, and the weather forecast. However, in recent years, the ubiquity of computing power has immersed every part of our lives in a sea of data.
Around the clock, our built environment and devices collect vast amounts of data, which we in turn consume. Our morning routine starts with a review of emails, social media posts, and news feeds on a smartphone or tablet, and whereas we once put down the daily newspaper when we left for work, our phones come with us everywhere.
We walk around or exercise and our phones capture our activity and location data via the global positioning system (GPS), while our smartwatches capture our vital signs. When we browse the web, every single interaction down to a mouse click is logged and stored for analysis. The servers that deliver these experiences are monitored and maintained by engineers on a round-the-clock basis. Marketers and sales teams continually analyze this data to make business-critical decisions.
On the way to work, our cars, buses, and trains contain increasingly sophisticated computers that silently log tens of thousands of real-time metrics, using them to calculate efficiency, profitability, engine performance, and environmental impact. Technicians evaluating these physical systems’ health or troubleshooting problems often sift through an enormous stream of data to tease out the signs of a faulty sensor or a failed part. The importance of this data is globally recognized. This is precisely why data recorders are the most valuable forensic artifact after any transportation accident, and why their recovery generates such widespread media coverage.
Meanwhile, in the modern home, a smart thermostat dutifully logs the settings on a Heating, Ventilation, and Air Conditioning (HVAC) system, as well as the current temperature both inside and outside the house. These devices continually gather real-time weather information in order to make decisions about how and when to run most efficiently.
Similar to the systems at home, but on a much larger scale, nearly every building we pass through during the day monitors the health of a number of key infrastructure systems, from air conditioning to plumbing to security. No amount of paper could possibly record the thousands of channels of data flowing through these physical plants, and yet the building management system aggregates this data to make the same kinds of simple decisions as the smart thermostat does at home.
Moreover, these examples represent only a drop in the ocean of data. Around the world, governments, scientists, NGOs, and everyday citizens collect, store, and analyze their own datasets. They are all confronted with the same issue: how to aggregate, collate, or distill the mass of data into a form that a human can perceive and act on in a few seconds or less. The response to this issue is effective data storage and visualization.
Storing, retrieving, and visualizing data
For years, the basic language of data visualization has been well defined: charts, graphs, histograms, and so on. What was missing was the ability to create these charts and graphs rapidly, not in hours or days but in seconds or even milliseconds. This requires enough processing power to draw representations of many thousands of data points in the time it takes to refresh a computer display.
For decades, only the most powerful computers could supply the processing power required to visualize data on this scale, and the software they ran was specialized and expensive. However, a number of trends in computing have converged to produce a renaissance in data acquisition and visualization, making it accessible not only to domain practitioners but also to technically proficient members of the general public. These trends are as follows:
- Cheap general-purpose CPUs and GPUs
- Inexpensive high-capacity storage, optimized for physical size and maximum throughput
- Web standards and technologies, including JavaScript and CSS
- Open source software frameworks and toolkits
- Scalable cloud computation at affordable prices
- Broadband networking to enterprises, homes, and mobile devices
Virtually all of this data shares a common feature: each sample from a sensor, or each line in a log file, carries a snapshot from an invisible ticking clock, a timestamp. A dataset gathered from these data points across a period of time is referred to as a time series. A stored object containing one or more time series is a time-series dataset. An application that can provide optimized access to one or more of these datasets is called, naturally, a time-series database (TSDB). While a whole class of NoSQL time-series databases, such as InfluxDB, OpenTSDB, and Prometheus, has sprung up, venerable SQL relational databases, such as PostgreSQL and MySQL, have added their own support for time-series datasets.
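To make the idea of a time series concrete, here is a minimal Python sketch of timestamped samples kept in time order, with a simple range lookup. It is only an illustration and does not reflect how any particular TSDB works internally. The `TimeSeries` class, its `between` method, and the thermostat readings are all hypothetical names and values invented for this example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class TimeSeries:
    """A named sequence of (timestamp, value) samples kept in time order."""
    name: str
    timestamps: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, ts: datetime, value: float) -> None:
        # Assumption: samples arrive in chronological order, as from a sensor.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("samples must be appended in time order")
        self.timestamps.append(ts)
        self.values.append(value)

    def between(self, start: datetime, end: datetime) -> list:
        """Return the samples whose timestamps satisfy start <= ts <= end."""
        # Binary search works because the timestamps are sorted.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


# Hypothetical indoor temperature readings, one sample per minute.
t0 = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
indoor = TimeSeries("indoor_temp_c")
for minute, reading in enumerate([20.5, 20.7, 21.0, 21.2, 21.1]):
    indoor.append(t0 + timedelta(minutes=minute), reading)

# Retrieve only the samples recorded between 08:01 and 08:03.
window = indoor.between(t0 + timedelta(minutes=1), t0 + timedelta(minutes=3))
for ts, value in window:
    print(ts.isoformat(), value)
```

Keeping the samples sorted by timestamp is what makes the range lookup cheap. A real time-series database applies the same principle at a far larger scale, typically adding compression, indexing, and retention policies on top.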
That’s fine for storing and retrieving data, but what about visualizing data? Enter Grafana.