Understanding Data-Driven Documents
Data-Driven Documents (D3.js) is a data-driven library for DOM manipulation and a graphical toolkit designed for compatibility, accessibility, and performance. It makes full use of the capabilities of modern browsers and web standards (such as HTML, CSS, and SVG).
It is open source and hosted on GitHub (https://github.com/mbostock/d3) under a slightly modified BSD 3-clause license; therefore, it can be used in commercial products without being required to release any modifications. By the way, GitHub itself uses D3.js to visualize the contribution history of repositories.
Note
Make yourself familiar with the wiki pages and the API reference on GitHub as they will become your companions during the next weeks:
- Wiki pages (https://github.com/mbostock/d3/wiki)
- API reference (https://github.com/mbostock/d3/wiki/API-Reference)
Why do we use D3.js?
D3.js is used for many different tasks, but mainly for the following purposes:
- Transforming HTML or SVG elements in the DOM tree, as shown in the following code:
<script type="text/javascript">
  // Example for HTML
  // Change the background color of all p elements
  d3.selectAll('p').style('background-color', 'red');
</script>
- Transforming data into HTML or SVG elements as follows:
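<script type="text/javascript">
  // Example for SVG (assumes an svg element exists in the page)
  // Append one circle to the svg element for each datum
  // that has no circle yet
  d3.select('svg').selectAll('circle')
    .data(dataArray)
    .enter()
    .append('circle');
</script>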
- Generating or preparing complex visual content, as shown in the following code:
<script type="text/javascript">
  // Create a Chord element
  var chord = d3.layout.chord()
    .sortSubgroups(d3.descending)
    .matrix(matrix);
</script>
- Loading data using AJAX requests as follows:
<script type="text/javascript">
  // Load external data
  d3.json('data.json', function(error, data){
    // do something with the data
  });
</script>
Note
D3.js is not a chart library! It provides low-level tools to build dynamic visualizations; therefore, many chart libraries are built on top of D3.js.
One reason why D3.js gained so much popularity is its data-driven approach. Instead of explicitly looping over the elements of an array and drawing them on the screen, D3.js allows for an implicit, declarative representation. In D3.js, we think in terms of how the visualization is composed rather than how each element is arranged in the scene. The second main reason for its popularity is its clear focus on the underlying web standards (HTML, SVG, and CSS). This brings many advantages, such as the following:
- Compatibility: D3.js does not abstract the underlying standards; it exploits them. Therefore, developers can use all standard attributes of HTML, SVG, and CSS to compose and style their visualizations rather than learning an abstraction API for the visualization library.
- Debugging: D3.js will not only append all HTML elements and styles to the DOM, but it will also append all SVG elements and their CSS attributes. This makes it possible to simply open the developer tools of the browser and look at the generated and modified elements and attributes. It lets developers use the standard debugging tools and workflows that they are already familiar with. Anyone who has debugged pixel graphics libraries (such as OpenGL, WebGL, Canvas, and so on) knows that good debugging capabilities are a real game changer.
- Performance: D3.js relies on SVG and therefore facilitates optimizing performance of interactions and animations by giving full access to all SVG features. In most other graphical libraries, one is limited to the capabilities provided by the abstraction layer and the API of the library.
The killer feature – data joins
There is one more feature that distinguishes D3.js from other DOM transforming libraries such as jQuery: the concept of data joins. When binding an array of data, D3.js automatically intersects the old dataset with the new one to generate three new datasets:
- The enter set that stores all elements from the new dataset that are not in the old dataset and therefore need to be added
- The update set that stores all elements from the new dataset that are already in the old dataset and therefore need to be updated
- The exit set that stores all the elements from the old dataset that are not in the new dataset and therefore need to be removed
The following figure visualizes this intersection, where the old dataset is called Selection and the new dataset is called Data:
This technique is often referred to as data binding because we are literally binding an array of elements to a Selection of elements. However, now we know that data joins are not just data bindings, but they additionally intersect the datasets.
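The intersection described above can be sketched in plain JavaScript, without D3.js. This is only an illustration, assuming that elements and data are matched by their position in the array; joinByIndex is a hypothetical helper and not part of the D3.js API.

```javascript
// Hypothetical helper, not part of D3.js: intersects the currently
// selected elements (old dataset) with a new data array by index.
function joinByIndex(selection, data) {
  var shared = Math.min(selection.length, data.length);
  return {
    // data without a matching element: needs to be added
    enter: data.slice(shared),
    // data with a matching element: needs to be updated
    update: data.slice(0, shared),
    // elements without matching data: need to be removed
    exit: selection.slice(shared)
  };
}

var sets = joinByIndex(['a', 'b', 'c'], ['x', 'y']);
console.log(sets.enter);  // []
console.log(sets.update); // ['x', 'y']
console.log(sets.exit);   // ['c']
```

With three selected elements and two data items, nothing enters, two items are updated, and one element exits.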
Let's look at a simple example. In general, the data-driven approach of D3.js allows developers to declare the manipulations of HTML or SVG elements based on CSS selectors. This is very similar to jQuery; therefore, I will also show the corresponding code using jQuery:
<script type="text/javascript">
  // with jQuery
  $('p').css('background-color', 'red');

  // with D3.js
  d3.selectAll('p').style('background-color', 'red');
</script>
However, the big difference is that D3.js implements data joins, which let developers match an array of elements (the new dataset) to a Selection (the old dataset). Corresponding to the enter, update, and exit sets from the previous intersection figure, D3.js returns these intersected datasets using the following functions:
- selection.data(dataSet).enter() for elements that are new to the dataset and not yet in the current Selection
- selection.data(dataSet) for elements that are already existent in the dataset
- selection.data(dataSet).exit() for elements that are removed from the dataset and still existent in the current Selection
Let's look at an example where we use all of the preceding methods. First, we will write a function that appends, updates, and removes p elements in the DOM. Then, we will play around with it:
<script type="text/javascript">
  function join_p(dataSet) {
    var el = d3.select('body');
    var join = el
      // get the selection of all p elements
      .selectAll('p')
      // join the selection with the dataset
      .data(dataSet);

    // elements not yet in the selection
    // they need to be added
    join.enter().append('p');

    // elements currently in the selection
    // they need to be updated
    join.text(function(d) { return d; });

    // elements still in the selection but no longer in the dataset
    // they need to be removed
    join.exit().remove();
  }
</script>
Let's play with this function in the developer tools of the browser. At first, we see a blank page without any p elements in the DOM. Okay, now we call the join_p(['append', 'to', 'DOM']) function from the console inside the browser.
We observe that three paragraphs appear with the content append, to, and DOM; we can also look at the DOM tree in the developer tools:
<body> <p>append</p><p>to</p><p>DOM</p> </body>
So what happened here? In the join_p() function, we first created a Selection of all p elements in the body using .selectAll('p') and then created a data join with the ['append', 'to', 'DOM'] dataset using .data(dataSet). It seems weird that we call .selectAll('p') when not a single p element exists yet in the DOM. However, if we think in terms of data joins, we simply create an empty Selection of p elements. This makes sense as soon as we call the .enter() function, which returns all elements of the dataset that do not yet exist in this Selection. In our case of the empty Selection, this function returns all the elements of the dataset. Finally, we just need to append them to the DOM using the .append('p') function.
In the following line, the join variable returns all elements of the current Selection, to which we just appended three new elements. The .text() method updates all elements of the current Selection and sets the value of the array element as the text of the corresponding p tag (this technique is called dynamic properties and will be explained in more detail in the following chapter). The last method, .exit(), returns no elements because all elements are available in both the dataset and the Selection. The following figure shows how the Selection changes with the dataset:
If we now call the join_p() function again, this time with the dataset join_p(['modify', 'in', 'DOM']), we see that the text of the first two paragraphs changes as follows:
<body> <p>modify</p><p>in</p><p>DOM</p> </body>
In contrast to the previous function call, the Selection of p elements is now not empty, but contains the three previous elements. This means that both the .enter() and .exit() methods will return no elements. The join variable solely contains the updated elements, whose paragraph text is updated accordingly. We can see the effect on the Selection in the following figure:
Finally, we can try to call join_p([]) with an empty dataset. As we might expect by now, this results in all paragraphs being removed. The .exit() function will return all elements of the Selection because the dataset contains no elements. Calling .remove() on these elements will remove them from the DOM. We can observe the change of the Selection in the following figure:
Note
Data joins are data bindings with access to the intersection of the dataset and the Selection.
The concept of data joins enables the developer to append new data to a graphic when new data is available, to update existing data, and to remove data from the graphic when it is no longer available. Instead of redrawing the complete image, the elements of the graphic are transformed.
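The three calls to join_p() discussed above can also be traced without a browser. The sketch below replaces the DOM with a plain array of strings, one per p element, and assumes matching by position; joinParagraphs is a hypothetical stand-in for join_p(), not D3.js code.

```javascript
// Hypothetical stand-in for join_p(): `paragraphs` plays the role of
// the p elements in the body and is transformed in place.
var paragraphs = [];

function joinParagraphs(dataSet) {
  var before = paragraphs.length;
  var entered = Math.max(dataSet.length - before, 0);
  var exited = Math.max(before - dataSet.length, 0);

  // exit: drop paragraphs that no longer have a datum
  paragraphs.length = Math.min(before, dataSet.length);
  // update and enter: set the text of existing and new paragraphs
  dataSet.forEach(function(d, i) { paragraphs[i] = d; });

  return {
    entered: entered,
    updated: dataSet.length - entered,
    exited: exited
  };
}

console.log(joinParagraphs(['append', 'to', 'DOM'])); // 3 entered
console.log(joinParagraphs(['modify', 'in', 'DOM'])); // 3 updated
console.log(joinParagraphs([]));                      // 3 exited
```

The array is never rebuilt from scratch; each call only adds, overwrites, or drops entries, which mirrors how the elements of a graphic are transformed rather than redrawn.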
Finding resources
Michael Bostock provides an extensive source of detailed information on D3.js, helpful posts, and lots of examples. Whenever you are stuck or need information on specific topics or examples, I recommend that you read through the following links:
- Michael Bostock's web page at http://bost.ocks.org/mike/
- A large collection of examples and demos at http://bl.ocks.org/mbostock
- Stack Overflow questions at http://stackoverflow.com/questions/tagged/d3.js
If you Google D3.js, you will find a lot of additional resources; however, most of them just deal with the basics. To gain a deeper understanding of D3.js, I advise you to look up the relevant chapters in the book Mastering D3.js, Pablo Navarro Castillo, Packt Publishing, or to look directly into the source code of D3.js on GitHub.
D3.js meets AngularJS
AngularJS is a JavaScript framework that modernizes the development of web applications in multiple ways; it introduces client-side templates, the MVC/MVVM pattern, scoping, two-way data binding, dependency injection, and so on. Therefore, it's our JavaScript application framework of choice. At this point, I assume that you are already familiar with the main concepts of AngularJS and that you know when and how to apply them. If there are still problems, I recommend that you read the relevant chapters in the book Mastering Web Application Development with AngularJS by Pawel Kozlowski and Peter Bacon Darwin, published by Packt Publishing.
Theoretically, we can simply add the D3.js visualization library to an application that also uses AngularJS, without any extra effort and without caring about modules, isolation, dependency injection, and so on.
However, once we know how awesome AngularJS is, we want to fully exploit every single advantage of this framework. Having said that, we want every component of the application to be injectable, maintainable, and testable. We want to extend HTML syntax and add custom directives to templates. We want proper scope isolation. We want to put common tasks into reusable services. We want to use dependency injection on every single component of the application. We want to integrate D3.js into an application the Angular way.
Testable and maintainable components
AngularJS strongly focuses on the testability and maintainability of the components of an application. If we use plain D3.js to modify the DOM, load data, and create graphical content, it becomes very complex and cumbersome to test single components or the whole application. We will use the full power of AngularJS (the concepts of dependency injection, modularization, isolation, and directives) to create testable components.
Custom directives
AngularJS lets you develop your own directives that extend the HTML syntax to create reusable components for HTML. This is exactly what we want: a reusable component for each different type of visualization that we are going to build. We aim to declare the different elements of a visualization like in the following example:
<html>
  <head>
    <script>
      ...
      app.directive('d3Map', function(){ ... });
      app.directive('d3LineChart', function(){ ... });
      app.directive('d3ScatterPlot', function(){ ... });
      app.directive('d3ChordDiagram', function(){ ... });
    </script>
  </head>
  <body>
    <d3-map></d3-map>
    <d3-line-chart data="data"></d3-line-chart>
    <d3-scatter-plot data="data"></d3-scatter-plot>
    <d3-chord-diagram data="data"></d3-chord-diagram>
  </body>
</html>
We can immediately see that this is a very clean and elegant way to embed your visualization components in the HTML document.
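One of these directives might be structured roughly as follows. This is a sketch under assumptions: d3LineChartDirective and render are hypothetical names, the drawing code is left out, and the registration with app.directive is only shown in a comment because no AngularJS module is defined here.

```javascript
// Hypothetical factory for the d3LineChart directive; it would be
// registered with app.directive('d3LineChart', d3LineChartDirective).
function d3LineChartDirective() {
  return {
    // use the directive as an element: <d3-line-chart>
    restrict: 'E',
    // isolated scope with a two-way binding to the data attribute
    scope: { data: '=' },
    link: function(scope, element, attrs) {
      // placeholder: the D3.js drawing code would go here
      function render(data) {}

      // call render whenever the bound data reference changes
      scope.$watch('data', render);
    }
  };
}
```

Keeping the factory as a named function makes it possible to inspect the returned definition object in a unit test without bootstrapping a full application.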
Custom filters
AngularJS introduces filters in frontend templates that allow you to modify variables and filter arrays directly inside the template. For our visualization component, we want to create custom filters (for example, to clamp the dataset to a specific range) that can be applied to all graphics at once. Additionally, we want these filters to be autoupdated whenever data is selected in one graphic as follows:
<html>
  <head>
    <script>
      ...
      app.filter('startDate', function(){ ... });
    </script>
  </head>
  <body>
    <d3-line-chart data="timeData | startDate:'01.01.2015'"></d3-line-chart>
    <d3-scatter-plot data="timeData | startDate:'01.01.2015'"></d3-scatter-plot>
  </body>
</html>
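A filter such as startDate could look roughly like this. The sketch assumes that each item carries a JavaScript Date in a date property and that the argument uses the DD.MM.YYYY format seen in the template; both are assumptions, as are the names startDateFilter and parseDate.

```javascript
// Hypothetical helper: parses 'DD.MM.YYYY' into a Date (local time)
function parseDate(str) {
  var parts = str.split('.').map(Number);
  return new Date(parts[2], parts[1] - 1, parts[0]);
}

// Hypothetical filter factory; it would be registered with
// app.filter('startDate', startDateFilter).
function startDateFilter() {
  return function(items, start) {
    var startDate = parseDate(start);
    // keep only the items dated on or after the start date
    return (items || []).filter(function(item) {
      return item.date >= startDate;
    });
  };
}

// Assumed sample data, for illustration only
var timeData = [
  { date: new Date(2014, 11, 31), value: 1 },
  { date: new Date(2015, 0, 1), value: 2 }
];
console.log(startDateFilter()(timeData, '01.01.2015').length); // 1
```

Because the filter returns a new array and leaves its input untouched, the same timeData can be passed to several graphics with different start dates.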
Custom loading and parsing service
AngularJS emphasizes the concept of services to implement common functionality. We want to implement a data loading and parsing service that uses AngularJS' Promises and the capabilities of D3.js parsing functions at the same time. The service should be used like this:
<script type="text/javascript">
  app.controller('MainCtrl', ['$scope', 'myService',
    function($scope, myService) {
      // load the data and expose it to the template via the scope
      myService.get('data.json').then(function(data){
        $scope.data = data;
      });
    }
  ]);
</script>
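Such a service might be sketched as follows. Everything here is an assumption made for illustration: createDataService, loadText, and parse are hypothetical names, the native Promise stands in for AngularJS promises, and the loader and parser are passed in so that the sketch does not depend on a specific AngularJS or D3.js function.

```javascript
// Hypothetical service factory: loadText(url) must return a promise
// for the raw file content; parse(text) turns that text into data.
function createDataService(loadText, parse) {
  return {
    get: function(url) {
      // load the raw text first, then parse it
      return loadText(url).then(parse);
    }
  };
}

// Usage with a fake loader instead of a real AJAX request
var fakeLoader = function(url) {
  return Promise.resolve('[1, 2, 3]');
};
var myService = createDataService(fakeLoader, JSON.parse);

myService.get('data.json').then(function(data) {
  console.log(data.length); // 3
});
```

Passing the loader and the parser as arguments follows the dependency injection idea: in a test, both can be replaced by fakes, as the usage example does for the loader.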