Working with business intelligence tools
Over the course of the last 20 years, there have been a growing number of software products released that were geared towards business intelligence. In addition, there have been a number of software products and programming languages that were not initially built for BI but later on became a staple for the industry. The tools used in this book were chosen based on the fact that they were either built on open source technology or products from companies that provided free versions of their software for development purposes. Many big enterprise firms have their own BI tools and they are quite popular. However, unless you have a license with them, it is unlikely that you will be able to use their tool without having to shell out a small fortune. The tools that we will cover in this book will fall under one of these two general categories:
Traditional programming languages such as R, Python, and D3.js (JavaScript)
Data discovery desktop applications such as Tableau, Qlik, and Power BI
Power BI and Excel
Power BI is one of the relatively newer BI tools from Microsoft. It is known as a self-service solution and integrates seamlessly with other data sources such as Microsoft Excel and Microsoft SQL Server. Our primary purpose in using Power BI will be to generate interactive dashboards, reports, and datasets for users.
In addition to using Power BI, we will also focus on utilizing Microsoft Excel to assist with some data analysis and the validation of results pulled from our data warehouse. Pivot tables are very popular within MS Excel and will be used to validate aggregations done inside the data warehouse.
D3.js
D3.js, also known as data-driven documents, is a JavaScript library known for its delivery of beautiful visualizations by manipulating documents based on data. Since D3 is rooted in JavaScript, all visualizations make a seamless transition to the Web. D3 allows major customization to any part of a visualization, and because of this flexibility it will require a steeper learning curve that probably any other software program discussed in this book. D3 can consume data easily as a .json
or a .csv
file. Additionally, the data can be imbedded directly within the JavaScript code that renders the visualization on the Web.
R
R is a free and open source statistical programming language that produces beautiful graphics. The R language has been widely used among the statistical community, and more recently in the data science and machine learning community as well. Due to this fact, it has gathered momentum in recent years as a platform for displaying and delivering effective and practical BI. In addition to visualizing BI, R has the ability to visualize predictive analyses with algorithms and forecasts. While R is a bit raw in its interface, some IDEs (Integrated Development Environments) have been developed to ease the user experience. For the purposes of this book, RStudio will be used to deliver visualizations developed within R.
Python
Python is considered the most traditional programming language of all the different languages that will be covered in this book. It is a widely used in general-purpose programming language with several modules that are very powerful in analyzing and visualizing data. Similar to R, Python is a bit raw in its own form for delivering beautiful graphics as a BI tool; however, with the incorporation of an IDE, the user interface becomes a much more pleasurable development experience. PyCharm will be the IDE used to develop BI with Python. PyCharm is free to use and allows the creation of the IPython (now called Jupyter) notebook, which delivers seamless integration between Python and the powerful modules that will assist with BI.
Tip
As a note, for the purposes of this book all code in Python will be developed using the Python 3 syntax.
Qlik
Qlik is a software company specializing in delivering business intelligence solutions using their desktop tool. Qlik is one of the leaders in delivering quick visualizations based on data and queries through their desktop application. They advertise themselves as a self-service BI for business users. While they do offer solutions that target more enterprise organizations, they also offer a free version of their tool for personal use. It is this version that will be discussed in this book. Tableau is probably the closest competitor to Qlik in terms of delivering similar BI solutions.
Tableau
Tableau is a software company specializing in delivering business intelligence solutions using their desktop tool. If this sounds familiar to Qlik, it's probably because it's true. Both are leaders in the field of establishing a delivery mechanism with easy installation, setup, and connectivity to the available data. Tableau has a free version of their desktop tool, which will be primarily used in discussions in this book. Again, Tableau excels at delivering both quick, beautiful visualizations as well as self-service data discovery to more advanced business users.
Microsoft SQL Server
Microsoft SQL Server 2014 will serve as the data warehouse for the examples that we will use with the BI Tools discussed previously in this book. Microsoft SQL Server is relatively simple to install and set up, and it is free to download. Additionally, there are example databases that configure seamlessly with it, such as the AdventureWorks
database.