Apache Superset Quick Start Guide

Apache Superset Quick Start Guide: Develop interactive visualizations by creating user-friendly dashboards


Configuring Superset and Using SQL Lab

Superset has a flexible software architecture, so a Superset setup can be tailored to many different production environments. The production environment at Airbnb runs Superset inside Kubernetes and serves 600+ daily users, rendering over 100,000 charts every day.

At the same time, Superset can be set up with default settings for most users. When launching our first dashboard on a Google Compute Engine instance, we did not have to make any changes to the default parameters.

In this chapter, we will learn about the following:

  • Setting the web server
  • Metadata database
  • Web server
  • Setting up an NGINX reverse proxy
  • Setting up HTTPS or SSL certification
  • Flask-AppBuilder permissions
  • Securing session data
  • Caching queries
  • Mapbox access token
  • Long-running queries
  • Upgrading Superset
  • Main configuration file
  • SQL Lab
...

Setting the web server

Start the Superset web server with this command:

superset runserver

Superset loads the configuration from a superset_config.py Python file. This file must be present in the path stored in the SUPERSET_CONFIG_PATH environment variable. The configuration variables present in this config file will override their default values. Superset uses the default values for variables not defined in the file.

So to configure the application, we need to create a Python file. After creating the Python file, we need to update SUPERSET_CONFIG_PATH to include the file path.
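The override behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Superset's actual loader: the load_config helper, the DEFAULTS dictionary, and the values in it are hypothetical stand-ins.

```python
import importlib.util
import os
import tempfile

# Hypothetical defaults standing in for the application's built-in settings.
DEFAULTS = {"ROW_LIMIT": 50000, "SECRET_KEY": "change-me"}


def load_config(defaults, env_var="SUPERSET_CONFIG_PATH"):
    """Overlay upper-case names from the file named by env_var onto defaults."""
    config = dict(defaults)
    path = os.environ.get(env_var)
    if path and os.path.exists(path):
        spec = importlib.util.spec_from_file_location("superset_config", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name in dir(module):
            if name.isupper():
                config[name] = getattr(module, name)
    return config


# Demonstration with a throwaway override file that redefines one variable.
with tempfile.TemporaryDirectory() as tmp:
    override_path = os.path.join(tmp, "superset_config.py")
    with open(override_path, "w") as handle:
        handle.write("ROW_LIMIT = 5000\n")
    os.environ["SUPERSET_CONFIG_PATH"] = override_path
    merged = load_config(DEFAULTS)
```

In this sketch, the variable defined in the file replaces its default, while the variable that the file does not mention keeps its default value.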

On your GCE instance, run the following commands:

shashank@superset:~$ touch $HOME/.superset/superset_config.py
shashank@superset:~$ echo 'export SUPERSET_CONFIG_PATH=$HOME/.superset/superset_config.py' >> ~/.bash_profile
shashank@superset:~$ source ~/.bash_profile

Those are the last commands...

Creating the metadata database

The SQLALCHEMY_DATABASE_URI variable value is picked up by the Flask-AppBuilder manager to create the metadata database for the web app. The metadata database is persisted in ~/.superset/superset.db by default. This can be verified by running sqlite3 in the directory and listing the tables in the database:

shashank@superset:~/.superset$ sqlite3 
SQLite version 3.16.2 2017-01-06 16:32:41
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .open superset.db
sqlite> .tables
ab_permission annotation_layer logs ab_permission_view clusters metrics ab_permission_view_role columns query ab_register_user css_templates saved_query ab_role dashboard_slices slice_user ab_user dashboard_user slices ab_user_role dashboards sql_metrics ab_view_menu datasources...
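The same check can be scripted with Python's built-in sqlite3 module. The following is a minimal sketch: the list_tables helper is a hypothetical name, and the demonstration runs against a throwaway database with two made-up tables rather than the real metadata file.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the table names in a SQLite database file, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demonstration against a temporary database so the sketch runs anywhere.
# For a real check, pass os.path.expanduser("~/.superset/superset.db").
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    connection = sqlite3.connect(demo_path)
    connection.execute("CREATE TABLE dashboards (id INTEGER PRIMARY KEY)")
    connection.execute("CREATE TABLE ab_user (id INTEGER PRIMARY KEY)")
    connection.commit()
    connection.close()
    demo_tables = list_tables(demo_path)
```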

Migrating data from SQLite to PostgreSQL

Before we move forward, all tables must be migrated from the SQLite metadata database to the newly set up PostgreSQL database.

We will use sequel, an open-source database toolkit available as a Ruby gem, because it handles migration tasks from sqlite3 to PostgreSQL very well.

We will install OS dependencies and gem dependencies along with the sequel Ruby gem:

sudo apt-get install ruby-dev libpq-dev libsqlite3-dev
sudo gem install pg sqlite3
sudo gem install sequel

After installing sequel, the migration is as simple as running the following command. Make sure the path to the sqlite3 database is set correctly:

sequel -C sqlite:///home/shashank/.superset/superset.db postgresql://superset:superset@localhost/superset...

Web server

We can integrate Superset with many web server options, such as Gunicorn, NGINX, and Apache HTTP, depending on our runtime requirements.

Web servers handle HTTP or HTTPS requests. A Superset web server typically processes a large number of such requests to render charts. Each request generates an I/O-bound database query in Superset. This query is not CPU-bound because the query execution happens at the database level and the result is returned to Superset by the database query execution engine. Requests to a Superset web server almost always require a dynamic output and not a static resource as a response.

Gunicorn is a Python WSGI HTTP server. WSGI is a Python application interface based on the Python Enhancement Proposal (PEP) 333 standard. It specifies how Python applications interface with a web server. Gunicorn is the recommended web server for deploying a Superset...
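To make the WSGI interface concrete, here is a minimal sketch of a WSGI application using only the standard library. It is not part of Superset; the application and call_app names are illustrative, and call_app simply invokes the app the way a WSGI server would, without opening a socket.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI callable: it receives the request environment and a
    start_response function, and returns an iterable of bytes."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(app):
    """Call a WSGI app directly and capture its status line and body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible GET request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
```

If this were saved as a hypothetical module named hello.py, a command such as gunicorn -w 2 hello:application would serve it. The module:callable form is the same one used by superset:app later in this chapter.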

Setting up an NGINX reverse proxy

We are going to set up NGINX as a reverse proxy that retrieves resources from the Gunicorn web server on behalf of a client. NGINX has many capabilities and is one of the most widely used proxy servers. We will use it primarily to forward connections to our Superset web server when someone enters a registered web domain name, or the external IP address directly, in their web browser.

We will set up SSL certification for the NGINX proxy server so that web connections to our web app are always encrypted. Most popular browsers, such as Chrome and Firefox, show a warning if a web page does not have an SSL certificate. No worries, we will get the certificate!

We will first install NGINX on our GCE instance, which runs Ubuntu:

# Install
sudo apt-get update
sudo apt-get install nginx 

The NGINX service is now installed...

Setting up HTTPS or SSL certification

We will be using Let's Encrypt (https://letsencrypt.org/), a free, automated, and open certificate authority managed by the non-profit Internet Security Research Group (ISRG).

Secure Sockets Layer (SSL) is a secure transport layer that can be used underneath many application protocols. HTTPS, which is HTTP over SSL, is the most common instance of it, and it is what we will implement for our Superset web server.

Like most other things, configuring SSL has OS-level dependencies. First, we will install certbot, the free, automated client that obtains certificates from Let's Encrypt. It needs to verify that we control our site first. It does this by performing some checks (which it calls challenges) under http://<url>/.well-known:

# Install certbot
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get install certbot
# Create .well-known directory
cd /var/www/html
mkdir .well-known

We also need to update the superset.conf file in the...

Flask-AppBuilder permissions

Superset uses the Flask-AppBuilder framework to store metadata required for permissions in Superset. Every time a Flask-AppBuilder app is initialized, permissions and views are automatically created for the Admin role. When multiple concurrent workers are started by Gunicorn, they might lead to contention and race conditions between the workers trying to write to one metadata database table.

The automatic updating of permissions in the metadata database can be disabled by setting the SUPERSET_UPDATE_PERMS environment variable to zero. It is set to one (enabled) by default:

export SUPERSET_UPDATE_PERMS=1
superset init
# Make sure superset init is called before Superset starts with a new metadata database
export SUPERSET_UPDATE_PERMS=0
gunicorn -w 10 … superset:app
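The worker settings can also be kept in a Gunicorn configuration file, which is itself plain Python. The following is a sketch under stated assumptions: the file name gunicorn.conf.py, the bind address, and the timeout value are illustrative choices, not values taken from this chapter.

```python
# gunicorn.conf.py (hypothetical file name and location)

# Number of worker processes, matching the -w 10 flag used in the command.
workers = 10

# Address the workers listen on; 127.0.0.1:8088 is an assumption.
bind = "127.0.0.1:8088"

# Seconds a worker may stay silent before it is killed and restarted
# (assumed value).
timeout = 60

# Environment variables passed to the application, as KEY=VALUE strings.
raw_env = ["SUPERSET_UPDATE_PERMS=0"]
```

With such a file in place, the server could be started with gunicorn -c gunicorn.conf.py superset:app, keeping the permission setting next to the worker settings rather than in the shell session.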

Securing session data

Session data that is exchanged between the Superset web server and a browser client or internet bot is protected using the SECRET_KEY parameter value present in the superset_config.py file. The key feeds a cryptographic one-way hashing algorithm that signs the session data. Since the secret is never included with the data the web server sends to a browser client or internet bot, neither can tamper with session data without the change being detected.

Just set its value to a long random string. The short value shown here is only a placeholder:

SECRET_KEY = 'AdLcixY34P' # random string
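A stronger key can be generated with Python's standard secrets module, and the tamper-detection idea can be illustrated with a keyed hash. This is only a sketch of the principle: the helper names are hypothetical, and the signing shown here is not the actual format of the session cookie.

```python
import hashlib
import hmac
import secrets


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


def sign(payload, key):
    """Compute a keyed SHA-256 hash (HMAC) of the payload bytes."""
    return hmac.new(key.encode("utf-8"), payload, hashlib.sha256).hexdigest()


def is_untampered(payload, signature, key):
    """Check, in constant time, that the signature matches the payload."""
    return hmac.compare_digest(sign(payload, key), signature)


key = generate_secret_key()
signature = sign(b"user_id=42", key)
```

Generate the key once and paste the result into superset_config.py. Generating a fresh key on every start would invalidate existing sessions and give each worker a different key.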

Caching queries

Superset uses Flask-Cache for cache management and Flask-Cache provides support for many backend implementations that fit different use cases.

Redis is the recommended cache backend for Superset. But if you do not expect many users to use your Superset installation, then FileSystemCache is a good alternative to a Redis server.

The following are some of the cache implementations that are available, with a description and their configuration variables:

CACHE_TYPE | Description and configuration
simple | Uses a local Python dictionary to store results. This is not really safe when using multiple workers on the web server.
filesystem | Uses the filesystem to store cached values. The CACHE_DIR variable is the directory path used by FileSystemCache.
memcached | Uses a memcached server to store values. Requires the pylibmc Python package installed in the...
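For a small installation, a filesystem cache could be configured in superset_config.py along the following lines. This is a sketch, not the chapter's configuration: the directory path and the timeout value are assumptions chosen for illustration.

```python
# Filesystem-backed cache for a low-traffic installation (illustrative values).
CACHE_CONFIG = {
    # Select the filesystem cache backend
    "CACHE_TYPE": "filesystem",
    # Directory where cached values are written; hypothetical path
    "CACHE_DIR": "/tmp/superset_cache",
    # Default lifetime of a cached value in seconds; assumed value
    "CACHE_DEFAULT_TIMEOUT": 300,
}
```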

Mapbox access token

The MAPBOX_API_KEY variable needs to be defined because we will use Mapbox visualizations in Superset charts. We need to get a Mapbox access token using the guidelines available here: https://www.mapbox.com/help/how-access-tokens-work/.

After you have obtained it, set the MAPBOX_API_KEY variable to the valid access token value.

Long-running queries

Database queries that are initiated by Superset to render charts must complete within the lifetime of HTTP/HTTPS requests. Some long-running database queries can cause a request timeout if they exceed the maximum duration of a request. But it is possible to configure Superset to handle long-running queries properly using a Celery distributed queue, and transfer the responsibility of query handling to Celery workers.

In large databases, it is common for queries to run for minutes or even hours, while web request timeouts are most commonly 30-60 seconds. Therefore, we need to configure this asynchronous query execution backend for Superset.

We need to ensure that the worker and the Superset server both have the same values for common configuration variables.

Redis is the recommended message queue for submitting new queries to Celery workers...
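A Celery configuration for superset_config.py might look like the following sketch. It follows the pattern shown in the Superset documentation of this period, but it is not the book's own listing: the Redis URLs and database number are assumptions, and a results backend would still need to be configured separately.

```python
class CeleryConfig(object):
    # Message queue where new queries are submitted (assumed local Redis)
    BROKER_URL = "redis://localhost:6379/0"
    # Module that holds the SQL Lab tasks executed by the workers
    CELERY_IMPORTS = ("superset.sql_lab",)
    # Where Celery stores task state (assumed local Redis)
    CELERY_RESULT_BACKEND = "redis://localhost:6379/0"


# Superset reads the Celery settings from this variable
CELERY_CONFIG = CeleryConfig
```

Because the worker and the web server must agree on these values, both should load the same superset_config.py file.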

Main configuration file

So, we have completed configuring Superset. Let's take a look at the complete Superset configuration file:

# Superset Configuration file
# add file superset_config.py to PYTHONPATH for usage
import os

# Metadata database
SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://superset:superset@localhost/superset"

# Securing Session data
SECRET_KEY = 'AdLcixY34P' # random string

# Caching Queries
CACHE_CONFIG = {
    # Specify the cache type
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
    # The key prefix for the cache values stored on the server
    'CACHE_KEY_PREFIX': 'superset_results'
}

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = os.environ.get('MAPBOX_API_KEY', 'mapbox-api-key')

# Long running query handling using Celery workers
class
...

SQL Lab

SQL Lab is a powerful SQL IDE inside Superset. It works with any database that has a SQLAlchemy Python connector. It is great for data exploration, and it can query any data source in Superset, including the metadata database.

It is a solid playground in which we can slice and dice a dataset in many ways until it reaches the form we need to visualize in order to answer the analytical question behind a chart.
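The kind of slicing done in SQL Lab can be imitated with Python's built-in sqlite3 module. This is a sketch only: the trips table and its rows are made up for illustration, and in SQL Lab the query text alone would be typed into the editor.

```python
import sqlite3

# Build a tiny in-memory table to stand in for a real data source.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
connection.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Paris", 2.0), ("Paris", 4.0), ("Lyon", 3.0)],
)

# Group the rows by city and aggregate, the typical shape of a chart query.
summary = connection.execute(
    "SELECT city, COUNT(*) AS trips, AVG(distance_km) AS avg_km "
    "FROM trips GROUP BY city ORDER BY city"
).fetchall()
connection.close()
```

Each row of the result pairs a grouping value with its aggregates, which is the form most chart types expect.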

First, we need to enable SQL Lab use on the superset-bigquery data source. We will explore and visualize the data in the table using SQL queries.

After clicking on the Sources | Databases option on the navigation bar, select the Edit record option for the superset-bigquery data source:

Figure: The overview chart of the list of databases

Then, make sure the following three options are enabled. Allow Run Sync should be enabled by default. We are doing this...

Summary

We saw that the Superset web server can be configured for our runtime environment needs using the superset_config.py file. We looked at the configuration parameters that make Superset secure and scalable.

SQL Lab provides an opportunity to experiment with result sets before plotting. It can be used as an excellent tool for exploring datasets and developing charts.

In this chapter, we replaced the SQLite metadata database with a PostgreSQL database and configured the web app to use it. So that the web app can handle many concurrent users, we deployed it on a Gunicorn server. We configured the following components:

  • PostgreSQL metadata database
  • Gunicorn
  • NGINX
  • HTTPS certification
  • Securing session data
  • Redis caching system
  • Celery for long-running queries
  • Mapbox access token

Nicely done! We have been able to make dashboards, use SQL Lab, and understand the...


Key benefits

  • Work with Apache Superset's rich set of data visualizations
  • Create interactive dashboards and data storytelling
  • Easily explore data

Description

Apache Superset is a modern, open source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real-time data visualizations and dashboards on modern web browsers for your organization using Superset.

First, we look at the fundamentals of Superset, and then get it up and running. You'll go through the requisite installation, configuration, and deployment. Then, we will discuss different columnar data types, analytics, and the visualizations available. You'll also see the security tools available to the administrator to keep your data safe. You will learn how to visualize relationships as graphs instead of coordinates on plain orthogonal axes. This will help you when you upload your own entity relationship dataset and analyze the dataset in new, different ways. You will also see how to analyze geographical regions by working with location data. Finally, we cover a set of tutorials on dashboard designs frequently used by analysts, business intelligence professionals, and developers.

Who is this book for?

This book is for data analysts, BI professionals, and developers who want to learn Apache Superset. If you want to create interactive dashboards from SQL databases, this book is what you need. Working knowledge of Python is an advantage but is not necessary to understand this book.

What you will learn

  • Get to grips with the fundamentals of data exploration using Superset
  • Set up a working instance of Superset on cloud services like Google Compute Engine
  • Integrate Superset with SQL databases
  • Build dashboards with Superset
  • Calculate statistics in Superset for numerical, categorical, or text data
  • Understand visualization techniques, filtering, and grouping by aggregation
  • Manage user roles and permissions in Superset
  • Work with SQL Lab

Product Details

Publication date: Dec 19, 2018
Length: 188 pages
Edition: 1st
Language: English
ISBN-13: 9781788999564
Vendor: Apache



Table of Contents

9 Chapters
Getting Started with Data Exploration
Configuring Superset and Using SQL Lab
User Authentication and Permissions
Visualizing Data in a Column
Comparing Feature Values
Drawing Connections between Entity Columns
Mapping Data That Has Location Information
Building Dashboards
Other Books You May Enjoy

Customer reviews

Rating distribution
3.5 out of 5 (2 Ratings)
5 star 50%
4 star 0%
3 star 0%
2 star 50%
1 star 0%
Cliente Kindle, Mar 20, 2021
Rating: 5 out of 5
Excellent book. Complete coverage of Apache Superset (install, config, etc.). As a bonus, an excellent overview of data analysis, visualization, and insights.
Amazon Verified review
Beth, Jan 11, 2022
Rating: 2 out of 5
The basics of using Superset to make visualizations and do day-to-day maintenance within the application are good. But in all honesty, those things are easier to figure out than how to take this open source code and deploy it in a sustainable, manageable manner. This book, like ALL online tutorials, tells you how to manually use the command line to simply install it and get it running. All management of the servers and load balancing would basically be manual from the command line. That serves no purpose in this day and age when every server service offers managed servers which scale up and down, and offer redundancy and safeguards for data. There is literally NOWHERE online where someone shows you how to build a docker compose file to automate CloudFormation with appropriate configuration. Docker compose can even do it for you, but you still need to know how to create that docker compose document and use the CLI. I was hoping this was that book. There are just too many resources for what is in here (albeit strewn in little tutorials and videos) all over the net. Well written and useful as a software USER but totally useless if you have been asked to build a server for your team...
Amazon Verified review
