Tableau Certified Data Analyst Certification Guide

Introduction to Data

You are likely to have a basic grasp of the concept of data – it is difficult to avoid in a world increasingly driven by information, where that information is both easier to gather and more valuable than ever. However, it is useful to begin with a basic definition: data is simply a collection of recorded facts and values – numbers, text, dates, and so on – that describe real-world events or objects.

Take, for example, a coffee chain store. Every day for the past five years, data has been collected on the number of sales, the number of cups of coffee sold, the different types of coffee sold, and even the coffee stock levels. A regional store manager may want to see how well their sales are doing, which stores are performing best, and whether their specialty drinks are selling better than the classics. Five years of data can be hard to analyze in spreadsheets, and this is where Tableau is the ideal tool. Tableau can connect to the data source and build dynamic dashboards that help answer these questions. It can even identify any outliers that exist, allowing the coffee chain to make data-driven decisions to improve its business.

The following topics will be discussed in this chapter:

  • Connecting to sources
  • Data structure
  • Choosing an appropriate data source type
  • Connecting to Tableau Server or Tableau Cloud
  • Connection management

This chapter is designed to show you how to connect to data in Tableau using the different data sources and servers that are available.

Making the Most Out of This Book – Your Certification and Beyond

This book and its accompanying online resources are designed to be a complete preparation tool for your Tableau Certified Data Analyst certification.

The book is written in a way that you can apply everything you’ve learned here even after your certification. The online practice resources that come with this book (Figure 1.1) are designed to improve your test-taking skills. They are loaded with timed mock exams, interactive flashcards, and exam tips to help you work on your exam readiness from now till your test day.

Before You Proceed

To learn how to access these resources, head over to Chapter 9, Accessing the Online Practice Resources, at the end of the book.

Figure 1.1: Dashboard interface of the online practice resources

Here are some tips on how to make the most out of this book so that you can clear your certification and retain your knowledge beyond your exam:

  1. Read each section thoroughly.
  2. Make ample notes: You can use your favorite online note-taking tool or use a physical notebook. The free online resources also give you access to an online version of this book. Click the BACK TO THE BOOK link from the Dashboard to access the book in Packt Reader. You can highlight specific sections of the book there.
  3. Chapter Review Questions: At the end of this chapter, you’ll find a link to review questions for this chapter. These are designed to test your knowledge of the chapter. Aim to score at least 75% before moving on to the next chapter. You’ll find detailed instructions on how to make the most of these questions at the end of this chapter in the Exam Readiness Drill - Chapter Review Questions section. That way, you’re improving your exam-taking skills after each chapter, rather than at the end.
  4. Flashcards: After you’ve gone through the book and scored 75% or more in each of the chapter review questions, start reviewing the online flashcards. They will help you memorize key concepts.
  5. Mock Exams: Solve the mock exams that come with the book till your exam day. If you get some answers wrong, go back to the book and revisit the concepts you’re weak in.
  6. Exam Tips: Review these from time to time to improve your exam readiness even further.

Connecting to Sources

The first step in virtually any data analytics process is to connect to data. Tableau (and machines in general) cannot natively know where, how, or when to look for data. This link must be initially set up by the user.

Tableau offers many options for data connectivity, from simple spreadsheets on a local device to large online repositories accessible anywhere in the world. Naturally, with such varied means of storing data, the process for setting up these connections differs. Nonetheless, the location must always be specified in some way.

Data often contains sensitive information and is relied upon to represent real events. It is, therefore, important to maintain its integrity. Some kind of authentication is often required to prove user identity, especially for data stored on the cloud and intended to be accessed remotely; this most commonly involves entering credentials (such as a username and password).

It is important to note that none of the services in the Tableau suite (such as Desktop, Prep, or Server) have the tools or permissions to change the underlying source data. Tableau can only edit copies of the data held within its own environment.

Organizations with a base level of data maturity, typically medium to large companies, often have established data sources and servers, such as Teradata, Snowflake, or even Tableau Server, which Tableau users are expected to utilize. However, it is certainly possible for data analysts and those in similar roles to find themselves establishing new sources. In any scenario, there are requirements and best practices for data that all users should be aware of.

Data Structure

Tableau is optimized for use with data in a tabular (table-based) structure. This is a structure that you may be familiar with through working with Microsoft Excel or similar software. Vertical columns store values for whatever the column (or field) represents, such as an item description or order date; horizontally, each row of values collectively forms a record (which can be thought of as an observation).

For example, a company may record transactions across its stores in a table such as Table 1.1 (sampled to the first transaction – 000001 – in store number 677):

Store Code | Transaction ID | Item Code   | Quantity | Purchase DateTime
677        | 000001         | 0000145-GRY | 10       | 2024-01-03 09:15:32
677        | 000001         | 0000096-AAA | 5        | 2024-01-03 09:16:01
677        | 000001         | 0000452-BLU | 2        | 2024-01-03 09:16:23

Table 1.1: Example of a tabular structure

As each record represents an item bought, it is apparent that three distinct products were bought as part of this single transaction. Each key data point relating to these records (such as quantity, purchase date, or time) is stored neatly in a distinct column.

Relational databases such as Microsoft SQL Server and many common file types have this tabular structure as a default, or at least a simple alternative structure (such as comma-separated values) that Tableau can quickly convert into a table as it loads the data.
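
To make the tabular idea concrete, the following is a minimal, hypothetical sketch of how the transactions table from Table 1.1 might be defined in a relational database. The table and column names are invented purely for illustration; Tableau users would normally never write this themselves.

-- Hypothetical definition of the store transactions table from Table 1.1.
-- Each column holds one kind of value; each row is one line item in a transaction.
CREATE TABLE store_transactions (
    store_code        INTEGER      NOT NULL,  -- which store recorded the sale
    transaction_id    VARCHAR(10)  NOT NULL,  -- receipt number within that store
    item_code         VARCHAR(20)  NOT NULL,  -- product identifier, e.g. '0000145-GRY'
    quantity          INTEGER      NOT NULL,  -- units purchased on this line
    purchase_datetime TIMESTAMP    NOT NULL   -- when the line item was scanned
);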

Choosing an Appropriate Data Source Type

Before any user or developer can start building a visualization, an appropriate data source must be defined. Without data, there will be no visuals to be built and no stories to be told. Multiple factors should inform decision-making when it comes to choosing a data source. In summary, these include the following.

Content and Quality

The following points are key to preparing data so that a developer can analyze it properly. It is important to familiarize yourself with these practices, as you will be questioned about them in the exam.

Level of detail: Dimensions and Measures

Tableau should be approached not merely as a tool for data visualization. Charts should be created and used thoughtfully as a means of answering business questions. Therefore, data should be selected with a goal in mind: does it contain the fields (columns) and records (rows) required to answer the questions at hand?

When it comes to fields, data should contain the appropriate dimensions (to divide the view) and measures (for assessable metrics). It is impossible to review the relative performance of each salesperson, for example, without their name or other unique identifiers alongside a Profit field. And if those fields are incomplete – lacking all salespeople, or profits for certain months of the year – then accurate conclusions cannot be drawn.

Data Quality

The previous point touched on completeness as an important facet. This can be expanded further: any data source used should have an appropriate level of completeness, accuracy, and consistency for the resulting insights to be valuable. You need to make sure that the data used is complete, all field names are named appropriately, and the spellings are kept consistent. Please see Chapter 2, Transforming Data, for further details.

Technical Requirements

This section will look into the technical requirements that need to be considered when building a report. The purpose is to inform you about the different data connections that can be made and their performance.

Performance: Data Size and Structure

Tableau is capable of processing large volumes of data, but performance sits on a curve: the larger the dataset, the more computational power is required to access and process it. There is no fixed rule for when performance will meaningfully decrease, as this depends on a complex combination of factors, including the specification of the machine running the query (one with plenty of resources, such as RAM, can handle greater quantities of data). It is fair to say that a data source with dozens of columns will be processed more slowly than one with a handful of them; similarly, a source with millions or even billions of records will be less performant than one with a few hundred.

There are stricter limitations for data sources hosted on Tableau Server or Tableau Cloud rather than a local machine; for example, joins and relationships cannot be established, only blends. These are covered in more detail in Chapter 8, Publishing and Managing Content.

It is worth noting that Tableau generally prefers data that is long rather than wide in structure: that is, Tableau can handle more records better than it can handle more fields.

Data Format and Compatibility with Tableau

Users should be sure that a connector exists natively for the given data source type. This might be a file that exists locally on the computer, such as an Excel file.

Users should also consider whether the data will be accessed live or saved as an extract – that is, whether they will work from a saved snapshot of the data (an extract) or query the source in real time (a live connection).

These connection types and their limitations are explained later in this chapter.

Microsoft Excel (.xlsx) and Comma-Separated Values (.csv) Types

This section differentiates between Excel and CSV files and discusses how they are connected to Tableau. Because of their simplicity and familiarity to users, these are the file types most commonly connected to Tableau.

Excel is a specialist spreadsheet format from Microsoft that has been popular for data storage for decades. The comma-separated values (CSV) file is named as such because values within it are distinguished (delimited) by commas. Each line in the file constitutes a distinct record. The CSV format is generic and not associated with a particular software or service, though such files are often opened and used with Microsoft Excel or Google Sheets.

The plain-text format of CSVs is simple to create and straightforward for programs to read, making it easier to move data between systems or locations without complicated parsing steps. CSV files appear no different from spreadsheets when opened in Excel or Tableau – almost all software designed for use with data automatically displays the mass of comma-delimited text in a tabular structure of columns and rows.

As both file types are commonplace, there is a low barrier to entry – the formats are familiar to a wide range of computer users. However, as with most files, these types offer a snapshot of data in time; they are not automatically updated in the way that connections to live source systems are.

Note that CSV files typically store values as plain text, even when all values follow a specific format – fields that are fully numerical, for example, may still be returned as strings. When first opened in Tableau Desktop, fields often need converting to appropriate data types such as String, Integer, or Float so that calculations can be built without errors. Excel worksheets have a hard limit of 1,048,576 rows, a constraint of the file format itself; note that this limit also applies to CSVs opened or imported in Excel.
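
As a conceptual illustration only, the conversion Tableau performs amounts to casting text to typed values, much as a SQL query would; the staging table and column names below are invented for the example.

-- Hypothetical sketch: casting text columns loaded from a CSV to proper types.
SELECT
    CAST(quantity AS INTEGER)            AS quantity,           -- '10' becomes the number 10
    CAST(unit_price AS DECIMAL(10, 2))   AS unit_price,         -- '3.50' becomes 3.50
    CAST(purchase_datetime AS TIMESTAMP) AS purchase_datetime   -- text becomes a datetime value
FROM staged_csv_rows;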

Relational Databases

While using Excel and CSV is a simple and easy way to connect to data in Tableau, these files can easily be changed by human error and are not dynamic. Most organizations have outgrown using Excel and CSV for the following reasons:

  • They require data that can be found quickly when needed and is trusted to be reliable and accurate
  • The solutions need to be able to comfortably handle the natural growth of data and the number of people wanting to access and manipulate it
  • The files can often be duplicated and shared freely, risking unwarranted access
  • Alternative solutions offer greater opportunities to connect from other locations, rather than a single local machine

Relational databases are often a reliable means of achieving these benefits. They are data storage systems that organize information in the familiar tabular structure, with rows and columns; when databases are discussed in a Tableau-specific context, users are usually referring to relational databases. Databases are often hosted on a server, which provides the resources required to run and manage the database; servers can often host multiple databases simultaneously, each with a distinct function.

Tables inside these data repositories are usually set up by developers to capture conceptually distinct types of information. For instance, a marketing center may have a Telephone Enquiries table with each record representing an outgoing call (with columns such as start time, duration, and operator), but store customer-level information (such as phone numbers, addresses, and last-contact dates) in a separate table called Clients.

Common elements allow tables to be related to each other for analytical purposes. This is usually done through keys. A primary key is either a single field or multiple fields in combination that can be used to identify distinct records. To do this effectively, values in the primary key column(s) must be unique for each row, and primary key columns must be fully populated – that is, all records must have a value (with no missing values, known as null values). Tables typically have just one primary key. Primary keys help guard against duplicate records, which reduce the reliability of the data and result in issues such as double counting.
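
As a sketch of how uniqueness might be checked, the following query (reusing the hypothetical store_transactions table sketched earlier) counts how many rows share each candidate key combination; any count above 1 indicates a duplicate that would disqualify the combination as a primary key.

-- Hypothetical duplicate check on a candidate key (store_code + transaction_id + item_code).
SELECT store_code, transaction_id, item_code, COUNT(*) AS row_count
FROM store_transactions
GROUP BY store_code, transaction_id, item_code
HAVING COUNT(*) > 1;   -- returns only combinations that appear more than once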

Foreign keys are columns in a table that refer to the primary key in another table. They are used to link tables on a common identifier. To continue the preceding example, the Clients table might have a Client ID column as the primary key, which also appears as a foreign key in the Telephone Enquiries table. Analysts can match the numbers between tables and identify which client was called in each instance. For example, they could identify which clients have had the greatest volume of successful calls and are therefore worth investing in. This process maintains the original values in a single location – the confidential Clients table – to make the data easier to govern.
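
A minimal sketch of that link, assuming hypothetical clients and telephone_enquiries tables with invented column names, could look like this in SQL:

-- Hypothetical join on the shared Client ID: count successful calls per client.
SELECT c.client_id,
       c.client_name,
       COUNT(*) AS successful_calls
FROM clients AS c
JOIN telephone_enquiries AS t
  ON t.client_id = c.client_id          -- foreign key in telephone_enquiries references clients
WHERE t.call_outcome = 'Successful'     -- invented column for the example
GROUP BY c.client_id, c.client_name
ORDER BY successful_calls DESC;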

Relational databases need to be communicated with for records to be accessed, updated, added, or deleted. This is achieved using a programming language called Structured Query Language (SQL). SQL is discussed further later, in the Custom SQL Query section.

Relational databases are popular as they often enforce rules to maintain data consistency and accuracy; for example, rules may be built to only allow values within a certain range or format when adding new records. In the Clients table, a Telephone Number field may require a 10-digit format with a country code prefix for a new record to be accepted into the table. A hedged sketch of such a rule follows this paragraph.
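
The sketch below uses an invented table definition; many relational systems support CHECK constraints of this kind, though the exact syntax and functions vary by system, and the format rule here is deliberately simplified.

-- Hypothetical constraint: telephone numbers must start with a '+' country code prefix
-- and be long enough to hold a 10-digit number after it.
CREATE TABLE clients (
    client_id        INTEGER PRIMARY KEY,
    client_name      VARCHAR(100) NOT NULL,
    telephone_number VARCHAR(20)
        CHECK (telephone_number LIKE '+%' AND LENGTH(telephone_number) >= 12)  -- simplified format rule
);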

Popular relational database management systems include PostgreSQL, MySQL, and Oracle.

Initial Data Connections

There are multiple ways to connect to data when Tableau Desktop is first opened. The initial screen appears as shown in Figure 1.2:

Figure 1.2: Tableau Desktop start page

Users can connect to data primarily through the blue Connect data pane on the left-hand side of the window.

Some popular connection types are listed here for quick access:

  • Tableau Server (including Tableau Cloud) will be described briefly shortly, and in more detail in Chapter 8, Publishing and Managing Content.
  • Files on the user’s local machine, or in a cloud location mapped and accessible by the current machine.
  • A server (containing databases); the available options are discussed later in this chapter.
  • The Data tab at the top of the window is largely useful once data has already been connected to. At this early stage, the only available option is to paste data from the clipboard.

When pasting data into Tableau from the clipboard, the values are placed into a table with an automatically generated name, as shown in Figure 1.3:

Figure 1.3: Data connector pane

The data is saved automatically by Tableau as a temporary file of the plain-text type in a predetermined location:

Figure 1.4: File directory

Though this option can be useful in certain cases – such as for mocking up quick examples in a teaching environment or troubleshooting data issues – it is not recommended for serious data work. Firstly, it relies on the user to copy data accurately. The temporary file location is also not intuitive to access for maintenance purposes and may be cleared depending on user settings and behavior.

The window that appears when a data source is first connected to is called the Data Source page. It is accessible at any time from the bottom of the Tableau Desktop window, should a data source need checking, adding to, or otherwise modifying while visualizations are being developed.

Figure 1.5: The Data Source tab

If multiple connections are present in a workbook, the primary one used in the sheet being navigated away from will appear in this window.

Connecting to Tableau Server or Tableau Cloud

Both Tableau Server and Tableau Cloud can be thought of as different flavors of the same online service: they enable organizations to securely host, organize, share, and edit Tableau-specific data sources and visualizations, usually through a web browser. Tableau Server and Cloud will be covered in more detail in Chapter 8, Publishing and Managing Content. In short, Tableau Server is managed independently by an organization with the skills and resources to do so, whereas Tableau Cloud is an out-of-the-box solution hosted and maintained by Tableau, and is therefore more accessible.

Both services can be accessed on Tableau Desktop through the Tableau Server button at the top of the Data pane, or through the Server tab at the top of the window.

Figure 1.6: The Tableau Server Sign In window

Note that if Tableau Server is selected, you will be prompted to enter the URL for a Tableau Server or Cloud instance.

If the URL is recognized, the following window will prompt you to authenticate. Tableau Desktop will open a new tab in the default web browser with a login screen. If a valid username and password are provided, a connection to Tableau Server will be established.

Figure 1.7: Tableau Server login window

When connecting to Tableau Server, the user will be prompted to choose a site to connect to. Sites are described fully in Chapter 8, Publishing and Managing Content, but in brief, they can be thought of as folders; they are the highest-level and strictest way to separate and organize content, as sites cannot communicate with each other.

Figure 1.8: Tableau Server site selection

Note that this screen does not appear for Tableau Cloud instances, as Tableau Cloud only permits a single site on which all content must be stored. An entirely new instance of Tableau Cloud must be spun up if content requires strict separation.

Connecting to a File

Connecting to a file requires a file location on the local machine or on a network location the machine can access, such as SharePoint or Google Drive. Different kinds of files are brought into Tableau in different ways. This section goes into detail on how to connect to each file type.

CSV and Excel Files

When Tableau Desktop is open and asking to connect to data, it is likely that the developer will choose an Excel or CSV file. Note that .csv is a text file, so it is accessed via the Text File option under Connect > To A File.

Once a file is selected, the Data pane will be populated with the tables that exist in the file. From this point, the developer needs to identify the main table and bring it into the main view. Then, the developer can establish relationships with other tables and create their data model for analysis.

Tableau is intelligent enough to automatically identify the data types of fields in Excel files, but for CSV files, the developer will likely need to assign the data types to the fields.

.hyper or .tde Files

One type of file to connect to is the .hyper file, the successor to the .tde file, which was deprecated in 2023. This kind of file is a Tableau extract: a local copy of the data that can be shared and offers improved performance.

An extract is very simple to connect to. Select the More... option under To A File and locate the extract in the file directory.

These two file types are important as they are the main formats used to create extracts. This is likely to come up in the exam.

JSON

A JSON file is a formatted text file that stores data as key-value pairs and arrays. Using this kind of file requires the user to select a schema level, which Tableau then flattens into a tabular structure.

Figure 1.9 shows an example of connecting to a JSON file and the window that opens to allow the user to select the schema.

Figure 1.9: Example of a JSON file connection

Once the schema has been selected, the data will be opened for preview and then the user can prepare the data for analysis.

PDF

Tableau can scan a PDF file to allow the user to access the data within it. When selecting a PDF, the user will be prompted to specify which areas Tableau should scan.

Figure 1.10: Scan PDF File selector

Depending on how the PDF is set up, a user can simply scan the whole file, or find the specific page on which the table is located.

Once the file has been established, the data connection pane will be populated with the tables found on the scanned pages. The user can then union the tables together to get a complete set.

Figure 1.11: Data connection pane

Note that there is a chance the data will not be clean and may need some preparation, which can be done in Tableau Prep.

Spatial Files

Spatial files can be used to create maps or plot geographical areas such as the counties of the United Kingdom, but there are several considerations to take into account before connecting to one. It is important to note that some types of spatial file require supporting files, saved in the same local directory, for them to work in Tableau.

KML, GeoJSON, and TopoJSON sources need only a single file. For the other types – Esri shapefiles, Esri file geodatabases, and MapInfo tables – the folder should include the accompanying files to allow the tables to work.

In the London Datastore, there are spatial files of London boroughs that can be downloaded for use.

Once the file has been located and established, there will be a dataset that includes a Geometry column.

Figure 1.12: Example of a spatial file and a Geometry column

Note that the Geometry field contains Polygon values, which define the border shape of each London borough. When creating a new sheet, double-clicking or dragging the Geometry field will create a single shape of the whole map.

Figure 1.13: Tableau Geometry field showing a map of London boroughs

Notice that the whole map has been highlighted as one shape. To see each borough separately, the developer needs to drag a field that identifies each borough (such as its name) onto Detail on the Marks card.

Statistical Files

Tableau can connect to statistical files from Statistical Analysis System (SAS), Statistical Package for the Social Sciences (SPSS), and R. These files are straightforward to connect to; there are only a few items to note:

  • The statistical file does not accept values as label headers
  • These connections only allow for one table per file
  • If more than one table is present in an R file, then Tableau will take the first one

Now that the main file types have been described, take some time to find examples of these files and connect to them. Explore how the different types connect and what is required to do so. A Tableau data extract and an Excel file are available in the Datasources folder inside the My Tableau Repository folder. The other files can be created or found online.

Connecting to Databases and Cloud Services

So far, this chapter has covered files and relational databases that Tableau can connect to. However, large corporations with complex and abundant data are likely to use databases and cloud services to hold that data. This section will go through the different types of database and cloud connectors and discuss how to connect to the data.

Default Server Connectors

Tableau has a vast number of default server connectors readily available for use. Go to the Connect pane and, under To a Server, hover over More… to see all the types of server connection that can be made.

Figure 1.14: Tableau Server connection options

Server is a broad term, but in the context of Tableau it largely refers to the host of relational databases (or of other technically different but functionally similar services, such as Microsoft OneDrive, which allows business users to share and collaborate on the web).

As visible in Figure 1.14, the list of connectors is comprehensive, even in Tableau version 2022.3. You are not expected to memorize the full list of connectors, and Tableau is continually adding new ones. However, it is useful to be aware of their general types.

Relational databases (introduced earlier in this chapter) are communicated with and queried using SQL, which is compiled by Tableau behind the scenes. These queries are written to be highly efficient so that results are returned as quickly as possible.

When a server is selected, the resulting window will ask the user for connection and authentication details. (Though the exam is unlikely to pose questions about this window specifically, it is important to understand what is required to establish a connection that works reliably.)

Figure 1.15: Credentials section to SQL Server

Note that for certain services, such as Microsoft OneDrive and SharePoint Online, single sign-on (SSO) may be the authentication type configured by the network administrator – someone who manages access permissions for others in the organization.

When this is the case, users are usually directed to their default browser to sign in with their organizational account.

Figure 1.16: SQL query builder

Initial SQL is an advanced option in Tableau Desktop that developers can use if they wish to retrieve a specific subset of data (for example, population data for a particular region only). It runs when the workbook is first opened and generates a cache (or temporary table) for the duration of the session. Any changes to the source data will not be reflected until the workbook is reopened.

Initial SQL can result in noticeable performance improvements when used alongside custom SQL. See the Custom SQL Connections section later in the chapter.
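
One common pattern is to stage just the subset of interest into a temporary table that the rest of the workbook then queries. The following is a non-authoritative sketch, assuming a PostgreSQL source and a hypothetical population_statistics table; names and the region value are invented.

-- Hypothetical Initial SQL: runs once when the connection opens, caching one region's data.
CREATE TEMPORARY TABLE region_population AS
SELECT country, region, year, population
FROM public.population_statistics
WHERE region = 'South West';   -- only the subset needed for this workbook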

Once connected, the database tables available to the user will appear on the left-hand side, in the Data pane. At this stage, a connection has been established, but Tableau cannot predict which data the user is interested in. The table(s) of interest must be dragged onto the relationship canvas, as shown here:

Figure 1.17: Connection pane and tables to be connected

To preview the data for the given table in the Results pane, select Update Now:

Figure 1.18: The Update Now button to preview the data

Custom SQL Queries

A more technically advanced but powerful option for connecting to database tables is custom SQL. This allows a developer to specify the exact fields and even filter the data at the source to create an efficient dataset to work with in Tableau. The following section looks at SQL queries and how to use them in Tableau.
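
For instance, rather than dragging a whole table onto the canvas, a developer might paste a query like the hedged sketch below into the custom SQL box, selecting only the fields needed and filtering at the source; the employees table and its columns are invented for illustration.

-- Hypothetical custom SQL: only the fields and rows needed for the analysis.
SELECT employee_id,
       full_name,
       department,
       hire_date
FROM hr.employees
WHERE employment_status = 'Active';   -- filter applied at the source, before data reaches Tableau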

Figure 1.19: Example of an Employee Lookup Custom SQL

Introduction to SQL and Custom Queries in Tableau

SQL (pronounced sequel) is a programming language commonly used to communicate with relational databases. It can be used for a variety of tasks, such as inserting, updating, or deleting records. However, in Tableau, it is largely limited to retrieving records of interest for analysis and reporting purposes.

Though some SQL queries may be long and complex, SQL keywords resemble natural language, and clauses (pieces of logic) are often structured in a logical manner. Take the following query, which pulls the latest 1,000 records from a PostgreSQL database table:

SELECT *                               -- Select all records ('*' is a wildcard character)
FROM "Database"."Schema"."QA_CHECKS"   -- From the database, schema, and table specified
ORDER BY LAST_MODIFIED DESC            -- Sort records in descending order by the date field given
LIMIT 1000;                            -- Return the first 1,000 records only

In the preceding example, PostgreSQL is named as the database management system. This is because SQL is a generic term; there are multiple flavors of the language, in the same way human languages often have different dialects depending on the region the speaker is from. Each system has its own grammatical structure (known as syntax), which needs to be exact for a query to run correctly. For example, another system may use square brackets ([]) or backticks (`) rather than double quotes (") to identify the database, schema, and table objects.
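
For example, the same query written for Microsoft SQL Server drops the LIMIT clause in favor of TOP and typically wraps object names in square brackets rather than double quotes:

SELECT TOP 1000 *                      -- TOP replaces LIMIT in SQL Server syntax
FROM [Database].[Schema].[QA_CHECKS]   -- square brackets identify objects
ORDER BY LAST_MODIFIED DESC;           -- sorting still happens before the top rows are returned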

Custom SQL is useful in the following cases:

  • The user is only interested in specific subsets of data that cannot be achieved using the interface (no-code) tools.
  • Custom SQL can be helpful if the user wishes to select records or combine data from different tables, using more complex logic that the interface does not permit.
  • The dataset is extremely large either in width (number of columns) or length (number of records).
  • It is best practice to only bring strictly necessary data into Tableau to avoid wasting valuable computing resources (and therefore time). However, Custom SQL can be slow (as we will discuss soon). Therefore, it can be a case of trial and error to discover whether custom SQL can improve performance. Where possible, simpler joins and unions are best set up using the user interface – see Chapter 2, Transforming Data, for further details.
  • The analysis is one-off or exploratory – we will discuss performance concerns shortly.
  • The database itself does not permit certain functionalities using the user interface.
  • There are times when certain operations, such as unions or pivots, are not supported by the physical layer.

A major downside to using custom SQL is that it is slow. Queries are almost always slower to run than those automatically generated by Tableau with basic connections that do not use custom SQL. This is aggravated when many complex clauses are used or when you attempt to select large amounts of data – for example, with the SELECT * (select all records) statement. If a custom SQL query is proving valuable, it is almost always worth formalizing as a database table or view that can be connected to directly by the user, especially if Tableau workbooks are intended to be reliable, performant, and for mass consumption (production workbooks).

Custom SQL queries are powerful, but this can cause errors if the data has not been prepped properly for analysis. Custom SQL queries should only be used when the following apply:

  • The user has a clear idea of the data required to write efficient queries that answer the business questions at hand
  • The user understands the structure of the database and is comfortable writing SQL queries, to avoid pulling incorrect, excess, or duplicate records

Live versus Extract Connections

Almost all connections can be configured to be either live or extract. The connection type affects performance and how the data is accessed, so choosing the appropriate type is important for any data analyst. This section will look into the two types and their benefits and limitations.

Live Connections

Live data connections are the simplest to conceptualize. Here, the source data is queried directly every time, producing almost real-time data. Because of their direct nature, live connections should be self-sufficient and need little maintenance; there is no need to set up a refresh schedule.

Note that a live connection does technically use caching technology. Caching is the temporary gathering and storing of information in Tableau for use in future queries.

This means that fetching data from the database is only triggered when the charts are interacted with in some way; a visualization left open for a long time may need refreshing or clicking on to show the most up-to-date information. Small delays are also inevitable as the machine sends queries to and receives results from the external source; it takes time, though often only milliseconds, to complete this process.

Pros of Live Data

Take a look at some advantages of live data connections:

  • Updates are virtually in real time. It can be reassuring for stakeholders to know that the data being visualized is up to date whenever it is accessed, and live connections can be more convenient to use if an organization does not have access to refresh schedules.
  • The data is updated automatically without the user needing to access the source.

Cons of Live Data

Live connections are less performant. This kind of connection can take longer to retrieve data, as a query is expected to run every time an action is performed on the view. This can be detrimental for large companies with huge quantities of data, especially if complex joins and unions are involved.

Now that live connections have been described, it is time to look into data extracts and how they perform as well as their pros and cons.

Data Extracts

An extract is a duplicate of the source data in a format optimized for Tableau. The data can be limited to a smaller set or represent the source data fully; either way, it can be considered a kind of snapshot in time.

As a best practice, it is useful to filter the extract down to only the data you need, for example with a data source filter. This creates a smaller dataset, which improves efficiency and eliminates unnecessary queries against irrelevant data.

For example, the user could be asked to generate visualizations to analyze trends in transactional data from a particular store. If the visualization in development is purely concerned with this store independently, and not looking to compare it to others, taking an extract allows conditional filtering on the Store field at the data source level. Now, whenever the visualization queries the data – when initially loading charts, for example – no queries are wasted on other stores.
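
Conceptually – and only as an illustration, since Tableau generates its own queries – the data source filter restricts every query the workbook issues to something like the following, reusing the hypothetical store_transactions table sketched earlier:

-- Hypothetical equivalent of a data source filter limiting the extract to store 677.
SELECT *
FROM store_transactions
WHERE store_code = 677;   -- other stores are never fetched, so no queries are wasted on them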

Users of Tableau Desktop can update any extract they own or have permission to edit. This can be done manually or by setting a schedule.

The initial refresh occurs when publishing an extract for the first time (either directly, by publishing the data source itself, or by publishing a workbook with the data source embedded). The user will then be prompted to save the extract in a directory that can be accessed locally. This option appears once the user has selected the Extract option in the top-right corner of the Data Source page.

Figure 1.20: Selection of the Extract type

Any other manual refreshes require the original data source to have changed, and then the developer to refresh the data in the workbook. This can be done by selecting Data > Refresh Data Source or by clicking the circular Refresh button.

Figure 1.21: Refresh symbol

There are two types of refresh that can be set when using a data extract:

  • Full: The entire dataset is recalled and the extract is completely regenerated – no records from the previous extract are retained.
  • Incremental: Only new records are returned and appended (added) to the existing dataset. The user can specify the field used to identify new records, but note that this must be of a numeric or date type; Tableau will look for records with values greater than the maximum value in that field.

These refreshes can be set in the configuration window for Extract Data. This box opens when the user selects Edit next to the radio options in the top-right corner of the Datasource page.

Figure 1.22: Window to add extract filters or incremental refresh

The manual refreshing of data in these ways is possible even when the data is running on a schedule. As mentioned previously, data is often analyzed for historical periods and rarely needs to be live to each second. Generally, there is an Extract Refresh schedule suitable for most needs: data can currently be set to refresh each hour, day, week, or month.

As refresh schedules can only be set through Tableau Server or Cloud, data sources must be hosted there to be eligible, either as standalone data sources or contained within specific workbooks.

The scheduling options available to Tableau users are set by Tableau Server administrators, a senior role intended for the management of the server, and may vary from instance to instance. As these processes are more strictly in the realm of Tableau Server/Cloud, they are explored further in Chapter 8, Publishing and Managing Content.

A full extract refresh reflects the source data most accurately at the time of running. However, it can be slow to complete and use a lot of computational resources in doing so. The impact depends on multiple factors: large datasets with wide (many columns) and long (many rows) tables take longer to fetch, and performance also depends on whether the database is optimally structured for Tableau queries – the details of which are out of this exam’s scope.

An advantage of incremental refresh is that the field used to identify new records can be specified by the user. However, the logic for identifying new records is limited to a single field only; Tableau cannot consider the values of multiple fields in combination. It is also restricted to fields of the numeric, date, or datetime type and to simple greater-than logic.

Tableau does not permit custom logic for returning records; it automatically returns records with values greater than the maximum value in the current extract. For example, if a date field is chosen and the latest date is 2023-12-01, any record found at the source with dates beyond this value (such as 2024-01-01) is appended. Numeric types work the same way. This may be an issue if, for example, a record identifier field is not stored as a numeric field (and therefore cannot be selected), or if new records are not always assigned values greater than existing ones.
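
As an illustrative sketch only (Tableau builds the actual query itself), an incremental refresh on a date field behaves like appending the result of a query such as this, where the comparison value is the maximum date already held in the extract; the table name is the hypothetical one used earlier.

-- Hypothetical incremental refresh: fetch only records newer than the extract's latest date.
SELECT *
FROM store_transactions
WHERE purchase_datetime > '2023-12-01 00:00:00';   -- maximum value currently in the extract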

The filtering mentioned here is just one type of filtering possible in Tableau (at the highest level: limiting the data itself). For further filtering options, see Chapter 4, Grouping and Filtering.

Other Factors Affecting Performance

Here are some general factors that may impact performance. These are important for any developer to keep in mind so that the performance of future workbooks can be improved:

  • A developer will need to be aware of the location of the files – whether they are on a local computer or a network location. A network location could slow the connection and also be at risk as many people may have access to the source and could even accidentally delete it.
  • If there are multiple tables and complex joins, this could impact the performance of the workbook.
  • Tableau works best with tables that are long, which means fewer columns and more rows. If a table is wide, this can impact performance.

Connection Management: the Datasource Pane and Data Tab

The Datasource pane is important for a user to be able to create reports in Tableau. This section looks at the Datasource pane as well as the Data tab, both of which are likely to appear in the exam. There are several options in these areas that can change the data connection type and even modify the data coming into the report for analysis, such as adjusting a field’s data type to an appropriate one.

The Datasource Window

This window allows users to view, edit, add, remove, and combine data sources present in a workbook. The window is generally structured as follows:

  • Metadata grid
  • Results pane

The number of records will be displayed along the top of the Results pane. If data contains fewer than 1,000 records, all records will be displayed; otherwise, it is limited to a sample of 1,000 rows.

Some key features are worth identifying in this window, as listed after the figure:

Figure 1.23: Tableau Data Source page

  • Refresh button
  • Add connection
  • Relate/join tables
  • Union tables
  • Extend tables
  • Connection type
  • Data source filters
  • Results pane settings cog

The Data Tab

When at least one data connection is set within a workbook, the Data tab at the top of the screen contains multiple options for managing sources. (This tab is always accessible, regardless of which tab or sheet in the workbook the user is currently on.)

Options in this tab are as follows:

Figure 1.24: Data tab options

Take some time to go over these options and explore what happens. It is good to familiarize yourself with what these options mean and how they work in a workbook setting.

Making New Connections in Existing Workbooks

Various data sources can be combined as required in a single Tableau workbook to suit specific analytic requirements. They can be used separately or combined together through methods such as joins, unions, relationships, and blends.

Joins use a key identifier or linking field to join two tables together, while unions stack data tables on top of each other.

Relationships are a new model and a flexible way to connect two or more tables together. Blends allow the user to establish a connection between a detailed table and an aggregated table.

These connections will be explained later in this book.

Replacing Data Sources

There may come an occasion when the original data source is outdated, or has been improved and moved to a new location. If a visualization has been created with a data source, a user can easily replace that data source with a new one by first adding the new data source to the workbook.

Once this has been done, right-click on the old data source and select Replace Data Source.

There will be an option to select the current data source and then the replacement. Once this is done, the workbook should automatically update to use the new data source. If the data structure is kept the same as before – perhaps with a new column included – the replacement should go smoothly and all charts will be able to take on the new fields.

However, it is important to note that if a field has changed, or a specific field that was used is no longer in the data, the replacement can potentially break the workbook.

Notes, Caveats, and Unsupported Data Sources

Uncommon connectors may require a few additional steps to establish a connection.

In some cases, a specific driver needs downloading that does not come pre-packaged with Tableau Desktop; for example, for a connection to SAP HANA (a database), a user will need to have a driver installed, as shown here:

Figure 1.25: SAP HANA connection error

Drivers convert information between each end of the connection so that the applications can speak to each other; they can be thought of as translators.

For caveats on multidimensional sources such as cubes, please see the Multidimensional Systems section later in this chapter.

Web-Based Data Access: Drivers and APIs

Servers such as those covered previously are made accessible through drivers; as discussed, some of these come pre-packaged with the Tableau software when it is downloaded, and others require downloading by the user. Tableau can also connect directly to data on the web that is exposed through Application Programming Interfaces (APIs) rather than through ODBC or JDBC drivers (which let a user access a database system using SQL); this is done using Web Data Connectors. The details of Web Data Connectors are not required for the exam.

Multidimensional Systems

There are also multidimensional systems, which are structured and queried with different technologies from relational databases. The technical details of these data source types are not expected for the exam, but it is useful to have an awareness of them for comparison against relational sources.

Data in multidimensional systems is pre-aggregated with particular business questions in mind. For example, the developer of an OLAP cube (a common type of multidimensional system) may create a summary view of profits at a country level, rather than having a user query billions of records at the transaction level to calculate this higher-level value outside of the source. As the user can query these aggregations directly with algorithms optimized to do so, multidimensional databases are much quicker to query. However, they are far less flexible to use than relational sources in Tableau: useful features such as data source filters, level of detail calculations and forecasting are not available when using cube sources.
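
To contrast the two approaches, the country-level profit summary a cube designer pre-computes is conceptually the result of an aggregation like the hedged example below, run once at build time rather than by each user at query time; the sales table and columns are invented.

-- Hypothetical pre-aggregation of the kind stored in an OLAP cube.
SELECT country,
       SUM(profit) AS total_profit
FROM sales_transactions          -- billions of transaction-level records
GROUP BY country;                -- summarized once, then queried directly by users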

Only the following sources are supported in Tableau at the time of writing:

Figure 1.26: Supported sources available in Tableau

Summary

This chapter went over the different types of data connections and how a user can connect to databases to get their data. After this chapter, you should be confident in connecting to data and ready to move on to the next chapter to learn how to transform data for analysis. It is recommended that you take some time to practice connecting to the different data types and exploring those connections.

Exam Readiness Drill – Chapter Review Questions

Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key.

Chapter review questions are designed to improve your test-taking skills progressively with each chapter you learn and review your understanding of key concepts in the chapter at the same time. You’ll find these at the end of each chapter.

How to Access these Resources

To learn how to access these resources, head over to the chapter titled Chapter 9, Accessing the Online Practice Resources.

To open the Chapter Review Questions for this chapter, perform the following steps:

  1. Click the link – https://packt.link/TDA_CH01.

    Alternatively, you can scan the following QR code (Figure 1.27):

Figure 1.27: QR code that opens Chapter Review Questions for logged-in users

  2. Once you log in, you’ll see a page similar to the one shown in Figure 1.28:
Figure 1.28: Chapter Review Questions for Chapter 1

  3. Once ready, start the following practice drills, re-attempting the quiz multiple times.

Exam Readiness Drill

For the first three attempts, don’t worry about the time limit.

ATTEMPT 1

The first time, aim for at least 40%. Look at the answers you got wrong and read the relevant sections in the chapter again to fix your learning gaps.

ATTEMPT 2

The second time, aim for at least 60%. Look at the answers you got wrong and read the relevant sections in the chapter again to fix any remaining learning gaps.

ATTEMPT 3

The third time, aim for at least 75%. Once you score 75% or more, you start working on your timing.

Tip

You may take more than three attempts to reach 75%. That’s okay. Just review the relevant sections in the chapter till you get there.

Working On Timing

Target: Your aim is to keep the score the same while trying to answer these questions as quickly as possible. Here’s an example of what your next attempts should look like:

Attempt   | Score | Time Taken
Attempt 5 | 77%   | 21 mins 30 seconds
Attempt 6 | 78%   | 18 mins 34 seconds
Attempt 7 | 76%   | 14 mins 44 seconds

Table 1.2: Sample timing practice drills on the online platform

Note

The time limits shown in the above table are just examples. Set your own time limits with each attempt based on the time limit of the quiz on the website.

With each new attempt, your score should stay above 75% while your “time taken” to complete should “decrease”. Repeat as many attempts as you want till you feel confident dealing with the time pressure.
