Tableau Prep Cookbook

You're reading from Tableau Prep Cookbook: Use Tableau Prep to clean, combine, and transform your data for analysis

Product type: Paperback
Published in: Mar 2021
Publisher: Packt
ISBN-13: 9781800563766
Length: 288 pages
Edition: 1st Edition
Author: Hendrik Kleine

Table of Contents

Preface
Chapter 1: Getting Started with Tableau Prep
Chapter 2: Extract and Load Processes
Chapter 3: Cleaning Transformations
Chapter 4: Data Aggregation
Chapter 5: Combining Data
Chapter 6: Pivoting Data
Chapter 7: Creating Powerful Calculations
Chapter 8: Data Science in Tableau Prep Builder
Chapter 9: Creating Prep Flows in Various Business Scenarios
Other Books You May Enjoy

Connecting to PDF files

In this recipe, we'll connect to a PDF file containing text and a table with data. Tableau Prep has an exciting feature that can automatically detect the presence of tables in PDF files and extract the data for you.

Getting ready

To follow along with the recipe, download the Sample Files 2.2 folder from the book's GitHub repository.

How to do it…

To get started, ensure you have the sample PDF file ready on your computer, and open Tableau Prep Builder:

  1. Tableau Prep Builder does not display the full PDF document, so first open the file in a PDF viewer and review the data we want to extract. Our example document contains a single table, so we expect one table in Tableau Prep with the headers Department and Amount:
    Figure 2.14 – Sample PDF file with a table embedded in it

  2. In Tableau Prep Builder, click the Connect to Data button and choose PDF file. In the file browser dialog that opens, select our sample PDF file, Sales Summary.pdf:
    Figure 2.15 – Select PDF file from the Connect pane

  3. Once connected, Tableau Prep Builder will automatically detect the tables within the PDF file. In our sample, we can see the Department and Amount fields coming through as expected:

    Figure 2.16 – PDF tables are automatically extracted

  4. Each table is listed separately in the Tables section of the Connections pane on the left, so PDF files containing multiple tables are just as easy to work with. Each table name is generated automatically from the page number in the PDF file and the table's position on that page:
    Figure 2.17 – Tableau Prep can detect multiple tables in a single PDF file

In this recipe, you have learned how to connect to PDF files and extract data for processing in Tableau Prep.

How it works…

Tableau Prep converts each table in a PDF document into a separate data table when it ingests the file into a new flow. As such, Tableau Prep removes the complexity of parsing PDF documents and allows you to treat a PDF file like any other data connection.
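
To make the idea concrete, here is a minimal Python sketch of the concept described above: every table detected in a document becomes its own named data table, with the name built from the page number and the table's position on that page. This is not Tableau Prep's implementation. The DetectedTable class, the list_tables function, the "Page N Table M" naming pattern, and the sample row values are all assumptions made for illustration; only the Department and Amount headers come from the recipe.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetectedTable:
    """A table found on a PDF page (hypothetical structure, for illustration)."""
    page: int              # 1-based page number in the PDF
    position: int          # 1-based order of the table on that page
    header: List[str]      # column names taken from the table's first row
    rows: List[List[str]]  # remaining rows as raw text cells

    @property
    def name(self) -> str:
        # Assumed naming pattern; the name Tableau Prep generates may differ.
        return f"Page {self.page} Table {self.position}"


def list_tables(tables: List[DetectedTable]) -> Dict[str, List[Dict[str, str]]]:
    """Map each generated table name to its rows, ordered by page and position."""
    ordered = sorted(tables, key=lambda t: (t.page, t.position))
    return {t.name: [dict(zip(t.header, row)) for row in t.rows] for t in ordered}


# Made-up sample values standing in for tables detected in a two-page PDF.
sample = [
    DetectedTable(2, 1, ["Department", "Amount"], [["Services", "75"]]),
    DetectedTable(1, 1, ["Department", "Amount"], [["Hardware", "100"], ["Software", "250"]]),
]

catalog = list_tables(sample)
for table_name, records in catalog.items():
    print(table_name, records)
```

Each entry in catalog plays the role of one item in the Tables list of the Connections pane: it can be picked up on its own and processed like any other input.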
