Data Engineering with Alteryx

You're reading from Data Engineering with Alteryx: Helping data engineers apply DataOps practices with Alteryx

Product type: Paperback
Published in: Jun 2022
Publisher: Packt
ISBN-13: 9781803236483
Length: 366 pages
Edition: 1st Edition
Author: Paul Houghton

Leveraging Alteryx Server and Alteryx Connect

Once you have successfully created a data pipeline, the next step is to automate it. In this section, we will look at how Alteryx can automate a pipeline and make the resulting data discoverable and trustworthy.

The two products we will focus on are Alteryx Server and Alteryx Connect. Server is the platform for automating, scaling, and sharing workflows, while Connect handles data cataloging, trust, and discoverability.

Server has three main capabilities that benefit a data engineer:

  • Time-based automation of workflows: Relying on a single person to run a workflow that a system depends on is a recipe for failure. Running those workflows on a schedule makes the system more robust and reliable.
  • Scaling of capacity for running workflows: Running several workflows at once in Designer Desktop is a poor experience for most people. Having Server run them instead also frees up local resources for other jobs.
  • Sharing workflows via a central location: Server is the central location where workflows are published and where users around the organization discover them.

Connect is a service for data cataloging and discovery. Data assets can be labeled by what the data represents, by their field contents, or by their source, and this catalog enables users to discover new resources. Additionally, the Data Nexus traces a data field's lineage, which builds trust by showing users where a field originated and what transformations have been applied to it.
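
To make the idea of field lineage concrete, here is a minimal Python sketch of how lineage can be represented and traced as a graph of fields. It is not the Connect or Data Nexus API: `FieldNode`, `trace_lineage`, and the example field and source names are hypothetical, and the sketch assumes the lineage graph has no cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FieldNode:
    """One field in a hypothetical lineage graph (not the Connect API)."""
    name: str
    source: str | None = None          # set only for fields read directly from a source system
    transformation: str | None = None  # how this field was derived, if it was derived
    inputs: list[str] = field(default_factory=list)  # names of upstream fields


def trace_lineage(graph: dict[str, FieldNode], name: str, depth: int = 0) -> list[str]:
    """Walk upstream from `name`, returning one indented line per step (assumes no cycles)."""
    node = graph[name]
    indent = "  " * depth
    if node.source is not None:
        return [f"{indent}{name} <- source: {node.source}"]
    lines = [f"{indent}{name} <- {node.transformation}"]
    for upstream in node.inputs:
        lines.extend(trace_lineage(graph, upstream, depth + 1))
    return lines


# Usage: a derived field built from two hypothetical source fields.
graph = {
    "order_amount": FieldNode("order_amount", source="sales_db.orders"),
    "fx_rate": FieldNode("fx_rate", source="finance.fx_rates"),
    "amount_usd": FieldNode("amount_usd", transformation="Formula: order_amount * fx_rate",
                            inputs=["order_amount", "fx_rate"]),
}
print("\n".join(trace_lineage(graph, "amount_usd")))
# amount_usd <- Formula: order_amount * fx_rate
#   order_amount <- source: sales_db.orders
#   fx_rate <- source: finance.fx_rates
```

Recursing upstream until a node with a `source` is reached answers the two trust questions from the paragraph above: where a field originated and which transformations it passed through.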

How can you use Alteryx Server to orchestrate a data pipeline?

Once we have created a pipeline, we may want the dataset to be extracted on a regular schedule. Automating this process makes the implementation more robust and the dataset simpler to use.

Orchestrating a data pipeline with Alteryx Server is a three-step process:

  1. Create a pipeline in Alteryx Designer and publish it to Alteryx Server.
  2. Set a time frame for running the workflow.
  3. Monitor the workflow's runs.

This three-step process is deceptively simple and, as presented in this introduction, covers only the most straightforward use cases. Later, in Chapter 10, Monitoring DataOps and Managing Changes, we will walk through techniques for orchestrating more complex, multistep data pipelines; still, those examples fundamentally come back to the same three steps.
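
The third step, monitoring, comes down to comparing when a workflow should have run with what actually happened. The following Python sketch assumes a simplified, hypothetical run history: `RunRecord`, `runs_needing_attention`, the `"Completed"`/`"Error"` status values, and the `sales_extract` workflow name are illustrative only and do not reflect Alteryx Server's own data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RunRecord:
    """One entry in a hypothetical run history (not Alteryx Server's data model)."""
    workflow: str
    scheduled_for: datetime
    status: str  # hypothetical status values: "Completed" or "Error"


def runs_needing_attention(history: list[RunRecord], workflow: str,
                           expected: list[datetime]) -> dict[str, list[datetime]]:
    """Compare the expected run times for one workflow against its run history."""
    ran = {r.scheduled_for: r.status for r in history if r.workflow == workflow}
    return {
        "missed": [t for t in expected if t not in ran],           # no run recorded at all
        "failed": [t for t in expected if ran.get(t) == "Error"],  # ran, but ended in error
    }


# Usage: a daily 06:00 extract over three days; day 2 errored and day 3 never ran.
first = datetime(2022, 6, 1, 6, 0)
expected = [first + timedelta(days=i) for i in range(3)]
history = [
    RunRecord("sales_extract", expected[0], "Completed"),
    RunRecord("sales_extract", expected[1], "Error"),
]
report = runs_needing_attention(history, "sales_extract", expected)
print(report["missed"])  # [datetime.datetime(2022, 6, 3, 6, 0)]
print(report["failed"])  # [datetime.datetime(2022, 6, 2, 6, 0)]
```

Separating missed runs from failed runs matters because they usually have different causes: a missed run points at the schedule or the Server itself, while a failed run points at the workflow or its data sources.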

The following screenshot shows where to define the time frame for a schedule on the Server Schedule page:

Figure 1.6 – The Alteryx Server scheduling page

On this page, we can define how often the schedule runs and at what time, and give the schedule a reference name.
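
The settings on this page (frequency, time, and name) are enough to work out when a scheduled workflow will next run. The Python sketch below models them under stated assumptions: `Schedule`, `next_run`, and the `INTERVALS` options are hypothetical, times are naive local datetimes, and daylight-saving changes are ignored. It illustrates the scheduling arithmetic only, not how Alteryx Server computes its schedules.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical frequency options; the Server Schedule page defines its own choices.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


@dataclass
class Schedule:
    name: str        # reference name for the schedule
    frequency: str   # key into INTERVALS
    start: datetime  # first occurrence; also fixes the time of day for daily/weekly runs

    def next_run(self, after: datetime) -> datetime:
        """Return the first occurrence strictly after `after`."""
        if after < self.start:
            return self.start
        interval = INTERVALS[self.frequency]
        passed = (after - self.start) // interval  # whole intervals already elapsed
        return self.start + (passed + 1) * interval


nightly = Schedule("Nightly sales extract", "daily", datetime(2022, 6, 1, 2, 0))
print(nightly.next_run(datetime(2022, 6, 10, 9, 30)))  # 2022-06-11 02:00:00
```

Anchoring every occurrence to the first run, rather than adding an interval to "now", keeps runs on the chosen time of day even if a check happens late.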

How does Connect help with discoverability?

The final piece of your data engineering puzzle is this: how will users find and trust the datasets you have created? While you will often generate datasets on request, you will also find users coming to you looking for datasets you have already made, because they don't know those datasets exist.

Connect is a data cataloging and discoverability tool that lets you surface the datasets in your organization so that users can find them, request access, and understand what their fields contain. It is a central place for data definitions and lets users search by how content is defined.
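
Conceptually, a catalog pairs each dataset with descriptive metadata and lets users search that metadata rather than the data itself. The minimal Python sketch below assumes a simple in-memory catalog; `CatalogEntry`, `search`, and the example datasets are hypothetical and are not Alteryx Connect's data model or search API. It matches a term against names, descriptions, sources, tags, and field definitions, mirroring the idea of searching by how content is defined.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A hypothetical catalog record; not Alteryx Connect's data model."""
    name: str
    description: str
    source: str
    fields: dict[str, str] = field(default_factory=dict)  # field name -> definition
    tags: set[str] = field(default_factory=set)


def search(catalog: list[CatalogEntry], term: str) -> list[str]:
    """Return names of entries whose metadata or field definitions mention `term`."""
    term = term.lower()
    hits = []
    for entry in catalog:
        texts = [entry.name, entry.description, entry.source,
                 *entry.tags, *entry.fields, *entry.fields.values()]
        if any(term in text.lower() for text in texts):
            hits.append(entry.name)
    return hits


catalog = [
    CatalogEntry("daily_sales", "Order lines extracted nightly from the sales database",
                 "sales_db", {"amount_usd": "Order value converted to US dollars"},
                 {"sales", "finance"}),
    CatalogEntry("customer_master", "One row per active customer", "crm",
                 {"customer_id": "Unique customer key from the CRM"}, {"customers"}),
]
print(search(catalog, "dollars"))  # ['daily_sales']
```

A search for "dollars" finds `daily_sales` only through a field definition, which is the kind of match a user who doesn't know the dataset exists would rely on. A real catalog adds ranking, permissions, and access requests; the plain substring match keeps the sketch short.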
