Leveraging Alteryx Server and Alteryx Connect
Once you have successfully created a data pipeline, the next step is to automate its use. In this section, we will use Alteryx to automate a pipeline and to make the data discoverable and trusted.
The two products we will focus on are Alteryx Server and Alteryx Connect. Server is the workflow automation, scaling, and sharing platform, while Connect is for data cataloging, trust, and discoverability.
Server has three main capabilities that are of benefit to a data engineer:
- Time-based automation of workflows: Relying on a single person to run a workflow that is critical to a system is a recipe for failure. A schedule-based system for running those workflows makes the process more robust and reliable.
- Scaling of capacity for running workflows: Running multiple workflows on Designer Desktop is a poor experience for most people, and offloading those runs to Server frees up local resources for other jobs.
- Sharing workflows via a central location: Server is the central location where workflows are published and where users around the organization can discover them (the sketch after this list shows one way to query those published workflows programmatically).
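As a minimal illustration of that central location, the following Python sketch queries Server's Gallery REST API for the workflows the authenticated user can see. Everything here is an assumption for illustration: the base URL, the token handling, the /v3/workflows route, and the response shape all vary by Server version, so check the API (Swagger) documentation exposed by your own Gallery for the exact routes and authentication scheme.

```python
# Hedged sketch: list workflows published to Alteryx Server via its Gallery
# REST API using Python's `requests` library. The URL, route, and response
# shape below are illustrative assumptions, not a documented contract.
import requests

GALLERY_BASE = "https://alteryx.example.com/webapi"   # hypothetical Server URL
ACCESS_TOKEN = "REPLACE_WITH_OAUTH2_TOKEN"            # token obtained from your Gallery API keys


def list_published_workflows():
    """Return the workflows visible to the authenticated user (assumed JSON list)."""
    response = requests.get(
        f"{GALLERY_BASE}/v3/workflows",                # assumed v3 route; confirm in your Server's docs
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    for workflow in list_published_workflows():
        print(workflow)
```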
Connect is a service for data cataloging and discovery. Data assets can be labeled by what the data represents, the field contents, or the source, and this catalog enables the discovery of new resources. Additionally, the Data Nexus allows a data field's lineage to be traced, building trust by showing users where a field originated and what transformations have taken place.
How can you use Alteryx Server to orchestrate a data pipeline?
Once we have created a pipeline, we may want to have the dataset extracted on a regular schedule. Automating this process makes the implementation more robust and the dataset simpler to use.
Orchestrating a data pipeline with Alteryx Server is a three-step process:
- Create a pipeline in Alteryx Designer and publish it to Alteryx Server.
- Set a time frame to run the workflow.
- Monitor the running of the workflow.
This three-step process is deceptively simple and, for this introduction, covers only the most straightforward use cases. Later, in Chapter 10, Monitoring DataOps and Managing Changes, we will walk through techniques for orchestrating more complex, multistep data pipelines. Still, those examples fundamentally come back to the three steps listed above.
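If you want to trigger a published workflow on demand and watch its progress outside the Gallery interface, the run-and-monitor steps can also be performed through Server's REST API. The following Python sketch is illustrative only: the /v3/workflows/{id}/jobs and /v3/jobs/{id} routes, the response fields, and the status values are assumptions modelled on the Gallery API pattern, so verify them against your Server's API reference before relying on them.

```python
# Hedged sketch: queue a job for a published workflow, then poll it until it
# finishes. Routes, JSON keys, and status strings are illustrative assumptions.
import time

import requests

GALLERY_BASE = "https://alteryx.example.com/webapi"   # hypothetical Server URL
ACCESS_TOKEN = "REPLACE_WITH_OAUTH2_TOKEN"
HEADERS = {"Authorization": f"Bearer {ACCESS_TOKEN}"}


def run_workflow(workflow_id: str) -> str:
    """Queue a job for a published workflow and return its job ID (assumed 'id' key)."""
    response = requests.post(
        f"{GALLERY_BASE}/v3/workflows/{workflow_id}/jobs",  # assumed route
        headers=HEADERS,
        json={},                                            # analytic app question values would go here
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["id"]


def wait_for_job(job_id: str, poll_seconds: int = 15) -> dict:
    """Poll the job until it leaves the queue, then return its final state."""
    while True:
        response = requests.get(f"{GALLERY_BASE}/v3/jobs/{job_id}", headers=HEADERS, timeout=30)
        response.raise_for_status()
        job = response.json()
        if job.get("status") not in ("Queued", "Running"):   # assumed status values
            return job
        time.sleep(poll_seconds)


if __name__ == "__main__":
    job_id = run_workflow("REPLACE_WITH_WORKFLOW_ID")
    print(wait_for_job(job_id))
```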
The following screenshot shows how to define the time frame for our schedule on the Server Schedule page:
On this page, we can define the frequency of the schedule, set the time at which it will run, and provide a reference name for it.
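The same schedule can, in principle, be defined programmatically. The sketch below assumes a /v3/schedules endpoint and an illustrative payload whose field names (workflowId, name, iteration) are placeholders; the exact route and schema differ between Server versions, so treat this as a template to adapt rather than a working recipe.

```python
# Hedged sketch: create a daily schedule for a published workflow through an
# assumed /v3/schedules endpoint. The payload shape is a placeholder modelled
# on the fields visible on the Schedule page, not a documented schema.
import requests

GALLERY_BASE = "https://alteryx.example.com/webapi"   # hypothetical Server URL
ACCESS_TOKEN = "REPLACE_WITH_OAUTH2_TOKEN"


def create_daily_schedule(workflow_id: str, name: str, start_time_utc: str) -> dict:
    """Create a daily schedule for a published workflow (illustrative payload)."""
    payload = {
        "workflowId": workflow_id,        # the workflow published in step 1
        "name": name,                     # the reference name shown on the Schedule page
        "iteration": {                    # assumed shape: daily recurrence from a start time
            "iterationType": "Daily",
            "startTime": start_time_utc,  # e.g. "2024-01-01T06:00:00Z"
        },
    }
    response = requests.post(
        f"{GALLERY_BASE}/v3/schedules",   # assumed route; confirm in your Server's docs
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json=payload,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```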
How does Connect help with discoverability?
The final piece of the data engineering puzzle is how users will find and trust the datasets you have created. While you will often generate datasets on request, users will also come to you looking for datasets you have already made, without knowing that they exist.
Connect is a data cataloging and discoverability tool that surfaces the datasets in your organization, allowing users to find them, request access, and understand what each field means. It provides a central place for data definitions and lets users search based on how content is defined.