Implementing our first pipelines in the Python SDK
In this section, we will learn the basics of the Python SDK: how to create a pipeline, how to run it using DirectRunner, and how to test it. As before, our very first pipeline will take our well-known input file, lorem.txt, which we used in Chapter 1, Introducing Data Processing with Apache Beam, and output the number of occurrences of each word in the file. So, let's dive into the Python SDK.
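As a rough preview of the shape such a word-count pipeline might take, consider the following minimal sketch. It is not the contents of first_pipeline.py; the helper names tokenize and build_and_run, and the regular expression used to split lines into words, are assumptions made purely for illustration. The sketch reads a text file, splits each line into words, counts the occurrences of each word, and prints the resulting pairs using DirectRunner:

```python
import re
from typing import List


def tokenize(line: str) -> List[str]:
    """Split a line into words (hypothetical helper; keeps letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", line)


def build_and_run(input_path: str) -> None:
    """Count the occurrences of each word in input_path and print them (illustrative only)."""
    # Beam is imported here so that tokenize() can be used and tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # Explicitly select the local DirectRunner.
    options = PipelineOptions(["--runner=DirectRunner"])

    # Leaving the `with` block runs the pipeline and waits for it to finish.
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadLines" >> beam.io.ReadFromText(input_path)
            | "Tokenize" >> beam.FlatMap(tokenize)
            | "CountPerWord" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

Calling build_and_run with the path to lorem.txt should print one (word, count) pair per distinct word, in no particular order. Keeping the tokenizing logic in a plain function is a design choice of this sketch: it can be checked on its own, independently of any runner.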
Implementing our first Python pipeline
The source code can be found in the first_pipeline.py script, which is located in chapter6/src/main/python/. Let's get started:
- The script takes the name of the input file as an argument, so we can run it locally using the following command:
$ chapter6/src/main/python/first_pipeline.py \
    chapter1/src/main/resources/lorem.txt
- After running the preceding command, we will see the usual output, which consists of a word...