Technical requirements
In this chapter, we will introduce some elementary pipelines written using Beam's Java Software Development Kit (SDK).
We will use the code located in the GitHub repository for this book: https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.
We will also need the following tools to be installed:
- Java Development Kit (JDK) 11 (possibly OpenJDK 11), with JAVA_HOME set appropriately
- Git
- Bash
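Before going further, it can help to confirm that these tools are actually available. The following is a minimal sketch, not part of the book's repository: the check_tool function name is hypothetical, and the script only tests that each command is on the PATH and that JAVA_HOME points at a directory containing bin/java. It does not verify that the JDK version is 11.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the book's repository): report whether
# a required command can be found on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check each tool listed in the requirements above.
missing=0
for tool in java git bash; do
  check_tool "$tool" || missing=1
done

# JAVA_HOME should point at the JDK installation directory.
if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
  echo "JAVA_HOME is set to $JAVA_HOME"
else
  echo "JAVA_HOME is not set or does not point at a JDK"
  missing=1
fi

echo "prerequisites missing: $missing"
```

If the last line reports 1, at least one of the requirements above still needs to be installed or configured.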
Important note
Although it is possible to run many tools in this book using the Windows shell, we will focus on using Bash scripting only. We hope Windows users will be able to run Bash using virtualization or Windows Subsystem for Linux (or any similar technology).
First of all, we need to clone the repository:
- To do this, we create a suitable directory, and then we run the following command:
$ git clone https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.git
- This will result in a directory, Building-Big-Data-Pipelines-with-Apache-Beam, being created in the working directory. We then run the following command in this newly created directory:
$ ./mvnw clean install
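The two steps above can be combined into one short script. This is only a sketch under stated assumptions: the clone_and_build function and the REPO_URL and REPO_DIR variable names are hypothetical and introduced for illustration. The repository URL and the ./mvnw clean install command are the ones given in the text.

```shell
#!/usr/bin/env bash
# Repository URL as given in the text.
REPO_URL="https://github.com/PacktPublishing/Building-Big-Data-Pipelines-with-Apache-Beam.git"

# By default, git clone names the target directory after the last URL
# component, minus the ".git" suffix.
REPO_DIR="$(basename "$REPO_URL" .git)"

# Hypothetical wrapper: clone into the current directory, enter the new
# directory, and build everything with the ./mvnw script found there.
clone_and_build() {
  git clone "$REPO_URL" &&
    cd "$REPO_DIR" &&
    ./mvnw clean install
}

echo "The clone will be created in: $PWD/$REPO_DIR"

# To run the steps for real (needs network access and JDK 11):
#   clone_and_build
```

The function is defined but not called, so the script itself only prints where the clone would land. Calling clone_and_build from a suitable directory performs the actual clone and build.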
Throughout this book, the $ character denotes a Bash shell prompt. Therefore, $ ./mvnw clean install means running the ./mvnw command in the top-level directory of the git clone (that is, Building-Big-Data-Pipelines-with-Apache-Beam). By writing chapter1$ ../mvnw clean install, we mean running the specified command in the subdirectory called chapter1.
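To make the prompt notation concrete, the sketch below expresses "directory$ command" as a small Bash function. The run_in name is hypothetical and not used anywhere in the book. It runs the command in a subshell, so the caller's working directory is left unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command inside a subdirectory, mirroring
# the "<dir>$ <command>" notation. The parentheses start a subshell, so
# the cd does not affect the calling shell.
run_in() {
  local dir="$1"
  shift
  ( cd "$dir" && "$@" )
}

# Usage, from the top-level directory of the clone:
#   ./mvnw clean install                    # $ ./mvnw clean install
#   run_in chapter1 ../mvnw clean install   # chapter1$ ../mvnw clean install
```

Typing cd chapter1 followed by ../mvnw clean install has the same effect on the build. The only difference is that the shell then stays in chapter1 afterwards.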