To get the most out of this book
Basic knowledge of computer systems and concepts, and how these are used within large organizations, is helpful prerequisite knowledge for this book. However, no data engineering-specific skills or knowledge are required. Also, a familiarity with cloud computing fundamentals and core AWS systems will make it easier to follow along, especially with the hands-on exercises, but detailed step-by-step instructions are included for each task.
Note:
If you are using the digital version of this book, we advise you to access the code from the book’s GitHub repository (a link is available in the next section), rather than copying and pasting from the PDF or electronic version. Doing so will help you avoid any potential formatting errors when copying and pasting code.
Download the example code files
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Data-Engineering-with-AWS-2nd-edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Download the color images
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781804614426.
Conventions used
There are a number of text conventions used throughout this book.
CodeInText
: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Include a WHERE Year = 2020
clause.”
A block of code is set as follows:
datalake_bucket/year=2023/file1.parquet
datalake_bucket/year=2022/file1.parquet
datalake_bucket/year=2021/file1.parquet
datalake_bucket/year=2020/file1.parquet
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
datalake_bucket/year=2023/file1.parquet
datalake_bucket/year=2022/file1.parquet
datalake_bucket/year=2021/file1.parquet
datalake_bucket/year=2020/file1.parquet
Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “In addition, you can use Spark SQL to process data using standard SQL.”
Warnings or important notes appear like this.
Tips and tricks appear like this.