To get the most out of this book
Before diving into the chapters, you should have a basic understanding of Python programming and familiarity with fundamental data processing concepts. A grasp of distributed computing principles and hands-on experience with data manipulation and analysis will also be beneficial. Throughout the book, we assume a working knowledge of Python and of foundational concepts in data engineering and analytics. With these prerequisites in place, you’ll be well equipped to embark on your journey to becoming a certified Apache Spark developer.
| Software/hardware covered in the book | Operating system requirements |
| --- | --- |
| Python | Windows, macOS, or Linux |
| Spark | Windows, macOS, or Linux |
The code will work best if you sign up for the Community Edition of Databricks and import the Python files into your account.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.