To get the most out of this book
Before diving into the hands-on activities and code examples in this book, make sure you have the required software and background knowledge. The following table summarizes what you’ll need:
Prerequisite | Description |
Databricks Runtime | This book is tailored for Databricks Runtime 13.3 LTS for Machine Learning or above. |
Python proficiency (3.x) | You should be proficient in Python 3.x, as the code samples are primarily written in it. |
Statistics and ML basics | A strong understanding of statistics and the machine learning lifecycle is assumed. |
Spark knowledge (3.0 or above) | Introductory familiarity with Apache Spark 3.0 or above is required, as Databricks is built on Spark. |
Delta Lake features (optional) | Introductory knowledge of Delta Lake features could enhance your understanding but is not mandatory. |
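If you would like to confirm the version requirements in the table from inside a notebook, the following is a minimal sketch rather than an official check. It assumes that the cluster exposes its runtime version through the `DATABRICKS_RUNTIME_VERSION` environment variable and that `pyspark` is importable; the helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical and chosen only for illustration.

```python
import os
import re
import sys

# Minimum versions taken from the prerequisites table above.
MIN_RUNTIME = "13.3"
MIN_SPARK = "3.0"


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "x-cpu-ml"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "3" and "3.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites():
    """Collect pass/fail results; None means the component was not detected."""
    results = {"python": sys.version_info >= (3,)}

    # Assumption: Databricks clusters expose this environment variable.
    runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    results["databricks_runtime"] = (
        meets_minimum(runtime, MIN_RUNTIME) if runtime else None
    )

    try:
        import pyspark  # only present where Spark is installed
        results["spark"] = meets_minimum(pyspark.__version__, MIN_SPARK)
    except ImportError:
        results["spark"] = None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'not detected' if ok is None else ok}")
```

Outside a Databricks cluster, the runtime and Spark entries will most likely be reported as not detected, which is expected. The sketch does not verify that the runtime is the Machine Learning variant, so treat it as a quick sanity check only.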
To use all the features and code examples described in this book, you’ll need a Databricks trial account, which lasts for 14 days. We recommend planning your learning journey so that you complete the hands-on activities within this timeframe. If you find the platform valuable and wish to continue using it beyond the trial period, consider reaching out to your Databricks contact to set up a paid workspace.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
After completing this book, we highly recommend you explore the latest features in both private and public previews within the Databricks documentation. This will provide you with insights into the future trajectory of machine learning on Databricks, allowing you to remain ahead of the curve and make the most of emerging functionalities.