Further reading
To learn more about the topics covered in this chapter, take a look at the following resources:
- Wikipedia, Hyperparameter (machine learning) (https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)).
- Matt Asay, 2017, 85% of big data projects fail, TechRepublic, November (https://www.techrepublic.com/article/85-of-big-data-projects-fail-but-your-developers-can-help-yours-succeed/).
- Rackspace Technology, New Global Rackspace Technology Study Uncovers Widespread Artificial Intelligence and Machine Learning Knowledge Gap, January 2021 (https://www.rackspace.com/newsroom/new-global-rackspace-technology-study-uncovers-widespread-artificial-intelligence-and).
- Gartner, Gartner Data Shows 87 Percent of Organizations Have Low BI and Analytics Maturity, December 2018 (https://www.gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity).
- Learning Spark: Lightning-Fast Big Data Analysis, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia: This comprehensive guide covers the fundamentals of Spark, including RDDs, the DataFrame API, Spark Streaming, MLlib, and GraphX. With practical examples and use cases, it will help you become proficient in using Spark for data analytics.
- Spark: The Definitive Guide, by Bill Chambers and Matei Zaharia: This acclaimed book provides a deep dive into Spark’s core concepts and advanced features. It covers Spark’s architecture, data processing techniques, ML, graph processing, and deployment considerations. Suitable for beginners and experienced users, it offers a comprehensive understanding of Spark.
- High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, by Holden Karau and Rachel Warren: This book explores strategies for optimizing Spark applications to achieve maximum performance and scalability. It offers insights into tuning Spark configurations, improving data locality, leveraging advanced features, and designing efficient data pipelines.
- Spark in Action, by Jean-Georges Perrin: This practical guide takes you through the entire Spark ecosystem, covering data ingestion, transformation, ML, real-time processing, and integration with other technologies. With hands-on examples and real-world use cases, it enables you to apply Spark to your specific projects.
- Get Started using Unity Catalog (https://docs.databricks.com/data-governance/unity-catalog/get-started.html).
- Databricks documentation (https://docs.databricks.com/introduction/index.html).