Preface
Every company today is a data company, regardless of industry. Innovative companies use data to analyze the past, predict what will happen, and react to what is happening now. Data engineers are among the most critical employees at these companies: they collect, clean, and maintain the trusted datasets that analysts, data scientists, and reporting tools rely on to provide insights.
This book teaches you to use the Scala programming language with the Spark framework and the latest cloud technologies to build continuous and triggered data pipelines. You will set up a data engineering environment for both local development and scalable, distributed cloud deployments, applying data engineering best practices, test-driven development, and Continuous Integration/Continuous Delivery (CI/CD). You will also orchestrate and performance-tune your end-to-end pipelines to deliver data to your end users.