Preface
Are you an IT professional, IT manager, or business leader looking for an effective large-scale data engineering solution platform? Have you experienced the pain of slogging through piles of literature? Have you had to implement a series of painful proofs of concept? If so, this book is for you.
You will emerge on the other side able to implement correctly architected, data-engineered solutions that address real problems you will face in the development process.
Data engineering is rapidly evolving, and the modern data engineer needs to be equipped with software engineering practices to succeed in today’s fast-paced data-driven world. This hands-on book takes a practical approach to applying software and data engineering practices to modern use cases, including the following:
- Migrating to cloud-based storage and processing
- Applying Agile methodologies
- Prioritizing governance, privacy, and security
This book is ideal for data engineers and analytics teams looking to enhance their skills and gain a competitive edge in the industry. As you read, you will be prompted with ideas, questions, and implementation plans that you might not otherwise have considered.
This book assumes that you have foundational knowledge of at least one cloud vendor’s services, in particular Amazon Web Services (AWS) or Microsoft Azure. You should also be well versed in a scripting language (such as Python) and a primary language (such as Java or C/C++), have encountered concurrent or distributed big data processing, and ideally have some experience with analytic services such as Azure Analysis Services (AAS), Microsoft Power BI, or other third-party analytic solutions. The book is aimed largely at developers and architects who understand Python and cloud computing but want a complete framework for future-proofing their solutions.
The book is not prescriptive regarding IT solutions, but it does raise key considerations for evaluation as the technology field evolves. After reading this book, IT architects will be equipped to engage with cloud vendors and third-party vendors while following best practices, so that any solution developed remains robust, high quality, and cost-effective over time.
This book’s structure is as follows:
- Mission/vision
- Principles
- Architecture
- Best practices
- Design patterns
- Use cases
Where pertinent, the book presents vendor selection criteria that are weighted by business value statements, so that decisions support the organization’s goals. Real-life examples and lessons sum up key points. The book is structured so that you can envision a reference architecture for your organization and then see how the business solution is implemented in the context of that reference architecture. As you absorb the content of each chapter, it is a best practice to organize the solution that is forming in your mind. This is our first key consideration:
“Envision what it means to my company’s goals.”
Organize your notes and takeaways from the perspective of “What does it mean for my goals?” while building up a reference architecture and solution strawman.
By the end of this book, you will be able to architect, design, and implement end-to-end cloud-based data processing pipelines. You will also be able to provide customers with access to data as a product that supports various machine learning, analytic, and big data use cases, all within a well-architected data framework. You will know how to build or buy logical components aligned with the framework’s principles and best practices, using Agile software development processes tuned to your organization. Although this book will not supply all the answers, it will shine a light on the path to success and help you avoid the pitfalls that many, including the author, have encountered. It will save you countless hours of frustration and enable you to create better-architected systems more rapidly.