You're reading from Data Engineering Best Practices: Architect robust and cost-effective data solutions in the cloud era

Product type Paperback

Published in Oct 2024

Publisher Packt

ISBN-13 9781803244983

Length 550 pages

Edition 1st Edition

Languages

SQL

Tools

Google Cloud SQL

Concepts

Data Engineering

Authors (2):

David Larochelle

Richard J. Schiller

Table of Contents (21 chapters)

Preface

1. Chapter 1: Overview of the Business Problem Statement

2. Chapter 2: A Data Engineer’s Journey – Background Challenges

3. Chapter 3: A Data Engineer’s Journey – IT’s Vision and Mission

4. Chapter 4: Architecture Principles

5. Chapter 5: Architecture Framework – Conceptual Architecture Best Practices

6. Chapter 6: Architecture Framework – Logical Architecture Best Practices

7. Chapter 7: Architecture Framework – Physical Architecture Best Practices

8. Chapter 8: Software Engineering Best Practice Considerations

9. Chapter 9: Key Considerations for Agile SDLC Best Practices

10. Chapter 10: Key Considerations for Quality Testing Best Practices

11. Chapter 11: Key Considerations for IT Operational Service Best Practices

12. Chapter 12: Key Considerations for Data Service Best Practices

13. Chapter 13: Key Considerations for Management Best Practices

14. Chapter 14: Key Considerations for Data Delivery Best Practices

15. Chapter 15: Other Considerations – Measures, Calculations, Restatements, and Data Science Best Practices

16. Chapter 16: Machine Learning Pipeline Best Practices and Processes

17. Chapter 17: Takeaway Summary – Putting It All Together

18. Chapter 18: Appendix and Use Cases

19. Index

Why subscribe?

20. Other Books You May Enjoy

Overview of the Business Problem Statement

We begin with the task of defining the business problem statement.

“Businesses are faced with an ever-changing technological landscape. Competition requires one to innovate at scale to remain relevant; this causes a constant implementation stream of total cost of ownership (TCO) budget allocations for refactoring and re-envisioning during what would normally be a run/manage phase of a system’s lifespan.”

This rapid rate of change means the goalposts are constantly moving. “Are we there yet?” is a question I constantly heard from my kids when traveling. It came from not knowing where we were or how much effort it would take to get where we were going, with a driver (me) who had never driven to that destination before. Thank goodness for Garmin (automobile navigation systems) and Google Maps, rather than the outdated paper maps of the past. See how technology even impacted that metaphor? Garmin is being displaced by Google for mapping use cases. This is not necessarily because Google is better, but because it is free (if you are willing to be subjected to data collection and advertising interruptions) and it is available on everyone’s smart device.

Now, I can tell my grandkids that in exactly 1 hour and 29 minutes, they will walk into their home after spending the weekend with their grandparents. The blank stare I get in response tells it all. Mapped data, rendered with real-time technology, has changed us completely.

Technological change can appear revolutionary while it is occurring, but in hindsight the progression looks evolutionary: an obvious series of events that we take for granted. That is what is happening today with data, information, knowledge, and analytical data stores in the cloud. The term DataOps was popularized by Andy Palmer, co-founder and CEO of Tamr {https://packt-debp.link/MGj4EU}, and it has since been widely referenced across the data management and analytics world. In 2015, Palmer argued that DataOps is not just a buzzword but a critical approach to managing data in today’s complex, data-driven world.

I believe that it’s time for data engineers and data scientists to embrace a similar (to DevOps) new discipline – let’s call it DataOps – that at its core addresses the needs of data professionals on the modern internet and inside the modern enterprise. (Andy Palmer {https://packt-debp.link/ihlztK})

In Figure 1.1, observe how data quality, integration, engineering, and security are tied together with a solid DataOps practice:

Figure 1.1 – DataOps in the enterprise

The goal of this chapter is to lay the foundation for understanding why the best practices in this book are structured as they are. This foundation will give you firm footing, making the framework you adopt for your everyday engineering tasks more secure and well-grounded. There are many ways to approach data engineering challenges, and each vendor, engineering school, and cloud provider has its own spin on the formula for success. That success will ultimately depend on what you can get working today and keep working in the future. You will need to strike a careful balance among competing forces, and that balance is easily upset if the foundation is not correct.

As a reader, you will naturally have formed biases toward certain engineering challenges. These can push you into a narrow (or single-minded) focus: for example, a fixation on robust, highly available multi-region operations at the expense of pipeline software development. As a result, you may overbuild robustness and underdevelop key features. Likewise, you can focus on hyper-agile streaming of development changes into production at the cost of consumer data quality. More generally, there is a significant risk in just doing IT and losing sight of why we need to carefully structure the processing of data in a modern information processing system. You must capture data together with its semantic context, which keeps it true and relevant, rather than letting the software system become the sole interpretation of the data. This freedom from any single system’s interpretation makes data plus its context equal to information that is fit for purpose, now and in the future.

We can begin with the business problem statement.
