You're reading from Simplifying Data Engineering and Analytics with Delta: Create analytics-ready data that fuels artificial intelligence and business intelligence

Product type: Paperback

Published: Jul 2022

Publisher: Packt

ISBN-13: 9781801814867

Length: 334 pages

Edition: 1st Edition

Languages: Python

Tools: Apache Spark

Concepts: Artificial Intelligence

Author: Anindita Mahapatra

Table of Contents (18) Chapters

Preface

1. Section 1 – Introduction to Delta Lake and Data Engineering Principles

2. Chapter 1: Introduction to Data Engineering

3. Chapter 2: Data Modeling and ETL

4. Chapter 3: Delta – The Foundation Block for Big Data

5. Section 2 – End-to-End Process of Building Delta Pipelines

6. Chapter 4: Unifying Batch and Streaming with Delta

7. Chapter 5: Data Consolidation in Delta Lake

8. Chapter 6: Solving Common Data Pattern Scenarios with Delta

9. Chapter 7: Delta for Data Warehouse Use Cases

10. Chapter 8: Handling Atypical Data Scenarios with Delta

11. Chapter 9: Delta for Reproducible Machine Learning Pipelines

12. Chapter 10: Delta for Data Products and Services

13. Section 3 – Operationalizing and Productionalizing Delta Pipelines

14. Chapter 11: Operationalizing Data and ML Pipelines

15. Chapter 12: Optimizing Cost and Performance with Delta

16. Chapter 13: Managing Your Data Journey

17. Other Books You May Enjoy

Compensating for missing and out-of-range data

Some columns will have missing data. The business use case determines how serious that is and what to do about it. A field that is used as an input to a model, for instance, needs a value in every record. Here are some strategies you can use:

Drop the affected records. This is acceptable when downstream workloads do not need the information.
Flag the row/column by adding a marker value (for example, -1). This lets you identify the missing data later on without violating the schema.
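
The first two strategies can be sketched as follows. This is a minimal illustration in plain Python that works on a small list of dictionaries rather than on a Spark DataFrame; the `rows` data, the `age` column, and the helper names `drop_missing` and `flag_missing` are hypothetical. In PySpark, `DataFrame.na.drop(subset=[...])` and `DataFrame.na.fill(...)` play the corresponding roles.

```python
# Hypothetical sample records; None stands in for a missing value.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
]

# Marker chosen for illustration: it must be a value that can never be a valid age.
MISSING_MARKER = -1


def drop_missing(records, column):
    """Strategy 1: keep only the records that have a value in `column`."""
    return [r for r in records if r[column] is not None]


def flag_missing(records, column, marker=MISSING_MARKER):
    """Strategy 2: replace missing values with a marker of the same type,
    so the column keeps its integer type and the gaps stay identifiable."""
    return [
        {**r, column: marker if r[column] is None else r[column]}
        for r in records
    ]


dropped = drop_missing(rows, "age")   # record 2 is removed
flagged = flag_missing(rows, "age")   # record 2 keeps its place, with age == -1
```

Both helpers return new lists and leave `rows` untouched, so the raw data remains available if the decision needs to be revisited.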

Perform basic imputation so that you have a "best guess" of what the data could have been, often by using the mean of the non-missing data:
The following is an example of filling default values for specific columns:
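
A minimal sketch of this idea in plain Python, assuming hypothetical columns `age` and `city` and a hypothetical helper named `fill_defaults`. The `defaults` mapping has the same column-to-value shape as the dictionary that PySpark's `DataFrame.na.fill` accepts, so the logic should carry over, but the code below does not use Spark itself.

```python
# Hypothetical sample records; None stands in for a missing value.
rows = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": None},
    {"id": 3, "age": 51, "city": None},
]

# One default per column; columns not listed here are never modified.
defaults = {"age": 0, "city": "unknown"}


def fill_defaults(records, column_defaults):
    """Return copies of the records with missing (None or absent) values
    in the listed columns replaced by that column's default."""
    filled = []
    for record in records:
        new_record = dict(record)  # copy so the input is left unchanged
        for column, default in column_defaults.items():
            if new_record.get(column) is None:
                new_record[column] = default
        filled.append(new_record)
    return filled


filled_rows = fill_defaults(rows, defaults)
```

Choosing the default per column, rather than one value for the whole table, keeps each replacement consistent with the column's data type.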

The following is an example of using the "average strategy" to impute the values of the specified columns:
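
A minimal sketch of mean imputation in plain Python, assuming hypothetical numeric columns `age` and `income` and a hypothetical helper named `impute_mean`. In Spark, `pyspark.ml.feature.Imputer` with `strategy="mean"` provides this behavior for numeric columns; the code below only illustrates the underlying logic.

```python
from statistics import mean

# Hypothetical sample records; None stands in for a missing value.
rows = [
    {"id": 1, "age": 30.0, "income": 40000.0},
    {"id": 2, "age": None, "income": 60000.0},
    {"id": 3, "age": 50.0, "income": None},
]


def impute_mean(records, columns):
    """Replace missing values in `columns` with the mean of that column's
    non-missing values. A column with no observed values is left as-is."""
    means = {}
    for column in columns:
        observed = [r[column] for r in records if r[column] is not None]
        if observed:
            means[column] = mean(observed)

    imputed = []
    for record in records:
        new_record = dict(record)  # copy so the input is left unchanged
        for column, column_mean in means.items():
            if new_record[column] is None:
                new_record[column] = column_mean
        imputed.append(new_record)
    return imputed


imputed_rows = impute_mean(rows, ["age", "income"])
```

The mean is computed from the observed values only, so the missing entries do not distort the estimate that replaces them.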

...
