Architecting a Batch Processing Pipeline
In the previous chapter, we learned how to architect low- to medium-volume batch-based solutions using Spring Batch. We also learned how to profile such data using DataCleaner. However, as data volumes grow exponentially, most companies now have to process huge amounts of data and analyze it to their advantage.
In this chapter, we will discuss how to analyze, profile, and architect a big data solution for a batch-based pipeline. We will learn how to choose a technology stack and design a data pipeline that yields an optimized, cost-efficient big data solution. We will also learn how to implement and test this solution using Java, Apache Spark, and various AWS components. After that, we will discuss how to optimize the solution to make it more time- and cost-efficient. By the end of this chapter, you will know how to architect and implement a data analysis pipeline in AWS using S3, Apache Spark (Java), AWS EMR, AWS Lambda, and AWS...
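To make the shape of such a pipeline concrete before we dive in, here is a minimal sketch of a Spark batch job written in Java that reads raw data from S3, applies a trivial cleanup step, and writes the result back to S3. The bucket name, paths, and the cleanup step are placeholder assumptions for illustration; the chapter's actual transformations will replace them.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BatchPipelineSketch {
    public static void main(String[] args) {
        // On EMR, the session picks up the cluster configuration
        // automatically; no master URL needs to be set here.
        SparkSession spark = SparkSession.builder()
                .appName("batch-pipeline-sketch")
                .getOrCreate();

        // Read raw CSV input from S3 (bucket and prefix are placeholders).
        Dataset<Row> input = spark.read()
                .option("header", "true")
                .csv("s3://my-bucket/raw/");

        // A trivial cleanup step standing in for the real transformations:
        // drop rows that contain null values.
        Dataset<Row> cleaned = input.na().drop();

        // Write the results back to S3 in a columnar format
        // that downstream analysis can read efficiently.
        cleaned.write().mode("overwrite").parquet("s3://my-bucket/curated/");

        spark.stop();
    }
}
```

Packaged as a JAR and submitted as an EMR step, a job of this shape forms the core of the batch pipeline; the surrounding AWS components orchestrate when it runs and where its inputs and outputs live.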