Part 3: Connecting It All Together
In this part, you will learn how to deploy and orchestrate on Kubernetes the big data tools and technologies covered in the previous chapters. You will build scripts to deploy Apache Spark, Apache Airflow, and Apache Kafka on a Kubernetes cluster, making them ready to run data processing jobs, orchestrate data pipelines, and handle real-time data ingestion, respectively. You will also explore data consumption layers, data lake engines such as Trino, and real-time data visualization with Elasticsearch and Kibana, all deployed on Kubernetes. Finally, you will bring everything together by building and deploying two complete data pipelines on a Kubernetes cluster: one for batch processing and another for real-time processing. This part also covers deploying generative AI applications on Kubernetes and offers guidance on where to go next in your Kubernetes and big data journey.
This part contains the following chapters...