
Big Data on Kubernetes : A practical guide to building efficient and scalable data solutions

By Neylson Crepalde
eBook, Jul 2024, 296 pages, 1st Edition
eBook: $9.99 (regular price $31.99)
Paperback: $39.99
Subscription: Free Trial, renews at $19.99 per month

What do you get with eBook?

  • Instant access to your Digital eBook purchase
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Getting Started with Containers

The world is rapidly generating massive amounts of data from a variety of sources – mobile devices, social media, e-commerce transactions, sensors, and more. This data explosion is often referred to as “big data.” While big data presents immense opportunities for businesses and organizations to gain valuable insights, it also brings tremendous complexity in how to store, process, analyze, and extract value from huge volumes of diverse data.

This is where Kubernetes comes in. Kubernetes is an open source container orchestration system that helps automate the deployment, scaling, and management of containerized applications. Kubernetes brings important advantages for building big data systems. It provides a standard way to deploy containerized big data applications on any infrastructure. This makes it easy to migrate applications across on-premises servers or cloud providers. It also makes it simple to scale big data applications...

Technical requirements

For this chapter, you should have Docker installed. You also need a computer with at least 4 GB of RAM (8 GB is recommended), as Docker can be memory-intensive.

The code for this chapter is available on GitHub. Please refer to https://github.com/PacktPublishing/Bigdata-on-Kubernetes and access the Chapter01 folder.

Container architecture

Containers are an operating system-level virtualization method that we can use to run multiple isolated processes on a single host machine. Containers allow applications to run in an isolated environment with their own dependencies, libraries, and configuration files without the overhead of a full virtual machine (VM), which makes them lighter and more efficient.

If we compare containers to traditional VMs, they differ in a few ways. VMs virtualize at the hardware level, creating a full virtual operating system. Containers, on the other hand, virtualize at the operating system level. Because of that, containers share the host system’s kernel, whereas VMs each have their own kernel. This allows containers to have much faster startup times, typically in milliseconds compared to minutes for VMs (it is worth noting that in a Linux environment, Docker can leverage the capabilities of a Linux kernel directly. While running in a Windows system, however, it...
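
One way to observe the shared kernel is to print the kernel release on the host and again from inside a container. The following is a minimal sketch using only the Python standard library; `kernel_info` is a hypothetical helper name, not something from the book. On a Linux host, both runs should report the same release. With Docker Desktop on Mac or Windows, a Linux container reports the kernel of the Linux environment Docker runs in rather than the host's own kernel.

```python
import platform


def kernel_info() -> dict:
    """Return the OS name and kernel release seen by this process."""
    return {
        "system": platform.system(),    # e.g. "Linux", "Darwin", "Windows"
        "release": platform.release(),  # kernel release string
    }


if __name__ == "__main__":
    info = kernel_info()
    print(f"{info['system']} kernel release: {info['release']}")
```

Run the script once directly on the host and once inside any container image that ships Python, then compare the two lines of output.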

Installing Docker

To get started with Docker, you can install it by using the package manager for your Linux distribution or install Docker Desktop for Mac/Windows machines.

Windows

To use Docker Desktop on Windows, you must turn on the WSL 2 feature. Refer to this link for detailed instructions: https://docs.microsoft.com/en-us/windows/wsl/install-win10.

After that, you can install Docker Desktop as follows:

  1. Go to https://www.docker.com/products/docker-desktop and download the installer.
  2. When the download is ready, double-click the installer and follow the prompts.

    You should ensure that the Use WSL 2 instead of Hyper-V option is selected on the Configuration page. This is the recommended usage. (If your system does not support WSL 2, this option will not be available. You can still run Docker with Hyper-V, though.)

  3. After the installation is finished, click Close to complete the setup, and then start Docker Desktop.

If you have any doubts, refer to the official documentation...

Getting started with Docker images

The very first Docker image we can run is the hello-world image. It is often used to test whether Docker is correctly installed and running.

hello-world

After the installation, open the terminal (Command Prompt in Windows) and run the following:

$ docker run hello-world

This command will pull the hello-world image from the Docker Hub public repository and run the application in it. If you can run it successfully, you will see this output:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
70f5ac315c5a: Pull complete
Digest: sha256:88ec0acaa3ec199d3b7eaf73588f4518c25f9d34f58ce9a0df68429c5af48e8d
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "...
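
The same check can be scripted, which is handy in setup scripts. The sketch below is an illustration rather than part of the book's code: `docker_available` and `run_hello_world` are hypothetical helper names, and the 120-second timeout is an arbitrary assumption. It looks for the `docker` executable on the PATH, runs the hello-world image, and checks the output for the greeting shown above.

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Return True if a `docker` executable is on the PATH."""
    return shutil.which("docker") is not None


def run_hello_world(timeout: int = 120) -> bool:
    """Run the hello-world image and report whether it succeeded."""
    if not docker_available():
        print("docker not found on PATH; install Docker first.")
        return False
    try:
        # --rm removes the container once it exits, so test runs do not pile up.
        result = subprocess.run(
            ["docker", "run", "--rm", "hello-world"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"could not run docker: {exc}")
        return False
    # Success means a zero exit code and the greeting in standard output.
    ok = result.returncode == 0 and "Hello from Docker!" in result.stdout
    print(result.stdout if ok else result.stderr)
    return ok


if __name__ == "__main__":
    run_hello_world()
```

A `False` result with Docker installed usually means the Docker daemon is not running or the image could not be pulled; the printed error output indicates which.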

Building your own image

Now, we will customize our own container images for running a simple data processing job and an API service.

Batch processing job

Here is some simple Python code for a batch processing job:

run.py

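import pandas as pd

# Public CSV with no header row; pandas labels the columns 0, 1, 2, ...
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
df = pd.read_csv(url, header=None)
# Add a new column holding column 5 multiplied by 2
df["newcolumn"] = df[5].apply(lambda x: x*2)
# Print the column labels, the first five rows, and the (rows, columns) shape
print(df.columns)
print(df.head())
print(df.shape)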

This Python code loads a CSV dataset from a URL into a pandas DataFrame, adds a new column by multiplying an existing column by 2, and then prints some information about the DataFrame (column names, first five rows, and the shape of the DataFrame). Type this code using your favorite code editor and save the file with the name run.py.
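
To check the transformation logic without network access or pandas, the same steps can be sketched with the standard library alone. This is an illustrative stand-in, not the book's code: `load_rows` and `add_doubled_column` are hypothetical helper names, and the two sample rows are made up for the example rather than taken from the dataset.

```python
import csv
import io

# Made-up, headerless sample rows with nine columns each (illustration only).
SAMPLE_CSV = """1,2,3,4,5,10.5,7,8,0
2,3,4,5,6,20.0,8,9,1
"""


def load_rows(text):
    """Parse headerless CSV text into lists of floats."""
    reader = csv.reader(io.StringIO(text))
    return [[float(value) for value in record] for record in reader if record]


def add_doubled_column(rows, source_index=5):
    """Return new rows with one value appended: the source column times 2."""
    return [row + [row[source_index] * 2] for row in rows]


if __name__ == "__main__":
    rows = add_doubled_column(load_rows(SAMPLE_CSV))
    print(rows[0])                    # first row, including the new column
    print((len(rows), len(rows[0])))  # (rows, columns), like df.shape
```

The appended value plays the role of `newcolumn` in run.py, and the final tuple mirrors what `df.shape` reports.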

Normally, we test our code locally (whenever possible) to be sure it is working. To do that, first, you need to install the pandas library:

pip3 install...

Summary

In this chapter, we covered the fundamentals of containers and how to build and run them using Docker. Containers provide a lightweight and portable way to package applications and their dependencies so they can run reliably across environments.

You learned about key concepts such as images, containers, Dockerfiles, and registries. We installed Docker and ran simple containers such as NGINX and Julia to get hands-on experience. You built your own containers for a batch processing job and API service, defining Dockerfiles to package dependencies.

These skills allow you to develop applications and containerize them for smooth deployment anywhere. Containers are valuable because they package an application together with its dependencies, so it behaves consistently from one environment to the next.

In the next chapter, we will look at orchestrating containers using Kubernetes to easily scale, monitor, and manage containerized applications. We will take a look at the most important Kubernetes concepts and components and learn how...


Key benefits

  • Leverage Kubernetes in a cloud environment to integrate seamlessly with a variety of tools
  • Explore best practices for optimizing the performance of big data pipelines
  • Build end-to-end data pipelines and discover real-world use cases using popular tools like Spark, Airflow, and Kafka
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

In today's data-driven world, organizations across different sectors need scalable and efficient solutions for processing large volumes of data. Kubernetes offers an open-source and cost-effective platform for deploying and managing big data tools and workloads, ensuring optimal resource utilization and minimizing operational overhead. If you want to master the art of building and deploying big data solutions using Kubernetes, then this book is for you. Written by an experienced data specialist, Big Data on Kubernetes takes you through the entire process of developing scalable and resilient data pipelines, with a focus on practical implementation. Starting with the basics, you’ll progress toward learning how to install Docker and run your first containerized applications. You’ll then explore Kubernetes architecture and understand its core components. This knowledge will pave the way for exploring a variety of essential tools for big data processing such as Apache Spark and Apache Airflow. You’ll also learn how to install and configure these tools on Kubernetes clusters. Throughout the book, you’ll gain hands-on experience building a complete big data stack on Kubernetes. By the end of this Kubernetes book, you’ll be equipped with the skills and knowledge you need to tackle real-world big data challenges with confidence.

Who is this book for?

If you’re a data engineer, BI analyst, data team leader, data architect, or tech manager with a basic understanding of big data technologies, then this big data book is for you. Familiarity with the basics of Python programming, SQL queries, and YAML is required to understand the topics discussed in this book.

What you will learn

  • Install and use Docker to run containers and build concise images
  • Gain a deep understanding of Kubernetes architecture and its components
  • Deploy and manage Kubernetes clusters on different cloud platforms
  • Implement and manage data pipelines using Apache Spark and Apache Airflow
  • Deploy and configure Apache Kafka for real-time data ingestion and processing
  • Build and orchestrate a complete big data pipeline using open-source tools
  • Deploy Generative AI applications on a Kubernetes-based architecture

Product Details

Publication date: Jul 19, 2024
Length: 296 pages
Edition: 1st
Language: English
ISBN-13: 9781835468999

Packt Subscriptions

See our plans and pricing

$19.99 billed monthly
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months
  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together


Big Data on Kubernetes: $39.99
Atlassian DevOps Toolchain Cookbook: $44.99
Modern Python Cookbook: $54.99
Total: $139.97

Table of Contents

17 Chapters
Part 1: Docker and Kubernetes
Chapter 1: Getting Started with Containers
Chapter 2: Kubernetes Architecture
Chapter 3: Getting Hands-On with Kubernetes
Part 2: Big Data Stack
Chapter 4: The Modern Data Stack
Chapter 5: Big Data Processing with Apache Spark
Chapter 6: Building Pipelines with Apache Airflow
Chapter 7: Apache Kafka for Real-Time Events and Data Ingestion
Part 3: Connecting It All Together
Chapter 8: Deploying the Big Data Stack on Kubernetes
Chapter 9: Data Consumption Layer
Chapter 10: Building a Big Data Pipeline on Kubernetes
Chapter 11: Generative AI on Kubernetes
Chapter 12: Where to Go from Here
Index
Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and install Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it, we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else

How can I make a purchase on your website?

If you want to purchase a video course, eBook, or bundle (print + eBook), follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the print book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made by credit card, debit card, or PayPal).

Where can I access support around an eBook?

  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book, go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us

What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats, such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?

  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space

What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend that you download and install the free Adobe Reader, version 9 or later.