Big Data on Kubernetes: A practical guide to building efficient and scalable data solutions

Neylson Crepalde
$39.99
Paperback Jul 2024 296 pages 1st Edition
eBook
$9.99 $31.99
Paperback
$39.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

  • Instant access to your digital eBook copy while your print order is shipped
  • Paperback book shipped to your preferred address
  • Download this book in EPUB and PDF formats
  • Access this title in our online reader with advanced features
  • DRM FREE - Read whenever, wherever and however you want
  • AI Assistant (beta) to help accelerate your learning

Big Data on Kubernetes

Getting Started with Containers

The world is rapidly generating massive amounts of data from a variety of sources – mobile devices, social media, e-commerce transactions, sensors, and more. This data explosion is often referred to as “big data.” While big data presents immense opportunities for businesses and organizations to gain valuable insights, it also brings tremendous complexity in how to store, process, analyze, and extract value from huge volumes of diverse data.

This is where Kubernetes comes in. Kubernetes is an open source container orchestration system that helps automate the deployment, scaling, and management of containerized applications. Kubernetes brings important advantages for building big data systems. It provides a standard way to deploy containerized big data applications on any infrastructure. This makes it easy to migrate applications across on-premises servers or cloud providers. It also makes it simple to scale big data applications...

Technical requirements

For this chapter, you should have Docker installed. You also need a computer with a minimum of 4 GB of RAM (8 GB is recommended), because Docker can consume a significant amount of memory.
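Before going further, it can help to confirm that the Docker command-line client is reachable from your shell. The snippet below is a minimal sketch using only the Python standard library; the function name `find_docker` is made up for illustration, and the check only looks for the executable on your PATH (it does not verify that the Docker daemon is running).

```python
import shutil


def find_docker(executable: str = "docker"):
    """Return the full path of the Docker CLI if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_docker()
    if path:
        print(f"Docker CLI found at {path}")
    else:
        print("Docker CLI not found on PATH; install Docker before continuing.")
```

If the script reports that the CLI is missing, follow the installation steps later in this chapter and run it again.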

The code for this chapter is available on GitHub. Please refer to https://github.com/PacktPublishing/Bigdata-on-Kubernetes and access the Chapter01 folder.

Container architecture

Containers are an operating system-level virtualization method that we can use to run multiple isolated processes on a single host machine. Containers allow applications to run in an isolated environment with their own dependencies, libraries, and configuration files without the overhead of a full virtual machine (VM), which makes them lighter and more efficient.

If we compare containers to traditional VMs, they differ in a few ways. VMs virtualize at the hardware level, creating a full virtual operating system. Containers, on the other hand, virtualize at the operating system level. Because of that, containers share the host system’s kernel, whereas VMs each have their own kernel. This allows containers to have much faster startup times, typically in milliseconds compared to minutes for VMs (it is worth noting that in a Linux environment, Docker can leverage the capabilities of a Linux kernel directly. While running in a Windows system, however, it...
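One way to observe the shared-kernel point is to print the kernel release on the host and then again from inside a Linux container. On a Linux host the two values should generally match, because the container reuses the host's kernel. This is a small sketch, not part of the book's code; the function name `kernel_info` is illustrative.

```python
import platform


def kernel_info() -> dict:
    """Return the OS name and kernel release visible to this process."""
    info = platform.uname()
    return {"system": info.system, "release": info.release}


if __name__ == "__main__":
    # Run this once on the host and once inside a container, then compare.
    print(kernel_info())
```

With Docker Desktop on Windows or macOS, expect the container to report the kernel of the Linux environment that Docker Desktop runs rather than the kernel of your desktop operating system.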

Installing Docker

To get started with Docker, you can install it by using the package manager for your Linux distribution or install Docker Desktop for Mac/Windows machines.

Windows

To use Docker Desktop on Windows, you must turn on the WSL 2 feature. Refer to this link for detailed instructions: https://docs.microsoft.com/en-us/windows/wsl/install-win10.

After that, you can install Docker Desktop as follows:

  1. Go to https://www.docker.com/products/docker-desktop and download the installer.
  2. When the download is ready, double-click the installer and follow the prompts.

    You should ensure that the Use WSL 2 instead of Hyper-V option is selected on the Configuration page. This is the recommended usage. (If your system does not support WSL 2, this option will not be available. You can still run Docker with Hyper-V, though.)

  3. After the installation is finished, click Close to complete the setup, and then start Docker Desktop.

If you have any doubts, refer to the official documentation...

Getting started with Docker images

The very first Docker image we can run is the hello-world image. It is often used to test whether Docker is correctly installed and running.

hello-world

After the installation, open the terminal (Command Prompt in Windows) and run the following:

$ docker run hello-world

This command will pull the hello-world image from the Docker Hub public repository and run the application in it. If you can run it successfully, you will see this output:

Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
70f5ac315c5a: Pull complete
Digest: sha256:88ec0acaa3ec199d3b7eaf73588f4518c25f9d34f58ce9a0df68429c5af48e8d
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "...
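The same check can be scripted, which is handy when you want to verify a machine automatically. The following is a hedged sketch rather than anything from the book: the helper names `hello_world_command` and `run_hello_world` are hypothetical, and the success test simply looks for the "Hello from Docker!" line shown in the output above.

```python
import shutil
import subprocess


def hello_world_command(image: str = "hello-world") -> list:
    """Build the argument list equivalent to `docker run hello-world`."""
    return ["docker", "run", image]


def run_hello_world() -> bool:
    """Run the hello-world image; return True if Docker reported success."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not installed or not on PATH
    result = subprocess.run(hello_world_command(), capture_output=True, text=True)
    # The image prints "Hello from Docker!" when the installation works.
    return result.returncode == 0 and "Hello from Docker!" in result.stdout


if __name__ == "__main__":
    print("Docker OK" if run_hello_world() else "Docker check failed")
```

Passing the command as a list avoids shell quoting issues, and capturing the output lets the script inspect the message instead of only relying on the exit code.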

Building your own image

Now, we will customize our own container images for running a simple data processing job and an API service.

Batch processing job

Here is a simple Python script for a batch processing job:

run.py

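import pandas as pd
# Load a headerless CSV dataset from a URL into a DataFrame
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
df = pd.read_csv(url, header=None)
# Add a new column holding twice the value of column 5
df["newcolumn"] = df[5].apply(lambda x: x*2)
# Print the column names, the first five rows, and the shape
print(df.columns)
print(df.head())
print(df.shape)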

This Python code loads a CSV dataset from a URL into a pandas DataFrame, adds a new column by multiplying an existing column by 2, and then prints some information about the DataFrame (column names, first five rows, and size of the DataFrame). Type this code using your favorite code editor and save the file with the name run.py.
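If you want to check the transformation logic without network access or pandas, the same idea can be sketched with the standard library alone. This is not the book's method; the function name `add_doubled_column` and the two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical headerless rows, used only to exercise the logic offline.
SAMPLE_CSV = "1,2,3,4,5,6.5,7\n10,20,30,40,50,1.25,70\n"


def add_doubled_column(csv_text: str, column: int = 5) -> list:
    """Parse headerless CSV text and append (column * 2) to every row."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        values = [float(v) for v in record]
        values.append(values[column] * 2)  # mirrors the new column in run.py
        rows.append(values)
    return rows


if __name__ == "__main__":
    rows = add_doubled_column(SAMPLE_CSV)
    print(len(rows), len(rows[0]))  # number of rows and columns
    print(rows[0])
```

The pandas version remains the better fit for real datasets; this sketch only makes the "multiply one column by 2" step easy to verify by hand.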

Normally, we test our code locally (whenever possible) to be sure it is working. To do that, first, you need to install the pandas library:

pip3 install...

Summary

In this chapter, we covered the fundamentals of containers and how to build and run them using Docker. Containers provide a lightweight and portable way to package applications and their dependencies so they can run reliably across environments.

You learned about key concepts such as images, containers, Dockerfiles, and registries. We installed Docker and ran simple containers such as NGINX and Julia to get hands-on experience. You built your own containers for a batch processing job and API service, defining Dockerfiles to package dependencies.

These skills allow you to develop applications and containerize them for smooth deployment anywhere. Containers are useful because they package an application together with its dependencies, which helps your software behave consistently across environments.

In the next chapter, we will look at orchestrating containers using Kubernetes to easily scale, monitor, and manage containerized applications. We will take a look at the most important Kubernetes concepts and components and learn how...


Key benefits

  • Leverage Kubernetes in a cloud environment to integrate seamlessly with a variety of tools
  • Explore best practices for optimizing the performance of big data pipelines
  • Build end-to-end data pipelines and discover real-world use cases using popular tools like Spark, Airflow, and Kafka
  • Purchase of the print or Kindle book includes a free PDF eBook

Description

In today's data-driven world, organizations across different sectors need scalable and efficient solutions for processing large volumes of data. Kubernetes offers an open-source and cost-effective platform for deploying and managing big data tools and workloads, ensuring optimal resource utilization and minimizing operational overhead. If you want to master the art of building and deploying big data solutions using Kubernetes, then this book is for you. Written by an experienced data specialist, Big Data on Kubernetes takes you through the entire process of developing scalable and resilient data pipelines, with a focus on practical implementation. Starting with the basics, you’ll progress toward learning how to install Docker and run your first containerized applications. You’ll then explore Kubernetes architecture and understand its core components. This knowledge will pave the way for exploring a variety of essential tools for big data processing such as Apache Spark and Apache Airflow. You’ll also learn how to install and configure these tools on Kubernetes clusters. Throughout the book, you’ll gain hands-on experience building a complete big data stack on Kubernetes. By the end of this Kubernetes book, you’ll be equipped with the skills and knowledge you need to tackle real-world big data challenges with confidence.

Who is this book for?

If you’re a data engineer, BI analyst, data team leader, data architect, or tech manager with a basic understanding of big data technologies, then this big data book is for you. Familiarity with the basics of Python programming, SQL queries, and YAML is required to understand the topics discussed in this book.

What you will learn

  • Install and use Docker to run containers and build concise images
  • Gain a deep understanding of Kubernetes architecture and its components
  • Deploy and manage Kubernetes clusters on different cloud platforms
  • Implement and manage data pipelines using Apache Spark and Apache Airflow
  • Deploy and configure Apache Kafka for real-time data ingestion and processing
  • Build and orchestrate a complete big data pipeline using open-source tools
  • Deploy Generative AI applications on a Kubernetes-based architecture
Estimated delivery fee (deliver to United States)

Economy delivery, 10 - 13 business days: Free $6.95

Premium delivery, 6 - 9 business days: $21.95 (includes tracking information)

Product Details

Publication date: Jul 19, 2024
Length: 296 pages
Edition: 1st
Language: English
ISBN-13: 9781835462140


Packt Subscriptions

See our plans and pricing

$19.99 billed monthly

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Simple pricing, no contract

$199.99 billed annually

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

$279.99 billed in 18 months

  • Unlimited access to Packt's library of 7,000+ practical books and videos
  • Constantly refreshed with 50+ new titles a month
  • Exclusive early access to books as they're written
  • Solve problems while you work with advanced search and reference features
  • Offline reading on the mobile app
  • Choose a DRM-free eBook or Video every month to keep
  • PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
  • Exclusive print discounts

Frequently bought together

  • Big Data on Kubernetes, $39.99
  • Atlassian DevOps Toolchain Cookbook, $44.99
  • Modern Python Cookbook, $54.99

Total $139.97

Table of Contents

17 Chapters
Part 1: Docker and Kubernetes
Chapter 1: Getting Started with Containers
Chapter 2: Kubernetes Architecture
Chapter 3: Getting Hands-On with Kubernetes
Part 2: Big Data Stack
Chapter 4: The Modern Data Stack
Chapter 5: Big Data Processing with Apache Spark
Chapter 6: Building Pipelines with Apache Airflow
Chapter 7: Apache Kafka for Real-Time Events and Data Ingestion
Part 3: Connecting It All Together
Chapter 8: Deploying the Big Data Stack on Kubernetes
Chapter 9: Data Consumption Layer
Chapter 10: Building a Big Data Pipeline on Kubernetes
Chapter 11: Generative AI on Kubernetes
Chapter 12: Where to Go from Here
Index
Other Books You May Enjoy

FAQs

What is the delivery time and cost of print books?

Shipping Details

USA:

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K. time start printing on the next business day, so the estimated delivery times also start from the next day. Orders received after 5 PM U.K. time (in our internal systems) on a business day, or at any time on the weekend, begin printing on the business day after next. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders. They are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. These are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duties or localized taxes may apply. These are charged by the recipient country, must be paid by the customer, and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, dimensions such as weight, and other criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay an additional import tax of 19% to the courier service to receive your package (for example, $9.50 on a declared value of $50).
  • If you live in Turkey and the declared value of your ordered items is over €22, you will have to pay an additional import tax of 18% to the courier service to receive your package (for example, €3.96 on a declared value of €22).
How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing it. Simply contact customercare@packt.com with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you, you can contact us at customercare@packt.com once you receive it and use the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (that is, when Packt Publishing agrees to replace your printed book because it arrived damaged or with a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work, or is unacceptably late, please contact the Customer Relations Team at customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered an item (eBook, Video, or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at customercare@packt.com within one hour of placing the order and we will replace the item or refund its cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (for example, during download), contact the Customer Relations Team at customercare@packt.com within 14 days of purchase and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective, or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are requesting a refund for only one book from an order of multiple items, we will refund you for that single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

If your printed book arrives damaged or with a material defect, contact our Customer Relations Team at customercare@packt.com within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy if necessary. Please note that each printed book you order from us is individually made on a print-on-demand basis by Packt's professional book-printing partner.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on laws and regulations). A localized VAT fee is charged only to our European and UK customers on the eBooks, Videos, and subscriptions that they buy. GST is charged to Indian customers for eBook and Video purchases.

What payment methods can I use?

You can pay with the following payment methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal