Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Scalable Data Streaming with Amazon Kinesis

You're reading from   Scalable Data Streaming with Amazon Kinesis Design and secure highly available, cost-effective data streaming applications with Amazon Kinesis

Arrow left icon
Product type Paperback
Published in Mar 2021
Publisher Packt
ISBN-13 9781800565401
Length 314 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (4):
Arrow left icon
Rajeev Chakrabarti Rajeev Chakrabarti
Author Profile Icon Rajeev Chakrabarti
Rajeev Chakrabarti
Tarik Makota Tarik Makota
Author Profile Icon Tarik Makota
Tarik Makota
Brian Maguire Brian Maguire
Author Profile Icon Brian Maguire
Brian Maguire
Danny Gagne Danny Gagne
Author Profile Icon Danny Gagne
Danny Gagne
Arrow right icon
View More author details
Toc

Table of Contents (13) Chapters Close

Preface 1. Section 1: Introduction to Data Streaming and Amazon Kinesis
2. Chapter 1: What Are Data Streams? FREE CHAPTER 3. Chapter 2: Messaging and Data Streaming in AWS 4. Chapter 3: The SmartCity Bike-Sharing Service 5. Section 2: Deep Dive into Kinesis
6. Chapter 4: Kinesis Data Streams 7. Chapter 5: Kinesis Firehose 8. Chapter 6: Kinesis Data Analytics 9. Chapter 7: Amazon Kinesis Video Streams 10. Section 3: Integrations
11. Chapter 8: Kinesis Integrations 12. Other Books You May Enjoy

Introducing data streams

Data streams are a way of storing a sequence of messages. They enable us to design systems where we think about state as a series of events instead of only entities and values, or rows and columns in a database. This shift in mindset and technology enables real-time analytics to extract the value from data by acting on it before it is stale. They also enable organizations to design and develop resilient software based on microservice architectures by helping them to decouple systems. We will begin with an overview of streaming data sources, why real-time data analysis is valuable, and how they can be used architecturally to decouple systems. We will then review the core challenges associated with distributed systems, and conclude with an overview of key messaging concepts and some high-level examples. Messages can contain a wide variety of information and come from different sources, so let's look at the primary sources and data formats.

Sources of data

The proliferation of data steadily increases from sources such as social media, IoT devices, web clickstreams, application logs, and video cameras. This data poses challenges to most systems, since it is typically high-velocity, intermittent, and bursty, making it difficult to adequately provision and design downstream systems. Payloads are generally small, except when containing audio or video data, and come in a variety of formats.

In this book, we will be focusing on three data formats. These formats include the following:

  • JavaScript Object Notation (JSON)
  • Log files
  • Time-encoded binary files such as video

JSON streams

JSON has become the dominant format for message serialization over the past 10 years. It is a lightweight data interchange format that is easy for humans to read and write and is based on the JavaScript object syntax. It has two data structures – hash tables and lists. A hash table consists of key-value pairs, {"key":"value"}, where the keys must be unique. A list is a set of values in a specific order, ["value 1", "value 2"]. The following code sample shows a sample IoT JSON message:

{
    "deviceid" : "device001",
    "eventTime": -192778200,
    "temp" : 68.4,
    "humidity" : 77.3,
    "coords" : {
        "latitude" : 32.779039,
        "longitude" : -96.808660
    }
}

Log file streams

Log files come in a variety of formats. Common ones include Apache Commons Logging, Apache Combined Log, Apache Error Log, and RFC3164 Syslog. They are plain text, and usually each line, delineated by a newline ('\n') character, is a separate log entry. In the following sample log, we see an HTTP GET request where the IP address is 10.13.37.01, the datetime of the request, the HTTP verb, the URL fragment, the HTTP version, the response code, and the size of the result.

The sample log line in Apache Commons Logging format is as follows:

10.13.37.01 - - [03/Sep/2017:12:00:01 +0830] "GET /mailman/listinfo/test HTTP/1.1" 200 2457

Time-encoded binary streams

Time-encoded binary streams consist of a time series of records where each record is related to the adjacent records (prior and subsequent records). These can be used for a wide variety of sensor data, from audio streams and RADAR signals to video streams. Throughout this book, the primary focus will be video streams and their applications.

Figure 1.1 – Time-encoded video data

Figure 1.1 – Time-encoded video data

As shown in Figure 1.1, video streams are composed of fragments, where each fragment is a self-contained sequence of media frames. There are no dependencies between fragments. We will discuss video streams in more detail in Chapter 7, Kinesis Video Streams. Now that we've covered the types of data that we'll be processing, let's take a step back to understand the value of real-time data in analytics.

You have been reading a chapter from
Scalable Data Streaming with Amazon Kinesis
Published in: Mar 2021
Publisher: Packt
ISBN-13: 9781800565401
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image