You're reading from Building Serverless Microservices in Python A complete guide to building, testing, and deploying microservices using serverless computing on AWS

Product type Paperback

Published in Mar 2019

Publisher Packt

ISBN-13 9781789535297

Length 168 pages

Edition 1st Edition

Languages

Python

Tools

AWS

Concepts

Microservices

Author (1):

Richard Takashi Freeman

View More author details

Serverless computing in AWS

Serverless computing in AWS allows you to quickly deploy event-driven computing in the cloud. With serverless computing, there are still servers but you don't have the manage them. AWS automatically manages all the computing resources for you, as well as any trigger mechanisms. For example, when an object gets written to a bucket, that would trigger an event. If another service writes a new record to an Amazon DynamoDB table, that could trigger an event or an endpoint to be called.

The main idea of using event-driven computing is that it easily allows you to transform data as it arrives into the cloud, or we can perform data-driven auditing analysis notifications, transformations, or parse Internet of Things (IoT) device events. Serverless also means that you don't need to have an always-on running service in order to do that, you can actually trigger it based on the event.

Overview of some of the key serverless services in AWS

Some key serverless services in AWS are explained in the following list:

Amazon Simple Storage Service (S3): A distributed web-scale object store that is highly scalable, highly secure, and reliable. You only pay for the storage that you actually consume, which makes it beneficial in terms of pricing. It also supports encryption, where you can provide your own key or you can use a server-side encryption key provided by AWS.

Amazon DynamoDB: A fully-managed NoSQL store database service that is managed by AWS and allows you to focus on writing the data out to the data store. It's highly durable and available. It has been used in gaming and other high-performance activities, which require low latency. It uses SSD storage under the hood and also provides partitioning for high availability.
Amazon Simple Notification Service (SNS): A push-notification service that allows you to send notifications to other subscribers. These subscribers could be email addresses, SNS messages, or other queues. The messages would get pushed to any subscriber to the SNS service.
Amazon Simple Queue Service (SQS): A fully-managed and scalable distributed message queue that is highly available and durable. SQS queues are often subscribed to SNS topics to implement the distributed publish-subscribe pattern. You pay for what you use based on the number of requests.
AWS Lambda: The main idea is you write your business logic code and it gets triggered based on the event sources you configure. The beauty is that you only pay for when the code is actually executed, down to the 100 milliseconds. It automatically scales and is highly available. It is one of the key components to the AWS serverless ecosystem.
Amazon API Gateway: A managed API service that allows you to build, publish, and manage APIs. It performs at scale and allows you to also perform caching, traffic throttling, and caching in edge locations, which means they're localized based on where the user is located, minimizing overall latency. In addition, it integrates natively with AWS Lambda functions, allowing you to focus on the core business logic code to parse that request or data.
AWS Identity and Access Management (IAM): The central component of all security is IAM roles and policies, which are basically a mechanism that's managed by AWS for centralizing security and federating it to other services. For example, you can restrict a Lambda to only read a specific DynamoDB table, but not have the ability to write to the same DynamoDB table or deny read/write access any other tables.
Amazon CloudWatch: A central system for monitoring services. You can, for example, monitor the utilization of various resources, record custom metrics, and host application logs. It is also very useful for creating rules that trigger a notification when specific events or exceptions occur.

AWS X-Ray: A service that allows you to trace service requests and analyze latency and traces from various sources. It also generates service maps, so you can see the dependency and where the most time is spent in a request, and do root cause analysis of performance issues and errors.

Amazon Kinesis Streams: A steaming service that allows you to capture millions of events per second that you can analyze further downstream. The main idea is you would have, for example, thousands of IoT devices writing directly to Kinesis Streams, capturing that data in one pipe, and then analyzing it with different consumers. If the number of events goes up and you need more capacity, you can simply add more shards, each with a capacity of 1,000 writes per second. It's simple to add more shards as there is no downtime, and they don't interrupt the event capture.
Amazon Kinesis Firehose: A system that allows you to persist and load streaming data. It allows you to write to an endpoint that would buffer up the events in memory for up to 15 minutes, and then write it into S3. It supports massive volumes of data and also integrates with Amazon Redshift, which is a data warehouse in the cloud. It also integrates with the Elasticsearch service, which allows you to query free text, web logs, and other unstructured data.
Amazon Kinesis Analytics: Allows you to analyze data that is in Kinesis Streams using structured query language (SQL). It also has the ability to discover the data schema so that you can use SQL statements on the stream. For example, if you're capturing web analytics data, you could count the daily page view data and aggregate them up by specific pageId.
Amazon Athena: A service that allows you to directly query S3 using a schema on read. It relies on the AWS Glue Data Catalog to store the table schemas. You can create a table and then query the data straight off S3, there's no spin-up time, it's serverless, and allows you to explore massive datasets in a very flexible and cost-effective manner.

Among all these services, AWS Lambda is the most widely used serverless service in AWS. We will discuss more about that in the next section.

AWS Lambda

The key serverless component in AWS is called AWS Lambda. A Lambda is basically some business logic code that can be triggered by an event source:

A data event source could be the put or get of an object to an S3 bucket. Streaming event sources could be new records that have been to a DynamoDB table that trigger a Lambda function. Other streaming event sources include Kinesis Streams and SQS.

One example of requests to endpoints are Alexa skills, from Alexa echo devices. Another popular one is Amazon API Gateway, when you call an endpoint that would invoke a Lambda function. In addition, you can use changes in AWS CodeCommit or Amazon Cloud Watch.

Finally, you can trigger different events and messages based on SNS or different cron events. These would be regular events or they could be notification events.

The main idea is that the integration between the event source and the Lambda is managed fully by AWS, so all you need to do is write the business logic code, which is the function itself. Once you have the function running, you can run either a transformation or some business logic code to actually write to other services on the right of the diagram. These could be data stores or invoke other endpoints.

In the serverless world, you can implement sync/asyc requests, messaging or event stream processing much more easily using AWS Lambdas. This includes the microservice communication style and data-management patterns we just talked about.

Lambda has two types of event sources types, non-stream event sources and stream event sources:

Non-stream event sources: Lambdas can be invoked asynchronously or synchronously. For example, SNS/S3 are asynchronous but API Gateway is sync. For sync invocations, the client is responsible for retries, but for async it will retry many times before sending it to a Dead Letter Queue (DLQ) if configured. It's great to have this retry logic and integration built in and supported by AWS event sources, as it means less code and a simpler architecture:

Stream event sources: The Lambda is invoked with micro-batches of data. In terms of concurrency, there is one Lambda invoked in parallel per shard for Kinesis Streams or one Lambda per partition for DynamoDB Stream. Within the lambda, you just need to iterate over the Kinesis Streams, DynamoDB, or SQS data passed in as JSON records. In addition, you benefit from the AWS built-in streams integration where the Lambda will poll the stream and retrieve the data in order, and will retry upon failure until the data expires, which can be up to seven days for Kinesis Streams. It's also great to have that retry logic built in without having to write a line of code. It is much more effort if you had to build it as a fleet of EC2 or containers using the AWS Consumer or Kinesis SDK yourself:

In essence, AWS is responsible for the invocation and passing in the event data to the Lambda, you are responsible for the processing and the response of the Lambda.

Serverless computing to implement microservice patterns

Here is an overview diagram of some of the serverless and managed services available on AWS:

Leveraging AWS-managed services does mean additional vendor lock-in but helps you reduce non business differentiating support and maintenance costs. But also to deploy your applications faster as the infrastructure can be provisioned or destroyed in minutes. In some cases, when using AWS-managed services to implement microservices patterns, there is no need for much code, only configuration.

We have services for the following:

Events, messaging, and notifications: For async publish/subscribe and coordinating components
API and web: To create APIs for your serverless microservices and expose it to the web
Data and analytics: To store, share, and analyze your data
Monitoring: Making sure your microservices and stack are operating correctly
Authorization and security: To ensure that your services and data is secure, and only accessed by those authorized

At the center is AWS Lambda, the glue for connecting services, but also one of the key places for you to deploy your business logic source code.

Example use case – serverless file transformer

Here is an example use case, to give you an idea of how different managed AWS systems can fit together as a solution. The requirements are that a third-party vendor is sending us a small 10 MB file daily at random times, and we need to transform the data and write it to a NoSQL database so it can be queried in real time. If there are any issues with the third-party data, we want to send an alert within a few minutes. Your boss tells you that you they don't want to have an always-on machine just for this task, the third party has no API development experience, and there is a limited budget. The head of security also finds out about this project and adds another constraint. They don't want to give third-party access to your AWS account beyond one locked-down S3 bucket:

This can be implemented as an event-driven serverless stack. On the left, we have an S3 bucket where the third party has access to drop their file. When a new object is created, that triggers a Lambda invocation via the built-in event source mapping. The Lambda executes code to transform the data, for example, extracts key records such as user_id, date, and event type from the object, and writes them to a DynamoDB table. The Lambda sends summary custom metrics of the transformation, such as number of records transformed and written to CloudWatch metrics. In addition, if there are transformation errors, the Lambda sends an SNS notification with a summary of the transformation issues, which could generate an email to the administrator and third-party provider for them to investigate the issue.