Search icon CANCEL
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Natural Language Processing with AWS AI Services
Natural Language Processing with AWS AI Services

Natural Language Processing with AWS AI Services: Derive strategic insights from unstructured data with Amazon Textract and Amazon Comprehend

Arrow left icon
Profile Icon M Profile Icon Premkumar Rangarajan
Arrow right icon
$29.99 $43.99
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (21 Ratings)
eBook Nov 2021 508 pages 1st Edition
eBook
$29.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m
Arrow left icon
Profile Icon M Profile Icon Premkumar Rangarajan
Arrow right icon
$29.99 $43.99
Full star icon Full star icon Full star icon Full star icon Full star icon 5 (21 Ratings)
eBook Nov 2021 508 pages 1st Edition
eBook
$29.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m
eBook
$29.99 $43.99
Paperback
$54.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning
Table of content icon View table of contents Preview book icon Preview Book

Natural Language Processing with AWS AI Services

Chapter 1: NLP in the Business Context and Introduction to AWS AI Services

Natural language processing, or NLP, is quite popular in the scientific community, but the value of using this Artificial Intelligence (AI) technique to gain business benefits is not immediately obvious to mainstream users. Our focus will be to raise awareness and educate you on the business context of NLP, provide examples of the proliferation of data in unstructured text, and show how NLP can help derive meaningful insights to inform strategic decisions within an enterprise.

In this introductory chapter, we will be establishing the basic context to familiarize you with some of the underlying concepts of AI and Machine Learning (ML), the types of challenges that NLP can help solve, common pitfalls when building NLP solutions, and how NLP works and what it's really good at doing, with examples.

In this chapter, we will cover the following:

  • Introducing NLP
  • Overcoming the challenges in building NLP solutions
  • Understanding why NLP is becoming mainstream
  • Introducing the AWS ML stack

Introducing NLP

Language is as old as civilization itself and no other communication tool is as effective as the spoken or written word. In their childhood days, the authors were enamored with The Arabian Nights, a centuries-old collection of stories from India, Persia, and Arabia. In one famous story, Ali Baba and the Forty Thieves, Ali Baba is a poor man who discovers a thieves' den containing hordes of treasure hidden in a cave that can only be opened by saying the magic words open sesame. In the authors' experience, this was the first recollection of a voice-activated application. Though purely a work of fiction, it was indeed an inspiration to explore the art of the possible.

Recently, in the last two decades, the popularity of the internet and the proliferation of smart devices has fueled significant technological advancements in digital communications. In parallel, the long-running research to develop AI made rapid strides with the advent of ML. Arthur Lee Samuel was the first to coin the term machine learning, in 1959, and helped make it mainstream in the field of computer science by creating a checkers playing program that demonstrated how computers can be taught.

The concept that machines can be taught to mimic human cognition, though, was popularized a little earlier in 1950 by Alan Turing in his paper Computing Machinery and Intelligence. This paper introduced the Turing Test, a variation of a common party game of the time. The purpose of the test was for an interpreter to ask questions and compare responses from a human participant and a computer. The trick was that the interpreter was not aware which was which, considering all three were isolated in different rooms. If the interpreter was unable to differentiate the two participants because the responses matched closely, the Turing Test had successfully validated that the computer possessed AI.

Of course, the field of AI has progressed leaps and bounds since then, largely due to the success of ML algorithms in solving real-world problems. An algorithm, at its simplest, is a programmatic function that converts inputs to outputs based on conditions. In contradiction to regular programmable algorithms, ML algorithms have learned the ability to alter their processing based on the data they encounter. There are different ML algorithms to choose from based on requirements, for example, Extreme Gradient Boosting (XGBoost), a popular algorithm for regression and classification problems, Exponential Smoothing (ETS), for statistical time series forecasting, Single Shot MultiBox Detector (SSD), for computer vision problems, and Latent Dirichlet Allocation (LDA), for topic modeling in NLP problems.

For more complex problems, ML has evolved into deep learning with the introduction of Artificial Neural Networks (ANNs), which have the ability to solve highly challenging tasks by learning from massive volumes of data. For example, AWS DeepComposer (https://aws.amazon.com/deepcomposer/), an ML service from Amazon Web Services (AWS), educates developers with music as a medium of instruction. One of the ML models that DeepComposer uses is trained with a type of neural network called the Convolutional Neural Network (CNN) to create new and unique musical compositions from a simple input melody using AutoRegressive (AR) techniques:

Figure 1.1 – Composing music with AWS DeepComposer and ML

Figure 1.1 – Composing music with AWS DeepComposer and ML

A piano roll is an image representation of music, and AR-CNN considers music generation as a sequence of these piano roll images:

Figure 1.2 – Piano roll representation of music

Figure 1.2 – Piano roll representation of music

While there is broad adoption of ML across organizations of all sizes and industries spurred by the democratization of advanced technologies, the potential to solve many types of problems, and the breadth and depth of capabilities in AWS, ML is only a subset of what is possible today with AI. According to one report (https://www.gartner.com/en/newsroom/press-releases/2019-01-21-gartner-survey-shows-37-percent-of-organizations-have, accessed on March 23, 2021), AI adoption grew by 270% in the period 2015 to 2019. And it is continuing to grow at a rapid pace. AI is no longer a peripheral technology only available to those enterprises that have the economic resources to afford high-performance computers. Today, AI is a mainstream option for organizations looking to add cognitive intelligence to their applications to accelerate business value. For example, ExxonMobil in partnership with Amazon created an innovative and efficient way for customers to pay at gas stations. The Alexa pay for gas skill uses the car's Alexa-enabled device or your smartphone's Alexa app to communicate with the gas pump to manage the payment. The authors paid a visit to a local ExxonMobil gas station to try it out, and it was an awesome experience. For more details, please refer to https://www.exxon.com/en/amazon-alexa-pay-for-gas.

AI addresses a broad spectrum of tasks similar to human intelligence, both sensory and cognitive. Typically, these are grouped into categories, for example, computer vision (mimics human vision), NLP (mimics human speech, writing, and auditory processes), conversational interfaces (such as chatbots, mimics dialogue-based interactions), and personalization (mimics human intuition). For example, C-SPAN, a broadcaster that reports on proceedings at the US Senate and the House of Representatives, uses Amazon Rekognition (a computer vision-based image and video analysis service) to tag who is speaking/on camera at each time. With Amazon Rekognition, C-SPAN was able to index twice as much content compared to what they were doing previously. In addition, AWS offers AI services for intelligent search, forecasting, fraud detection, anomaly detection, predictive maintenance, and much more, which is why AWS was named the leader in the first Gartner Magic Quadrant for Cloud AI.

While language is inherently structured and well defined, the usage or interpretation of language is subjective, and may inadvertently cause an unintended influence that you need to be cognizant of when building natural language solutions. Consider, for example, the Telephone Game, which shows how conversations are involuntarily embellished, resulting in an entirely different version compared to how it began. Each participant repeats exactly what they think they heard, but not what they actually heard. It is fun when played as a party game but may have more serious repercussions in real life. Computers, too, will repeat what they heard, based on how their underlying ML model interprets language.

To understand how small incremental changes can completely change the meaning, let's look at another popular game: Word Ladder (https://en.wikipedia.org/wiki/Word_ladder). The objective is to convert one word into a different word, often one with the opposite meaning, in as few steps as possible with only one letter in the word changing in one step.

An example is illustrated in the following table:

Figure 1.3 – The Word Ladder game

Figure 1.3 – The Word Ladder game

Adapting AI to work with natural language resulted in a group of capabilities that primarily deal with computational emulation of cognitive and sensory processes associated with human speech and text. There are two main categories that applications can be grouped into:

  • Natural Language Understanding (NLU), for voice-based applications such as Amazon Alexa, and speech-to-text/text-to-speech conversions
  • NLP, for the interpretation of context-based insights from text

With NLU, applications that hitherto needed multiple and sometimes cumbersome interfaces, such as a screen, keyboard, and a mouse to enable computer-to-human interactions, can work as efficiently with just voice.

In Stanley Kubrick's 1968 movie 2001: A Space Odyssey, (spoiler alert!!) an artificially intelligent computer known as the HAL 9000 uses vision and voice to interact with the humans on board, and in the course of the movie, develops a personality, does not accept when it is in error, and attempts to kill the humans when it discovers their plot to shut it down. Fast forward to now, 20 years after the future depicted in the movie, and we have made significant progress in language understanding and processing, but not to the extreme extent of the dramatization of the plot elements used in the movie, thankfully.

Now that we have a good understanding of the context in which NLP has developed and how it can be used, let's try examining some of the common challenges you might face while developing NLP solutions.

Overcoming the challenges in building NLP solutions

We read earlier that the main difference between the algorithms used for regular programming and those used for ML is the ability of ML algorithms to modify their processing based on the input data fed to them. In the NLP context, as in other areas of ML, these differences add significant value and accelerate enterprise business outcomes. Consider, for example, a book publishing organization that needs to create an intelligent search capability displaying book recommendations to users based on topics of interest they enter.

In a traditional world, you would need multiple teams to go through the entire book collection, read books individually, identify keywords, phrases, topics, and other relevant information, create an index to associate book titles, authors, and genres to these keywords, and link this with the search capability. This is a massive effort that takes months or years to set up based on the size of the collection, the number of people, and their skill levels, and the accuracy of the index is prone to human error. As books are updated to newer editions, and new books are added or removed, this effort would have to be repeated incrementally. This is also a significant cost and time investment that may deter many unless that time and those resources have already been budgeted for.

To bring in a semblance of automation in our previous example, we need the ability to digitize text from documents. However, this is not the only requirement, as we are interested in deriving context-based insights from the books to power a recommendations index for a reader. And if we are talking about, for example, a publishing house such as Packt, with 7,500+ books in its collection, we need a solution that not only scales to process large numbers of pages, but also understands relationships in text, and provides interpretations based on semantics, grammar, word tokenization, and language to create smart indexes. We will cover a detailed walkthrough of this solution, along with code samples and demo videos, in Chapter 5, Creating NLP Search.

Today's enterprises are grappling with leveraging meaningful insights from their data primarily due to the pace at which it is growing. Until a decade or so, most organizations used relational databases for all their data management needs, and some still do even today. This was fine because the data volume need was in single-digit terabytes or less. In the last few years, the technology landscape has witnessed a significant upheaval with smartphones becoming ubiquitous, the large-scale proliferation of connected devices (in the billions), the ability to dynamically scale infrastructure in size and into new geographies, and storage and compute costs becoming cheaper due to the democratization offered by the cloud. All of this means applications get used more often, have much larger user bases, more processing power, and capabilities, can accelerate their pace of innovation with faster go-to-market cycles, and as a result, have a need to store and manage petabytes of data. This, coupled with application users demanding faster response times and higher throughput, has put a strain on the performance of relational databases, fueling a move toward purpose-built databases such as Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond latency at any scale.

While this move signals a positive trend, what is more interesting is how enterprises utilize this data to gain strategic insights. After all, data is only as useful as the information we can glean from it. We see many organizations, while accepting the benefits of purpose-built tools, implementing these changes in silos. So, there are varying levels of maturity in properly harnessing the advantages of data. Some departments use an S3 data lake (https://aws.amazon.com/products/storage/data-lake-storage/) to source data from disparate sources and run ML to derive context-based insights, others are consolidating their data in purpose-built databases, while the rest are still using relational databases for all their needs.

You can see a basic explanation of the main components of a data lake in the following Figure 1.5, An example of an Amazon S3 data lake:

Figure 1.4 – An example of an Amazon S3 data lake

Figure 1.4 – An example of an Amazon S3 data lake

Let's see how NLP can continue to add business value in this situation by referring back to our book publishing example. Suppose we successfully built our smart indexing solution, and now we need to update it with book reviews received via Twitter feeds. The searchable index should provide book recommendations based on review sentiment (for example, don't recommend a book if reviews are negative > 50% in the last 3 months). Traditionally, business insights are generated by running a suite of reports on behemoth data warehouses that collect, mine, and organize data into marts and dimensions. A tweet may not even be under consideration as a data source. These days, things have changed and mining social media data is an important aspect of generating insights. Setting up business rules to examine every tweet is a time-consuming and compute-intensive task. Furthermore, since a tweet is unstructured text, a slight change in semantics may impact the effectiveness of the solution.

Now, if you consider model training, the infrastructure required to build accurate NLP models typically uses the deep learning architecture called Transformers (please see https://www.packtpub.com/product/transformers-for-natural-language-processing/9781800565791) that use sequence-to-sequence processing without needing to process the tokens in order, resulting in a higher degree of parallelization. Transformer model families use billions of parameters with the training architecture using clusters of instances for distributed learning, which adds to time and costs.

AWS offers AI services that allow you, with just a few lines of code, to add NLP to your applications for the sentiment analysis of unstructured text at an almost limitless scale and immediately take advantage of the immense potential waiting to be discovered in unstructured text. We will cover AWS AI services in more detail from Chapter 2, Introducing Amazon Textract, onward.

In this section, we reviewed some challenges organizations encounter when building NLP solutions, such as complexities in digitizing paper-based text, understanding patterns from structured and unstructured data, and how resource-intensive these solutions can be. Let's now understand why NLP is an important mainstream technology for enterprises today.

Understanding why NLP is becoming mainstream

According to this report (https://www.marketsandmarkets.com/Market-Reports/natural-language-processing-nlp-825.html, accessed on March 23, 2021), the global NLP market is expected to grow to USD 35.1 billion by 2026, at a Compound Annual Growth Rate (CAGR) of 20.3% during the forecast period. This is not surprising considering the impact ML is making across every industry (such as finance, retail, manufacturing, energy, utilities, real estate, healthcare, and so on) in organizations of every size, primarily driven by the advent of cloud computing and the economies of scale available.

This article about Emergence Cycle (https://blogs.gartner.com/anthony_bradley/2020/10/07/announcing-gartners-new-emergence-cycle-research-for-ai/), a research into emerging technologies in NLP (based on patents submitted, and looking at technology still in labs or recently released), shows the most mature usage of NLP is multimedia content analysis. This trend is true based on our experience, and content analysis to gain strategic insights is a common NLP requirement based on our discussions with a number of organizations across industries:

Figure 1.5 – Gartner's NLP Emergence Cycle 2020

Figure 1.5 – Gartner's NLP Emergence Cycle 2020

For example, in 2020, when the world was struggling with the effects of the pandemic, a number of organizations adopted AI and specifically NLP to power predictions on the virus spread patterns, assimilate knowledge on virus behavior and vaccine research, and monitor the effectiveness of safety measures, to name a few. In April 2020, AWS launched an NLP-powered search site called https://cord19.aws/ using an AWS AI service called Amazon Kendra (https://aws.amazon.com/kendra/). The site provides an easy interface to search the COVID-19 Open Research Dataset using natural language questions. As the dataset is constantly updated based on the latest research on COVID-19, CORD-19 Search, due to its support for NLP, makes it easy to navigate this ever-expanding collection of research documents and find precise answers to questions. The search results provide not only specific text that contains the answer to the question but also the original body of text in which these answers are located:

Figure 1.6 – CORD-19 search results

Figure 1.6 – CORD-19 search results

Fred Hutchinson Cancer Research Center is a research institute focused on curing cancer by 2025. Matthew Trunnell, Chief Information Officer of Fred Hutchinson Cancer Research Center, has said the following:

"The process of developing clinical trials and connecting them with the right patients requires research teams to sift through and label mountains of unstructured clinical record data. Amazon Comprehend Medical will reduce this time burden from hours to seconds. This is a vital step toward getting researchers rapid access to the information they need when they need it so they can find actionable insights to advance lifesaving therapies for patients."

For more details and usage examples of Amazon Comprehend and Amazon Comprehend Medical, please refer to Chapter 3, Introducing Amazon Comprehend.

So, how can AI and NLP help us cure cancer or prepare for a pandemic? It's about recognizing patterns where none seem to exist. Unstructured text, such as documents, social media posts, and email messages, is similar to the treasure waiting in Ali Baba's cave. To understand why, let's briefly look at how NLP works.

NLP models train by learning what are called word embeddings, which are vector representations of words in large collections of documents. These embeddings capture semantic relationships and word distributions in documents, thereby helping to map the context of a word based on its relationship to other words in the document. The two common training architectures for learning word embeddings are Skip-gram and Continuous Bag of Words (CBOW). In Skip-gram, the embeddings of the input word are used to derive the distribution of the related words to predict the context, and in CBOW, the embeddings of the related words are used to predict the word in the middle. Both are neural network-based architectures and work well for context-based analytics use cases.

Now that we understand the basics of NLP (analyzing patterns in text by converting words to their vector representations), when we look at training models using text data from disparate data sources, unique insights are often derived due to patterns that emerge that previously appeared hidden when looked at within a narrower context, because we are using numbers to find relationships in text. For example, The Rubber Episode in the Amazon Prime TV show This Giant Beast That Is The Global Economy shows how a fungal disease has the potential to devastate the global economy, even though at first it might appear there is no link between the two. According to the US National Library of Medicine, natural rubber accounts for 40% of the world's consumption, and the South American Leaf Blight (SALB) fungal disease has the potential to spread worldwide and severely inhibit rubber production. Airplanes can't land without rubber, and its uses are so myriad that it would have unprecedented implications on the economy. This an example of a pattern that ML and NLP models are so good at finding specific items of interest across vast text corpora.

Before AWS and cloud computing revolutionized access to advanced technologies, setting up NLP models for text analytics was challenging to say the least. The most common reasons were as follows:

  • Lack of skills: Expertise in identifying data, feature engineering, building models, training, and tuning are all tasks that require a unique combination of skills, including software engineering, mathematics, statistics, and data engineering, that only a few practitioners have.
  • Initial infrastructure setup cost: ML training is an iterative process, often requiring a trial-and-error approach to tune the models to get the desired accuracy. Further training and inference may require GPU acceleration based on the volume of data and the number of requests, requiring a high initial investment.
  • Scalability with the current on-premises environment: Running ML training and inference from on-premises servers constrains the elasticity required to scale compute and storage based on model size, data volumes, and inference throughput needed. For example, training large-scale transformer models may require massively parallel clusters, and capacity planning for such scenarios is challenging.
  • Availability of tools to help orchestrate the various moving parts of NLP training: As mentioned before, the ML workflow comprises many tasks, such as data discovery, feature engineering, algorithm selection, model building, which includes training and fine-tuning the models several times, and finally deploying those models into production. Furthermore, getting an accurate model is a highly iterative process. Each of these tasks requires purpose-built tools and expertise to achieve the level of efficiency needed for good models, which is not easy.

Not anymore. The AWS AI services for natural language capabilities enable adding speech and text intelligence to applications using API calls rather than needing to develop and train models. NLU services provide the ability to convert speech to text with Amazon Transcribe (https://aws.amazon.com/transcribe/) or text to speech with Amazon Polly (https://aws.amazon.com/polly/). For NLP requirements, Amazon Textract (https://aws.amazon.com/textract/) enables applications to read and process handwritten and printed text from images and PDF documents, and with Amazon Comprehend (https://aws.amazon.com/comprehend/), applications can quickly analyze text and find insights and relationships with no prior ML training. For example, Assent, a supply chain data management company, used Amazon Textract to read forms, tables, and free-form text, and Amazon Comprehend to derive business-specific entities and values from the text. In this book, we will be walking you through how to use these services for some popular workflows. For more details, please refer to Chapter 4, Automating Document Processing Workflows.

In this section, we saw some examples of NLP's significance in solving real-world challenges, and what exactly it means. We understood that finding patterns in data can bring new meaning to light, and NLP models are very good at deriving these patterns. We then reviewed some technology challenges in NLP implementations and saw a brief overview of the AWS AI services. In the next section, we will introduce the AWS ML stack, and provide a brief overview of each of the layers.

Introducing the AWS ML stack

The AWS ML services and features are organized into three layers of the stack, keeping in mind that some developers and data scientists are expert ML practitioners who are comfortable working with ML frameworks, algorithms, and infrastructure to build, train, and deploy models.

For these experts, the bottom layer of the AWS ML stack offers powerful CPU and GPU compute instances (the https://aws.amazon.com/ec2/instance-types/p4/ instances offer the highest performance for ML training in the cloud today), support for major ML frameworks including TensorFlow, PyTorch, and MXNet, which customers can use to build models with Amazon SageMaker as a managed experience, or using deep learning AMIs and containers on Amazon EC2 instances.

You can see the three layers of the AWS ML stack in the next figure. For more details, please refer to https://aws.amazon.com/machine-learning/infrastructure/:

To make ML more accessible and expansive, at the middle layer of the stack, Amazon SageMaker is a fully managed ML platform that removes the undifferentiated heavy lifting at each step of the ML process. Launched in 2018, SageMaker is one of the fastest-growing services in AWS history and is built on Amazon's two decades of experience in building real-world ML applications. With SageMaker Studio, developers and data scientists have the first fully integrated development environment designed specifically for ML. To learn how to build ML models using Amazon SageMaker, refer to Julien Simon's book, Learn Amazon SageMaker, also published by Packt (https://www.packtpub.com/product/learn-amazon-sagemaker/9781800208919):

Figure 1.7 – A tabular list of Amazon SageMaker features for each step of the ML workflow

Figure 1.7 – A tabular list of Amazon SageMaker features for each step of the ML workflow

For customers who are not interested in dealing with models and training, at the top layer of the stack, the AWS AI services provide pre-trained models with easy integration by means of API endpoints for common ML use cases including speech, text, vision, recommendations, and anomaly detection:

Figure 1.8 – AWS AI services

Figure 1.8 – AWS AI services

Alright, it's time that we started getting technical. Now that we understand how cloud computing played a major role in bringing ML and AI to the mainstream and how adding NLP to your application can accelerate business outcomes, let's deep dive into the NLP services Amazon Textract for document analysis and Amazon Comprehend for advanced text analytics.

Ready? Let's go!!

Summary

In this chapter, we introduced NLP by tracing the origins of AI, how it evolved over the last few decades, and how the application of AI became mainstream with the significant advances made with ML algorithms. We reviewed some examples of these algorithms, along with an example of how they can be used. We then pivoted to AI trends and saw how AI adoption grew exponentially over the last few years and has become a key technology in accelerating enterprise business value.

We read a cool example of how ExxonMobil uses Alexa at their gas stations and delved into how AI was created to mimic human cognition, and the broad categories of their applicability, such as text, speech, and vision. We saw how AI in natural language has two main areas of usage NLU for voice-based uses and NLP for deriving insights from text.

In analyzing how enterprises are building NLP models today, we reviewed some of the common challenges and how to mitigate them, such as digitizing paper-based text, collecting data from disparate sources, and understanding patterns in data, and how resource-intensive these solutions can be.

We then reviewed NLP industry trends and market segmentation and saw with an example how important NLP was and still continues to be during the pandemic. We dove deep into the philosophy of NLP and realized it was all about converting text to numerical representations and understanding the underlying patterns to decipher new meanings. We looked at an example of this pattern with how SALB could impact the global economy.

Finally, we reviewed the technology implications in setting up NLP training and the associated challenges. We reviewed the three layers of the AWS ML stack and introduced AWS AI services that provided pre-built models and ready-made intelligence.

In the next chapter, we will introduce Amazon Textract, a fully managed ML service that can read both printed and handwritten text from images and PDFs without having to train or build models and can be used without the need for ML skills. We will cover the features of Amazon Textract, what its functions are, what business challenges it was created to solve, what types of user requirements it can be applied to, and how easy it is to integrate Amazon Textract with other AWS services such as AWS Lambda for building business applications.

Further reading

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • Get to grips with AWS AI services for NLP and find out how to use them to gain strategic insights
  • Run Python code to use Amazon Textract and Amazon Comprehend to accelerate business outcomes
  • Understand how you can integrate human-in-the-loop for custom NLP use cases with Amazon A2I

Description

Natural language processing (NLP) uses machine learning to extract information from unstructured data. This book will help you to move quickly from business questions to high-performance models in production. To start with, you'll understand the importance of NLP in today’s business applications and learn the features of Amazon Comprehend and Amazon Textract to build NLP models using Python and Jupyter Notebooks. The book then shows you how to integrate AI in applications for accelerating business outcomes with just a few lines of code. Throughout the book, you'll cover use cases such as smart text search, setting up compliance and controls when processing confidential documents, real-time text analytics, and much more to understand various NLP scenarios. You'll deploy and monitor scalable NLP models in production for real-time and batch requirements. As you advance, you'll explore strategies for including humans in the loop for different purposes in a document processing workflow. Moreover, you'll learn best practices for auto-scaling your NLP inference for enterprise traffic. Whether you're new to ML or an experienced practitioner, by the end of this NLP book, you'll have the confidence to use AWS AI services to build powerful NLP applications.

Who is this book for?

If you're an NLP developer or data scientist looking to get started with AWS AI services to implement various NLP scenarios quickly, this book is for you. It will show you how easy it is to integrate AI in applications with just a few lines of code. A basic understanding of machine learning (ML) concepts is necessary to understand the concepts covered. Experience with Jupyter notebooks and Python will be helpful.

What you will learn

  • Automate various NLP workflows on AWS to accelerate business outcomes
  • Use Amazon Textract for text, tables, and handwriting recognition from images and PDF files
  • Gain insights from unstructured text in the form of sentiment analysis, topic modeling, and more using Amazon Comprehend
  • Set up end-to-end document processing pipelines to understand the role of humans in the loop
  • Develop NLP-based intelligent search solutions with just a few lines of code
  • Create both real-time and batch document processing pipelines using Python

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Nov 26, 2021
Length: 508 pages
Edition : 1st
Language : English
ISBN-13 : 9781801815482
Category :
Languages :
Tools :

What do you get with eBook?

Product feature icon Instant access to your Digital eBook purchase
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
Product feature icon AI Assistant (beta) to help accelerate your learning

Product Details

Publication date : Nov 26, 2021
Length: 508 pages
Edition : 1st
Language : English
ISBN-13 : 9781801815482
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 158.97
Natural Language Processing with AWS AI Services
$54.99
Learn Amazon SageMaker
$48.99
Machine Learning with Amazon SageMaker Cookbook
$54.99
Total $ 158.97 Stars icon

Table of Contents

22 Chapters
Section 1:Introduction to AWS AI NLP Services Chevron down icon Chevron up icon
Chapter 1: NLP in the Business Context and Introduction to AWS AI Services Chevron down icon Chevron up icon
Chapter 2: Introducing Amazon Textract Chevron down icon Chevron up icon
Chapter 3: Introducing Amazon Comprehend Chevron down icon Chevron up icon
Section 2: Using NLP to Accelerate Business Outcomes Chevron down icon Chevron up icon
Chapter 4: Automating Document Processing Workflows Chevron down icon Chevron up icon
Chapter 5: Creating NLP Search Chevron down icon Chevron up icon
Chapter 6: Using NLP to Improve Customer Service Efficiency Chevron down icon Chevron up icon
Chapter 7: Understanding the Voice of Your Customer Analytics Chevron down icon Chevron up icon
Chapter 8: Leveraging NLP to Monetize Your Media Content Chevron down icon Chevron up icon
Chapter 9: Extracting Metadata from Financial Documents Chevron down icon Chevron up icon
Chapter 10: Reducing Localization Costs with Machine Translation Chevron down icon Chevron up icon
Chapter 11: Using Chatbots for Querying Documents Chevron down icon Chevron up icon
Chapter 12: AI and NLP in Healthcare Chevron down icon Chevron up icon
Section 3: Improving NLP Models in Production Chevron down icon Chevron up icon
Chapter 13: Improving the Accuracy of Document Processing Workflows Chevron down icon Chevron up icon
Chapter 14: Auditing Named Entity Recognition Workflows Chevron down icon Chevron up icon
Chapter 15: Classifying Documents and Setting up Human in the Loop for Active Learning Chevron down icon Chevron up icon
Chapter 16: Improving the Accuracy of PDF Batch Processing Chevron down icon Chevron up icon
Chapter 17: Visualizing Insights from Handwritten Content Chevron down icon Chevron up icon
Chapter 18: Building Secure, Reliable, and Efficient NLP Solutions Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Full star icon Full star icon 5
(21 Ratings)
5 star 95.2%
4 star 4.8%
3 star 0%
2 star 0%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




Hitesh Hinduja Sep 29, 2022
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is one of the books where you would want to read more and more. The author has explained concepts in a simplified way with significant amount of practical demos. It also gives you a strong business sense before starting any chapter and then deep dives into how NLP techniques can be used in AWS to solve the problem. Must read book for all
Amazon Verified review Amazon
JANGWON KIM Dec 05, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
What I like the most about this book is three: easy read to follow, covering how to build end to end solutions, and good explanation about AWS AI services.I'm working on healthcare AI domain, which is a part of what this book help for.
Amazon Verified review Amazon
Wrick T Dec 02, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
A very practical approach to natural language processing and solving business problems.The authors took a very practical, engaging and intuitive approach with examples, tips and well-structured content. Examples clearly outline solutions to practical business problem using AWS services, which is very helpful for the readers. Highly recommended read if you are starting your journey implementing natural language processing in the AWS Cloud.
Amazon Verified review Amazon
Amit Lodh Dec 29, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This book is targeted for NLP practitioners looking to use AWS AI services to create solutions for business outcomes. The authors, experienced practitioners themselves, do an outstanding job in taking the reader through the details of AWS AI services for building NLP solutions. The readers will get a lot of value from the details from the AWS console, visual examples and the varied perspectives of building NLP solutions. I especially appreciated the examples of NLP addressing different business needs such as enterprise search solutions, improving customer service efficiency, automating document processing, monetizing media content and data extraction for financial services and healthcare. Authors also cover the production aspects of NLP solutions, such as security, reliability and efficiency. Utilizing the untapped immense value from the unstructured data is going to be critical for realizing value of digital transformation in the area of business model transformation. NLP is going to be key for that. If you are a technical practitioner, you will want to read this book to add value to the business outcome. If you are a business or technical leader, leading the charge of digital transformation in your organization, you should read the book to understand the value NLP and extraction of information from unstructured document in accelerating your business outcome.
Amazon Verified review Amazon
Viral Shah Dec 21, 2021
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Anyone who is new to AI/ML space, I would highly recommend to read this book. Both the authors have done fantastic job to explain the concepts in a lucid manner. Book is full of relevant examples which makes every chapter interesting. This book really helps to get up to speed on this newer concepts, and you won't regret reading this book !
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

How do I buy and download an eBook? Chevron down icon Chevron up icon

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our needs to protect the rights of us as Publishers and of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website? Chevron down icon Chevron up icon

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and the password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Cart, or PayPal)
Where can I access support around an eBook? Chevron down icon Chevron up icon
  • If you experience a problem with using or installing Adobe Reader, the contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support? Chevron down icon Chevron up icon

Our eBooks are currently available in a variety of formats such as PDF and ePubs. In the future, this may well change with trends and development in technology, but please note that our PDFs are not Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks? Chevron down icon Chevron up icon
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower price than print
  • They save resources and space
What is an eBook? Chevron down icon Chevron up icon

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend you saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.