Natural Language Processing with AWS AI Services

You're reading from Natural Language Processing with AWS AI Services: Derive strategic insights from unstructured data with Amazon Textract and Amazon Comprehend

Product type Paperback
Published in Nov 2021
Publisher Packt
ISBN-13 9781801812535
Length 508 pages
Edition 1st Edition
Authors (2):
Mona M
Premkumar Rangarajan

Overcoming the challenges in building NLP solutions

We read earlier that the main difference between the algorithms used for regular programming and those used for ML is the ability of ML algorithms to modify their processing based on the input data fed to them. In the NLP context, as in other areas of ML, these differences add significant value and accelerate enterprise business outcomes. Consider, for example, a book publishing organization that needs to create an intelligent search capability displaying book recommendations to users based on topics of interest they enter.

Traditionally, you would need multiple teams to go through the entire book collection, read each book individually, identify keywords, phrases, topics, and other relevant information, create an index that associates book titles, authors, and genres with these keywords, and link this index to the search capability. This is a massive effort that takes months or years to set up, depending on the size of the collection, the number of people involved, and their skill levels, and the accuracy of the index is prone to human error. As books are updated to newer editions, and as books are added or removed, this effort would have to be repeated incrementally. This is also a significant cost and time investment that may deter many organizations unless the time and resources have already been budgeted for.

To bring in a semblance of automation in our previous example, we need the ability to digitize text from documents. However, this is not the only requirement, as we are interested in deriving context-based insights from the books to power a recommendations index for a reader. And if we are talking about, for example, a publishing house such as Packt, with 7,500+ books in its collection, we need a solution that not only scales to process large numbers of pages, but also understands relationships in text, and provides interpretations based on semantics, grammar, word tokenization, and language to create smart indexes. We will cover a detailed walkthrough of this solution, along with code samples and demo videos, in Chapter 5, Creating NLP Search.

Today's enterprises are struggling to derive meaningful insights from their data, primarily because of the pace at which it is growing. Until a decade or so ago, most organizations used relational databases for all their data management needs, and some still do today. This was fine because data volumes were in the single-digit terabytes or less. In the last few years, the technology landscape has witnessed a significant upheaval, with smartphones becoming ubiquitous, the large-scale proliferation of connected devices (in the billions), the ability to dynamically scale infrastructure in size and into new geographies, and storage and compute becoming cheaper due to the democratization offered by the cloud. All of this means applications get used more often, have much larger user bases and more processing power and capabilities, can accelerate their pace of innovation with faster go-to-market cycles, and, as a result, need to store and manage petabytes of data. This, coupled with application users demanding faster response times and higher throughput, has put a strain on the performance of relational databases, fueling a move toward purpose-built databases such as Amazon DynamoDB, a key-value and document database that delivers single-digit millisecond latency at any scale.

While this move signals a positive trend, what is more interesting is how enterprises utilize this data to gain strategic insights. After all, data is only as useful as the information we can glean from it. We see many organizations, while accepting the benefits of purpose-built tools, implementing these changes in silos. So, there are varying levels of maturity in properly harnessing the advantages of data. Some departments use an S3 data lake (https://aws.amazon.com/products/storage/data-lake-storage/) to source data from disparate sources and run ML to derive context-based insights, others are consolidating their data in purpose-built databases, while the rest are still using relational databases for all their needs.

You can see a basic explanation of the main components of a data lake in Figure 1.4, An example of an Amazon S3 data lake:

Figure 1.4 – An example of an Amazon S3 data lake


Let's see how NLP can continue to add business value in this situation by referring back to our book publishing example. Suppose we successfully built our smart indexing solution, and now we need to update it with book reviews received via Twitter feeds. The searchable index should provide book recommendations based on review sentiment (for example, don't recommend a book if more than 50% of its reviews in the last 3 months are negative). Traditionally, business insights are generated by running a suite of reports on behemoth data warehouses that collect, mine, and organize data into marts and dimensions. In that setting, a tweet may not even be considered as a data source. These days, things have changed, and mining social media data is an important aspect of generating insights. However, setting up business rules to examine every tweet is a time-consuming and compute-intensive task. Furthermore, since a tweet is unstructured text, a slight change in semantics may impact the effectiveness of the solution.
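The example rule above (skip a book when more than half of its recent reviews are negative) can be sketched in a few lines once each review already carries a sentiment label. This is a minimal illustration, not the book's implementation: the `Review` type, the `should_recommend` function, the 90-day approximation of "3 months", and the choice to keep recommending a book that has no recent reviews are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class Review:
    """Hypothetical record for one review that has already been labeled."""
    posted_at: datetime
    sentiment: str  # e.g. "POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"


def should_recommend(
    reviews: Iterable[Review],
    now: datetime,
    window_days: int = 90,            # assumption: "3 months" ~ 90 days
    max_negative_share: float = 0.5,  # the 50% threshold from the text
) -> bool:
    """Return False when more than half of the recent reviews are negative."""
    cutoff = now - timedelta(days=window_days)
    recent = [r for r in reviews if cutoff <= r.posted_at <= now]
    if not recent:
        # Assumption: with no recent reviews there is nothing to hold
        # against the book, so it stays eligible for recommendation.
        return True
    negative = sum(1 for r in recent if r.sentiment == "NEGATIVE")
    return negative / len(recent) <= max_negative_share
```

The rule itself is simple; the hard part, as the paragraph notes, is producing reliable sentiment labels for unstructured tweets in the first place.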

Now, consider model training. Accurate NLP models are typically built on the deep learning architecture called the Transformer (please see https://www.packtpub.com/product/transformers-for-natural-language-processing/9781800565791), which performs sequence-to-sequence processing without needing to process the tokens in order, resulting in a higher degree of parallelization. Transformer model families use billions of parameters, and training them requires clusters of instances for distributed learning, which adds to time and costs.

AWS offers AI services that allow you, with just a few lines of code, to add NLP to your applications for the sentiment analysis of unstructured text at an almost limitless scale and immediately take advantage of the immense potential waiting to be discovered in unstructured text. We will cover AWS AI services in more detail from Chapter 2, Introducing Amazon Textract, onward.
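As a rough illustration of what "a few lines of code" can look like, the following sketch calls the Amazon Comprehend `detect_sentiment` operation through boto3. The wrapper function `detect_review_sentiment` and its optional `client` parameter are hypothetical names introduced here so the function can also be exercised with a stand-in client; the sketch assumes AWS credentials and a default region are already configured.

```python
from typing import Any, Optional


def detect_review_sentiment(
    text: str,
    client: Optional[Any] = None,
    language_code: str = "en",
) -> str:
    """Return the overall sentiment label Amazon Comprehend assigns to `text`."""
    if client is None:
        # Imported lazily so the function can be tried with a stub client
        # on machines where boto3 or AWS credentials are not available.
        import boto3
        client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode=language_code)
    # The response also includes per-label confidence scores under
    # "SentimentScore"; only the overall label is returned here.
    return response["Sentiment"]
```

With credentials in place, `detect_review_sentiment("I loved this book")` would return one of the labels `POSITIVE`, `NEGATIVE`, `NEUTRAL`, or `MIXED`.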

In this section, we reviewed some challenges organizations encounter when building NLP solutions, such as complexities in digitizing paper-based text, understanding patterns from structured and unstructured data, and how resource-intensive these solutions can be. Let's now understand why NLP is an important mainstream technology for enterprises today.
