Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Natural Language Processing and Computational Linguistics
Natural Language Processing and Computational Linguistics

Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras

Arrow left icon
Profile Icon Bhargav Srinivasa-Desikan
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6 (7 Ratings)
Paperback Jun 2018 306 pages 1st Edition
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m
Arrow left icon
Profile Icon Bhargav Srinivasa-Desikan
Arrow right icon
€18.99 per month
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6 (7 Ratings)
Paperback Jun 2018 306 pages 1st Edition
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing
Table of content icon View table of contents Preview book icon Preview Book

Natural Language Processing and Computational Linguistics

What is Text Analysis?

There is no time like now to do text analysis – we have an abundance of easily available data, powerful and free open source tools to conduct our analysis, and research on machine learning, computational linguistics and computing with text is progressing at a pace we have not seen before.

In this chapter, we will go into details about what exactly text analysis is and look at the motivations for studying and understanding text analysis. Following are the topics we will cover in this chapter:

  • What is text analysis?
  • Where's the data at?
  • Garbage in, garbage out
  • Why should YOU be interested?
  • References

A note about the references: they will appear throughout the PDF version of the book as links, and if it is an academic reference it will link to the PDF of the reference or the journal page. All of these links and references are then displayed as the final section of the chapter, so offline readers can also visit the websites or research papers.

What is text analysis?

If there's one medium of media which we are exposed to every single day, it's text. Whether it's our morning paper or the messages we receive, it's likely you receive your information in the form of text.

Let's put things into a little more perspective consider the amount of text data handled by companies such as Google (1+ trillion queries per year), Twitter (1.6 billion queries per day), and WhatsApp (30+ billion messages per day). That's an incredible resource, and the sheer ubiquitous nature of the text is enough reason for us to take it seriously. Textual data also has huge business value, and companies can use this data to help profile customers and understand customer trends. This can either be used to offer a more personalized experience for users or as information for targeted marketing. Facebook, for example, uses textual data heavily, and one of the algorithms we will learn later in this book was developed at Facebook's AI research team.

Fig 1.1 Rate of data growth from 2006 2018 with predicted rates of data in 2019 and 2020. Source: Patrick Cheeseman, https://www.eetimes.com/author.asp?section_id=36&doc_id=1330462

Text analysis can be understood as the technique of gleaning useful information from text. This can be done through various techniques, and we use Natural Language Processing (NLP), Computational Linguistics (CL), and numerical tools to get this information. These numerical tools are machine learning algorithms or information retrieval algorithms. We'll briefly, informally explain these terms as they will be coming up throughout the book.

Natural language processing (NLP) refers to the use of a computer to process natural language. For example, removing all occurrences of the word thereby from a body of text is one such example, albeit a basic example.

Computational linguistics (CL), as the name suggests, is the study of linguistics from a computational perspective. This means using computers and algorithms to perform linguistics tasks such as marking your text as a part of speech (such as noun or verb), instead of performing this task manually.

Machine Learning (ML) is the field of study where we use statistical algorithms to teach machines to perform a particular task. This learning occurs with data, and our task is often to predict a new value based on previously observed data.

Information Retrieval (IR) is the task of looking up or retrieving information based on a query by the user. The algorithms that aid in performing this task are called information retrieval algorithms, and we will be encountering them throughout the book.

Text analysis itself has been around for a long time one of the first definitions of Business Intelligence (BI) itself, in an October 1958 IBM Journal article by H. P. Luhn, A Business Intelligence System [1], describes a system that will do the following:

"...utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points."

It's interesting to see talk about documents, instead of numbers to think that the first ideas of business intelligence were understanding text and documents is again a testament to text analysis throughout the ages. But even outside the realm of text analysis for business, using computers to better understand text and language has been around since the beginning of ideas of artificial intelligence. The 1999 review on text analysis by John Hutchins, Retrospect and prospect in computer-based translation [2], talks about efforts to do machine translation as early as the 1950s by the United States military, in order to translate Russian scientific journals into English.

Efforts to make an intelligent machine started with text as well the ELIZA program developed in 1966 at MIT by Joseph Weizenbaum is one example. Even though the program had no real understanding of language, by basic pattern matching it could attempt to hold a conversation. These are just some of the earliest attempts to analyze text computers (and human beings!) have come a long way since, and we now have incredible tools at our disposal.

Machine translation itself has come a long way, and we can now use our smartphones to effectively translate between languages, and with cutting-edge techniques such as Google's Neural Machine Translation, the gap between academia and industry is reducing allowing us to actually experience the magic of natural language processing first hand.

Fig 1.2 An example of a Neural Translation model, working on French to English

Advances in this subject have helped advance the way we approach speech as well closed captioning in videos, and personal assistants such as Apple's Siri or Amazon's Alexa are greatly benefited by superior text processing. Understanding structure in conversations and extracting information were key problems in early NLP, and the fruits of the research done are being very apparent in the 21st century.

Search engines such as Google or Bing! also stand on the shoulders of the research done in NLP and CL and affect our lives in an unprecedented way. Information retrieval (IR) builds on statistical approaches in text processing and allows us to classify, cluster, and retrieve documents. Methods such as topic modeling can help us identify key topics in large, unstructured bodies of text. Identifying these topics goes beyond searching for keywords, and we use statistical models to further understand the underlying nature of bodies of text. Without the power of computers, we could not perform this kind of large-scale statistical analysis on the text. We will be exploring topic modeling in detail later on in the book.

Fig 1.3 Techniques such as topic modeling use probabilistic modeling methods to identify key topics from the text. We will be studying this in detail later in the book

Going one step ahead of just being able to experience the wonders of modern computing on our mobile phones, recent developments in both Python and NLP means that we can now develop such systems on our own!

Not only has there been an evolution in the techniques used in NLP and text analysis, it has become very accessible to us open source packages are becoming state-of-the-art, performing as well as commercial tools. An example of a commercial tool would be Microsoft's Text Analysis API (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/).

MATLAB is another example of a popular commercial tool used for scientific computing. While historically such commercial tools performed better than free, open source software, an increase in people contributing to open source libraries, as well as funding from industry has helped the open source community immensely. Now, the tables appear to have turned and many software giants use open source packages for their internal systems such as Google using TensorFlow and Apple using scikit-learn! TensorFlow and scikit-learn are two open source Python machine learning packages.

It can be argued that the sheer number of packages offered by the Python ecosystem means it leads the pack when it comes to doing text analysis, and we will focus our efforts here. A very strong and active open source community adds to the appeal.

Throughout the course of the book, we will discuss modern natural language processing and computational linguistics techniques and the best open source tools available to us which we can use to apply these techniques.

Where's the data at?

While it is important to be aware of the techniques and the tools involved in NLP and CL, it is, of course, pointless without any data. Luckily for us, we have access to an abundance of data if we look in the right places. The easiest way to find textual data to work on is to look for a corpus.

A text corpus is a large and structured set of texts and is a great way to start off with text analysis. Examples of such corpora that are free are the Open American National Corpus [5] or the British National Corpus [6]. Wikipedia has a useful list of the largest corpuses available in its article on text corpuses [7]. These are not limited to the English language, and there also exist various corpuses in European and Asian languages, and there are constant efforts worldwide to create corpuses for majority of languages. Universities research labs are another valuable source for obtaining corpuses indeed, one of the most iconic English language corpuses, the Brown Corpus, was put together at Brown University.

Different corpuses tend to have varying levels of information present, usually dependent on the primary purpose for that corpora for example, corpora whose primary function is to aid during translation would have the same sentence present in multiple languages. Another way corpora have extra information is through annotation. Examples of annotation in text usually include Part-Of-Speech (POS) tagging or Named-Entity-Recognition (NER). POS-tagging refers to marking each word in a sentence with its part of speech (Noun, verb, adverb, and so on), and a corpus annotated for NER would have all named entities recognized, such as places, people, and times. We'll be further going into details of both POS-tagging and NER later on in the book, in Chapter 5, POS-Tagging and its Applications and Chapter 6, NER-Tagging and its Applications.

Based on the structure and varying levels of information present in the corpora, it would have a different purpose. Some corpora are also built to evaluate clustering or classification tasks, where rather than annotation being important, the label or class would be. This means that some corpora are designed to aid with machine learning tasks such as cluster or classification by providing text with labels tagged by humans. Clustering refers to the task of grouping similar objects together, and classification is the process of deciding which predefined class an identifying what exactly your dataset is going to be used for is a crucial part of text analysis and an important first step.

Apart from downloading datasets or scraping data off the internet, there are still some rich sources for gathering our textual data in particular, literature. One example of this is the research done at the University of Pennsylvania, where Alejandro Ribeiro, Santiago Segarra, Mark Eisen, and Gabriel Egan discovered possible collaborators of Shakespeare, a literary history problem that stumbled many researchers [14]. They approached the problem by identifying literary styles an upcoming field of study in computational linguistics called style analysis.

The increased use of computational tools to perform research in the humanities has also led to the growth of Digital Humanities labs in universities, where traditional research approaches are either aided or overtaken by computer science, and in particular machine learning (and by extension), natural language processing. Speeches of politicians, or proceedings in parliament, for example, are another example of a data source used often in this community. TheyWorkForYou [17] is A UK parliament tracking system, which gets speeches and uploads them and is an example of the many sites available doing this kind of work.

Project Gutenberg is likely the best resource to download books and contains over 50,000 free eBooks and many literary classics. Personal PDFs and eBooks also remain a resource, but again, it is important to know the legal nature of your text before analyzing it. Downloading a pirated copy of, say, Harry Potter off the internet and publishing text analysis results might not be the best idea if you cannot explain where you got the text from! Similarly, text analysis on private text messages might not only annoy your friends but also could be infringing on privacy laws.

Fig 1.4 An example of a text dataset list here, it is of reviews datasets found on

So where else apart from downloading a structured data-set straight off the internet, do we get our textual data? Well, the internet, of course. Even if it isn't labelled, the sheer amount of text on the internet means that we can access large parts of it the [7] is one such example, and the media dump of all the content on Wikipedia, after unzipping, is about 58 GB (as of April 2018) more than enough text to play around with. The popular news aggregation website reddit.com [9] allows for easy web-scraping and is another great resource for text analysis.

Python again remains a great choice to use for any such web-scraping, and libraries such as BeautifulSoup [10], urllib [11] and scrapy [12] are designed particularly for this. It is important to remain careful about the legal side of things here, and make sure to check the terms and conditions of the website where you are scraping the data from a number of websites will not allow you to use the information on the website for commercial purposes.

Twitter is another website that is fast becoming a very important part of text analysis you even have academia taking this resource very seriously (What is Twitter, a social network or a news media? [13] has over 5000 citations!), with multiple papers being written on text analysis of tweets, and even full-fledged tools [15] to do sentiment analysis have been built! The Twitter-streaming API allows us to easily mine for textual data from Twitter as well, and the Python interface [16] is straightforward. Most world leaders are users of Twitter, as well as celebrities and major news corporations there is a lot of interesting insights Twitter can offer us.

Fig 1.5 An example of the rich text resource Twitter has become, with multiple structured datasets available [7]. These datasets, all mined from Twitter, have particular tasks, which can be used for and fall under the category of labeled datasets which we discussed before.

Other examples of textual information you can get off the internet include research articles, medical reports, restaurant reviews (the Yelp! dataset comes to mind), and other social media websites. Sentiment analysis is usually the prime objective in these cases. As the name suggests, sentiment analysis refers to the task of identifying sentiment in text. These sentiments can be basic, such as positive or negative sentiment, but we could have more complex sentiment analysis tasks where we analyze whether a sentence contains happy, sad, or angry sentiments.

It's clear that if we look hard enough, it's more than easy to find data to play around with. But let's take a small step back from downloading data off the internet where else can we try and find information?

Right in our hands, as it may seem we send and receive text messages and emails every day, and we can use this text for text analysis. Most text messaging applications have interfaces to download chats. WhatsApp, for example, will mail the data to you [18], with both media and text. Most mail clients have the same option, and the advantage in both these cases is that this kind of data is often well organized, allowing for easy cleaning and preprocessing before we dive into the data.

One aspect we've ignored so far whilst talking about data is the noise which is often in the text in tweets, for example, short forms and emoticons which are often used, and in some cases, we have multi-lingual data where a simple analysis might fail. This brings us to arguably the most important aspect of text analysis preprocessing.

Garbage in, garbage out

Garbage in, garbage out (or GIGO) is an adage of computer science which is even more important when dealing with machine learning and possibly even more so when dealing with textual data. Garbage in, garbage out means that if we have poorly formatted data, it is likely we will have poor results.

Fig 1.6 XKCD hits the hammer on the nail once again (https://xkcd.com/1838/)

While more data usually leads to a better prediction, it isn't always the same case with text analysis, where more data can result in nonsense results or results which we don't always want. An intuitive example: the part of speech, articles, such as the words a, or the tend to appear a lot in text, but not adding any information to the text, and is usually limited to grammar or structure.

Words such as these which don't provide useful information are called stop words, and these words are often removed from the text before applying text analysis techniques on them. Similarly, sometimes we remove words with very high frequency in the body of text, and words which only appear once or twice it is highly likely these words will not be useful to our analysis. That being said, this depends heavily on the kind of task being performed - if, for example, we would want to replicate human writing styles, stop words are important because humans many such words when writing. An example of how stop words can also include useful information is in this article, Pastiche detection based on stopword rankings. Exposing impersonators of a Romanian writer [20], is a study identified a certain author using frequency of stop words.

Let's consider another example where we might be dealing with useless data if searching for influential words or topics in the text, would it make sense to have both the words reading and read in the results? Here, shortening the word reading to read would not lead to any loss of information. But on a similar note, it would make sense to have the words information and inform exist separately in the same body of text, because they could mean different things based on the context. We would then need techniques to shorten words appropriately. Lemmatizing and stemming are two methods we use to tackle this problem and remain two of the core concepts in natural language processing. We will be exploring these two techniques in more detail in Chapter 3, spaCy's Language models.

Even after basic text-processing, our data is still a collection of words. Since machines do not inherently understand the concepts tied to words, we can instead use numbers that represent individual words. The next important step in text analysis is converting words into numbers, whether it is bag-of-words (BOW), or term frequency-inverse document frequency (TF-IDF), which are different ways to count the number of words in each document or sentence. There are also more advanced techniques to represent words such as Word2Vec and GloVe.

We will go into these details and techniques in more detail in the chapter on preprocessing techniques it is especially important to understand the motivation behind these techniques, and that a computer's output is only as good as the input you feed it.

Why should you do text analysis?

We've talked about what text analysis is, where we can find the data, and some of the things to keep in mind before diving into text analysis. But after all, what motivation do you, the reader, have to actually go about doing text analysis?

For starters, it's the sheer abundance of easily available data that we can use. In the big data age, there really is no excuse to not have a look at what all our data really means. In fact, apart from the massive data sets, we can download off the internet, we also have access to small data text messages, emails, a collection of poems are such examples. You could even do a meta-analysis and run an analysis on this very book! Textual data is even easier to get a hold-off, but far more importantly - it's easy to interpret and understand the results of the analysis. Numbers might not always make sense and are not always appealing to look at - but words are easier for us human beings to appreciate.

Text analysis remains exciting also because we can use data which directly involves the user- our own text conversations, our favorite childhood book, or tweets by our favorite celebrity. The personal nature of text data always adds an extra bit of motivation, and it also likely means we are aware of the nature of the data, and what kind of results to expect.

NLP techniques can also help us construct tools that can assist personal businesses or enterprises chatbots, for example, are becoming increasingly common in major websites, and with the right approach, it is possible to have a personal chat-bot. This is largely due to a subfield of machine learning, called Deep Learning, where we use algorithms and structures that are inspired by the structure of the human brain. These algorithms and structures are also referred to as neural networks. Advances in deep learning have introduced to powerful neural networks such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Now, even with minimal knowledge of the mathematical functioning of these algorithms, high-level APIs are allowing us to use these tools. Integrating this into our daily life is no longer reserved for computer science researchers or full-time engineers with the right collection of data and open source packages, this is well within our capabilities.

Open source packages have become industry standard Google has released and maintains TensorFlow [21], and packages such as scikit-learn [22] are used by Apple and Spotify, and spaCy [23], which we will extensively discuss throughout this book is used by Quora, a popular question-answer website.

We are no longer limited by either data or the tools the only two things we would need to do text analysis.

The programming language Python will be our friend throughout the book, and all the tools we will use will all be free open-source software. While we move towards open science, we also move towards open source code, and this will remain a key philosophy throughout the book. In the world of research, open source code means academic results are reproducible and available to all those interested. Python remains an easy-to-use and powerful language and serves as a great way to enter the world of natural language processing.

One could argue that the last thing needed was the knowledge of how to apply these tools and to wrangle with the data but that is precisely the purpose of the book and, hoping to let the reader build their own natural language processing pipelines and models at the end of the journey.

Summary

We've had a look at the incredible power of text analysis, and the kind of things we can do with it as well as the kind of tools we would be using to take advantage of this. Data has become increasingly easy for us to access, and with the growth of social media, we have continuous access to both new data, as well as standardized annotated datasets.

This book will aim at walking the reader through the tools and knowledge required to conduct textual analysis on their own personal data or own standardized datasets. We will discuss methods to access and clean data to make it ready for preprocessing, as well as how to explore and organize our textual data. Classification and clustering are two other commonly conducted text processing tasks, and we will figure out how to perform this as well, before finishing up with how to use deep learning for text.

In the next chapter, we will introduce how and why Python is the right choice for our purposes, as well as discuss some Python tricks and tips to help us with text analysis.

References

[1] A business intelligence system H. P. Lunn, October 1958 (https://dl.acm.org/citation.cfm?id=1662381)

[2] Retrospect and prospect in computer-based translation John Hutchins, September 1999 (http://www.mt-archive.info/90/MTS-1999-Hutchins.pdf)

[3] Introduction to Neural Machine Translation with GPUs:
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/

[4] Text Mining :
https://en.wikipedia.org/wiki/Text_mining

[5] Open American National Corpus:
http://www.anc.org

[6] British National Corpus:
http://www.natcorp.ox.ac.uk

[7] List of Text Corpora:
https://en.wikipedia.org/wiki/List_of_text_corpora

[8] Wikipedia Dataset:
https://en.wikipedia.org/wiki/Wikipedia:Database_download

[9] Reddit, news aggregation website:
https://www.reddit.com

[10] Beautiful Soup:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

[11] UrlLib:
https://docs.python.org/2/library/urllib.html

[12] Scrapy:
https://scrapy.org

[13] What is Twitter, a social network or a news media?:
https://dl.acm.org/citation.cfm?id=1772751

[14] Shakespeare and his co-authors:
https://www.upenn.edu/spotlights/shakespeare-and-his-co-authors-told-penn-engineers

[15] Tweet Sentiment Visualization:
https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

[16] Tweepy, twitter API:
http://www.tweepy.org

[17] TheyWorkForYou:
https://www.theyworkforyou.com

[18] Mailing WhatsApp chat history:
https://faq.whatsapp.com/en/android/23756533/

[19] Project Gutenburg:
https://www.gutenberg.org

[20] Pastiche detection based on stopword rankings. Exposing impersonators of a Romanian writer:
http://www.aclweb.org/anthology/W12-0411

[21] TensorFlow:
https://www.tensorflow.org

[22] Scikit-learn:
http://scikit-learn.org/stable/

[23] spaCy: https://spacy.io

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • • Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
  • • Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
  • • Learn deep learning techniques for text analysis

Description

Modern text analysis is now very accessible using Python and open source tools, so discover how you can now perform modern text analysis in this era of textual data. This book shows you how to use natural language processing, and computational linguistics algorithms, to make inferences and gain insights about data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now - with Python, and tools like Gensim and spaCy. You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning. This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

Who is this book for?

This book is for you if you want to dive in, hands-first, into the interesting world of text analysis and NLP, and you're ready to work with the rich Python ecosystem of tools and datasets waiting for you!

What you will learn

  • • Why text analysis is important in our modern age
  • • Understand NLP terminology and get to know the Python tools and datasets
  • • Learn how to pre-process and clean textual data
  • • Convert textual data into vector space representations
  • • Using spaCy to process text
  • • Train your own NLP models for computational linguistics
  • • Use statistical learning and Topic Modeling algorithms for text, using Gensim and scikit-learn
  • • Employ deep learning techniques for text analysis using Keras

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jun 29, 2018
Length: 306 pages
Edition : 1st
Language : English
ISBN-13 : 9781788838535
Category :
Languages :
Tools :

What do you get with a Packt Subscription?

Free for first 7 days. $19.99 p/m after that. Cancel any time!
Product feature icon Unlimited ad-free access to the largest independent learning library in tech. Access this title and thousands more!
Product feature icon 50+ new titles added per month, including many first-to-market concepts and exclusive early access to books as they are being written.
Product feature icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Product feature icon Thousands of reference materials covering every tech concept you need to stay up to date.
Subscribe now
View plans & pricing

Product Details

Publication date : Jun 29, 2018
Length: 306 pages
Edition : 1st
Language : English
ISBN-13 : 9781788838535
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 98.97
Hands-On Natural Language Processing with Python
€32.99
Natural Language Processing with TensorFlow
€32.99
Natural Language Processing and Computational Linguistics
€32.99
Total 98.97 Stars icon
Banner background image

Table of Contents

16 Chapters
What is Text Analysis? Chevron down icon Chevron up icon
Python Tips for Text Analysis Chevron down icon Chevron up icon
spaCy's Language Models Chevron down icon Chevron up icon
Gensim – Vectorizing Text and Transformations and n-grams Chevron down icon Chevron up icon
POS-Tagging and Its Applications Chevron down icon Chevron up icon
NER-Tagging and Its Applications Chevron down icon Chevron up icon
Dependency Parsing Chevron down icon Chevron up icon
Topic Models Chevron down icon Chevron up icon
Advanced Topic Modeling Chevron down icon Chevron up icon
Clustering and Classifying Text Chevron down icon Chevron up icon
Similarity Queries and Summarization Chevron down icon Chevron up icon
Word2Vec, Doc2Vec, and Gensim Chevron down icon Chevron up icon
Deep Learning for Text Chevron down icon Chevron up icon
Keras and spaCy for Deep Learning Chevron down icon Chevron up icon
Sentiment Analysis and ChatBots Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6
(7 Ratings)
5 star 42.9%
4 star 14.3%
3 star 14.3%
2 star 14.3%
1 star 14.3%
Filter icon Filter
Top Reviews

Filter reviews by




David Baron Aug 28, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
15 chapters of fast-paced learning - the author gives you the bootstrapped resources for success with NLP. Every concept is explained clearly and with powerful examples and analogies. Increasingly complex concepts are stacked upon each other to produce a comprehensive overview of everything in NLP, from linguistics-based methods to bag-of-words approaches, from k-means clustering and LDA models to the inevitable methods of deep learning. Highly recommended.
Amazon Verified review Amazon
TM Raghavan Aug 17, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
What I liked about this book is its style of delivery, an easy read one. As it states in the beginning, though fluency in Python or NLP concepts is better to make full use of the ideas, it still contains lots of code examples and analysis of their results, like any good book on this genre. The breadth and and the depth from the coverage of the book's focus are sufficient in my opinion.What I find missing in the book is a section on Glossary, encompassing statistics, vector /matrix algebra, NLP/IR, ANN (CNN/RNN), that are relevant to the readers' requirements. Perhaps this can be given in the beginning itself (like a Forward).Overall I find the book very useful for the students of final year Under-Graduate and Graduate programs besides researchers whose foci are outside of basic NLP but would want to analyse large texts for their research.
Amazon Verified review Amazon
Juan Oct 08, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Great book, up to date, engaging and covers a lot of topics clearly.
Amazon Verified review Amazon
Victor Aug 27, 2018
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
Although the introduction states that is necessary be fluent in Python, this book is feasible for anyone who has just a basic understanding of Python or any other programming language. The book describes profoundly all the steps of a NLP analysis, setting comprehensible examples that allows the reader to understand every aspect of NLP. I would consider this book as an advanced and complete introduction, recommended to anyone who wants to begin and follow further on the topic but getting a solid base from this book. I would recommend it.
Amazon Verified review Amazon
Itai Aug 14, 2018
Full star icon Full star icon Full star icon Empty star icon Empty star icon 3
Writing is a profession. Even excellent engineers may be bad writers, this is exactly what happens here.Super long sentences on trivial topics, and then just a taste from the really important stuff.For example: having several full pages on 1st grade stuff as "lower(), upper(), isNumeric()", having several pages on word2Vec King-man+woman = queen, and then have only a few lines about Doc2Vec which is the most important thing.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is included in a Packt subscription? Chevron down icon Chevron up icon

A subscription provides you with full access to view all Packt and licnesed content online, this includes exclusive access to Early Access titles. Depending on the tier chosen you can also earn credits and discounts to use for owning content

How can I cancel my subscription? Chevron down icon Chevron up icon

To cancel your subscription with us simply go to the account page - found in the top right of the page or at https://subscription.packtpub.com/my-account/subscription - From here you will see the ‘cancel subscription’ button in the grey box with your subscription information in.

What are credits? Chevron down icon Chevron up icon

Credits can be earned from reading 40 section of any title within the payment cycle - a month starting from the day of subscription payment. You also earn a Credit every month if you subscribe to our annual or 18 month plans. Credits can be used to buy books DRM free, the same way that you would pay for a book. Your credits can be found in the subscription homepage - subscription.packtpub.com - clicking on ‘the my’ library dropdown and selecting ‘credits’.

What happens if an Early Access Course is cancelled? Chevron down icon Chevron up icon

Projects are rarely cancelled, but sometimes it's unavoidable. If an Early Access course is cancelled or excessively delayed, you can exchange your purchase for another course. For further details, please contact us here.

Where can I send feedback about an Early Access title? Chevron down icon Chevron up icon

If you have any feedback about the product you're reading, or Early Access in general, then please fill out a contact form here and we'll make sure the feedback gets to the right team. 

Can I download the code files for Early Access titles? Chevron down icon Chevron up icon

We try to ensure that all books in Early Access have code available to use, download, and fork on GitHub. This helps us be more agile in the development of the book, and helps keep the often changing code base of new versions and new technologies as up to date as possible. Unfortunately, however, there will be rare cases when it is not possible for us to have downloadable code samples available until publication.

When we publish the book, the code files will also be available to download from the Packt website.

How accurate is the publication date? Chevron down icon Chevron up icon

The publication date is as accurate as we can be at any point in the project. Unfortunately, delays can happen. Often those delays are out of our control, such as changes to the technology code base or delays in the tech release. We do our best to give you an accurate estimate of the publication date at any given time, and as more chapters are delivered, the more accurate the delivery date will become.

How will I know when new chapters are ready? Chevron down icon Chevron up icon

We'll let you know every time there has been an update to a course that you've bought in Early Access. You'll get an email to let you know there has been a new chapter, or a change to a previous chapter. The new chapters are automatically added to your account, so you can also check back there any time you're ready and download or read them online.

I am a Packt subscriber, do I get Early Access? Chevron down icon Chevron up icon

Yes, all Early Access content is fully available through your subscription. You will need to have a paid for or active trial subscription in order to access all titles.

How is Early Access delivered? Chevron down icon Chevron up icon

Early Access is currently only available as a PDF or through our online reader. As we make changes or add new chapters, the files in your Packt account will be updated so you can download them again or view them online immediately.

How do I buy Early Access content? Chevron down icon Chevron up icon

Early Access is a way of us getting our content to you quicker, but the method of buying the Early Access course is still the same. Just find the course you want to buy, go through the check-out steps, and you’ll get a confirmation email from us with information and a link to the relevant Early Access courses.

What is Early Access? Chevron down icon Chevron up icon

Keeping up to date with the latest technology is difficult; new versions, new frameworks, new techniques. This feature gives you a head-start to our content, as it's being created. With Early Access you'll receive each chapter as it's written, and get regular updates throughout the product's development, as well as the final course as soon as it's ready.We created Early Access as a means of giving you the information you need, as soon as it's available. As we go through the process of developing a course, 99% of it can be ready but we can't publish until that last 1% falls in to place. Early Access helps to unlock the potential of our content early, to help you start your learning when you need it most. You not only get access to every chapter as it's delivered, edited, and updated, but you'll also get the finalized, DRM-free product to download in any format you want when it's published. As a member of Packt, you'll also be eligible for our exclusive offers, including a free course every day, and discounts on new and popular titles.