Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Conferences
Free Learning
Arrow right icon
Natural Language Processing and Computational Linguistics
Natural Language Processing and Computational Linguistics

Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras

Arrow left icon
Profile Icon Bhargav Srinivasa-Desikan
Arrow right icon
€32.99
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6 (7 Ratings)
Paperback Jun 2018 306 pages 1st Edition
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m
Arrow left icon
Profile Icon Bhargav Srinivasa-Desikan
Arrow right icon
€32.99
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6 (7 Ratings)
Paperback Jun 2018 306 pages 1st Edition
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m
eBook
€8.99 €26.99
Paperback
€32.99
Subscription
Free Trial
Renews at €18.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Natural Language Processing and Computational Linguistics

What is Text Analysis?

There is no time like now to do text analysis – we have an abundance of easily available data, powerful and free open source tools to conduct our analysis, and research on machine learning, computational linguistics and computing with text is progressing at a pace we have not seen before.

In this chapter, we will go into details about what exactly text analysis is and look at the motivations for studying and understanding text analysis. Following are the topics we will cover in this chapter:

  • What is text analysis?
  • Where's the data at?
  • Garbage in, garbage out
  • Why should YOU be interested?
  • References

A note about the references: they will appear throughout the PDF version of the book as links, and if it is an academic reference it will link to the PDF of the reference or the journal page. All of these links and references are then displayed as the final section of the chapter, so offline readers can also visit the websites or research papers.

What is text analysis?

If there's one medium of media which we are exposed to every single day, it's text. Whether it's our morning paper or the messages we receive, it's likely you receive your information in the form of text.

Let's put things into a little more perspective consider the amount of text data handled by companies such as Google (1+ trillion queries per year), Twitter (1.6 billion queries per day), and WhatsApp (30+ billion messages per day). That's an incredible resource, and the sheer ubiquitous nature of the text is enough reason for us to take it seriously. Textual data also has huge business value, and companies can use this data to help profile customers and understand customer trends. This can either be used to offer a more personalized experience for users or as information for targeted marketing. Facebook, for example, uses textual data heavily, and one of the algorithms we will learn later in this book was developed at Facebook's AI research team.

Fig 1.1 Rate of data growth from 2006 2018 with predicted rates of data in 2019 and 2020. Source: Patrick Cheeseman, https://www.eetimes.com/author.asp?section_id=36&doc_id=1330462

Text analysis can be understood as the technique of gleaning useful information from text. This can be done through various techniques, and we use Natural Language Processing (NLP), Computational Linguistics (CL), and numerical tools to get this information. These numerical tools are machine learning algorithms or information retrieval algorithms. We'll briefly, informally explain these terms as they will be coming up throughout the book.

Natural language processing (NLP) refers to the use of a computer to process natural language. For example, removing all occurrences of the word thereby from a body of text is one such example, albeit a basic example.

Computational linguistics (CL), as the name suggests, is the study of linguistics from a computational perspective. This means using computers and algorithms to perform linguistics tasks such as marking your text as a part of speech (such as noun or verb), instead of performing this task manually.

Machine Learning (ML) is the field of study where we use statistical algorithms to teach machines to perform a particular task. This learning occurs with data, and our task is often to predict a new value based on previously observed data.

Information Retrieval (IR) is the task of looking up or retrieving information based on a query by the user. The algorithms that aid in performing this task are called information retrieval algorithms, and we will be encountering them throughout the book.

Text analysis itself has been around for a long time one of the first definitions of Business Intelligence (BI) itself, in an October 1958 IBM Journal article by H. P. Luhn, A Business Intelligence System [1], describes a system that will do the following:

"...utilize data-processing machines for auto-abstracting and auto-encoding of documents and for creating interest profiles for each of the 'action points' in an organization. Both incoming and internally generated documents are automatically abstracted, characterized by a word pattern, and sent automatically to appropriate action points."

It's interesting to see talk about documents, instead of numbers to think that the first ideas of business intelligence were understanding text and documents is again a testament to text analysis throughout the ages. But even outside the realm of text analysis for business, using computers to better understand text and language has been around since the beginning of ideas of artificial intelligence. The 1999 review on text analysis by John Hutchins, Retrospect and prospect in computer-based translation [2], talks about efforts to do machine translation as early as the 1950s by the United States military, in order to translate Russian scientific journals into English.

Efforts to make an intelligent machine started with text as well the ELIZA program developed in 1966 at MIT by Joseph Weizenbaum is one example. Even though the program had no real understanding of language, by basic pattern matching it could attempt to hold a conversation. These are just some of the earliest attempts to analyze text computers (and human beings!) have come a long way since, and we now have incredible tools at our disposal.

Machine translation itself has come a long way, and we can now use our smartphones to effectively translate between languages, and with cutting-edge techniques such as Google's Neural Machine Translation, the gap between academia and industry is reducing allowing us to actually experience the magic of natural language processing first hand.

Fig 1.2 An example of a Neural Translation model, working on French to English

Advances in this subject have helped advance the way we approach speech as well closed captioning in videos, and personal assistants such as Apple's Siri or Amazon's Alexa are greatly benefited by superior text processing. Understanding structure in conversations and extracting information were key problems in early NLP, and the fruits of the research done are being very apparent in the 21st century.

Search engines such as Google or Bing! also stand on the shoulders of the research done in NLP and CL and affect our lives in an unprecedented way. Information retrieval (IR) builds on statistical approaches in text processing and allows us to classify, cluster, and retrieve documents. Methods such as topic modeling can help us identify key topics in large, unstructured bodies of text. Identifying these topics goes beyond searching for keywords, and we use statistical models to further understand the underlying nature of bodies of text. Without the power of computers, we could not perform this kind of large-scale statistical analysis on the text. We will be exploring topic modeling in detail later on in the book.

Fig 1.3 Techniques such as topic modeling use probabilistic modeling methods to identify key topics from the text. We will be studying this in detail later in the book

Going one step ahead of just being able to experience the wonders of modern computing on our mobile phones, recent developments in both Python and NLP means that we can now develop such systems on our own!

Not only has there been an evolution in the techniques used in NLP and text analysis, it has become very accessible to us open source packages are becoming state-of-the-art, performing as well as commercial tools. An example of a commercial tool would be Microsoft's Text Analysis API (https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/).

MATLAB is another example of a popular commercial tool used for scientific computing. While historically such commercial tools performed better than free, open source software, an increase in people contributing to open source libraries, as well as funding from industry has helped the open source community immensely. Now, the tables appear to have turned and many software giants use open source packages for their internal systems such as Google using TensorFlow and Apple using scikit-learn! TensorFlow and scikit-learn are two open source Python machine learning packages.

It can be argued that the sheer number of packages offered by the Python ecosystem means it leads the pack when it comes to doing text analysis, and we will focus our efforts here. A very strong and active open source community adds to the appeal.

Throughout the course of the book, we will discuss modern natural language processing and computational linguistics techniques and the best open source tools available to us which we can use to apply these techniques.

Where's the data at?

While it is important to be aware of the techniques and the tools involved in NLP and CL, it is, of course, pointless without any data. Luckily for us, we have access to an abundance of data if we look in the right places. The easiest way to find textual data to work on is to look for a corpus.

A text corpus is a large and structured set of texts and is a great way to start off with text analysis. Examples of such corpora that are free are the Open American National Corpus [5] or the British National Corpus [6]. Wikipedia has a useful list of the largest corpuses available in its article on text corpuses [7]. These are not limited to the English language, and there also exist various corpuses in European and Asian languages, and there are constant efforts worldwide to create corpuses for majority of languages. Universities research labs are another valuable source for obtaining corpuses indeed, one of the most iconic English language corpuses, the Brown Corpus, was put together at Brown University.

Different corpuses tend to have varying levels of information present, usually dependent on the primary purpose for that corpora for example, corpora whose primary function is to aid during translation would have the same sentence present in multiple languages. Another way corpora have extra information is through annotation. Examples of annotation in text usually include Part-Of-Speech (POS) tagging or Named-Entity-Recognition (NER). POS-tagging refers to marking each word in a sentence with its part of speech (Noun, verb, adverb, and so on), and a corpus annotated for NER would have all named entities recognized, such as places, people, and times. We'll be further going into details of both POS-tagging and NER later on in the book, in Chapter 5, POS-Tagging and its Applications and Chapter 6, NER-Tagging and its Applications.

Based on the structure and varying levels of information present in the corpora, it would have a different purpose. Some corpora are also built to evaluate clustering or classification tasks, where rather than annotation being important, the label or class would be. This means that some corpora are designed to aid with machine learning tasks such as cluster or classification by providing text with labels tagged by humans. Clustering refers to the task of grouping similar objects together, and classification is the process of deciding which predefined class an identifying what exactly your dataset is going to be used for is a crucial part of text analysis and an important first step.

Apart from downloading datasets or scraping data off the internet, there are still some rich sources for gathering our textual data in particular, literature. One example of this is the research done at the University of Pennsylvania, where Alejandro Ribeiro, Santiago Segarra, Mark Eisen, and Gabriel Egan discovered possible collaborators of Shakespeare, a literary history problem that stumbled many researchers [14]. They approached the problem by identifying literary styles an upcoming field of study in computational linguistics called style analysis.

The increased use of computational tools to perform research in the humanities has also led to the growth of Digital Humanities labs in universities, where traditional research approaches are either aided or overtaken by computer science, and in particular machine learning (and by extension), natural language processing. Speeches of politicians, or proceedings in parliament, for example, are another example of a data source used often in this community. TheyWorkForYou [17] is A UK parliament tracking system, which gets speeches and uploads them and is an example of the many sites available doing this kind of work.

Project Gutenberg is likely the best resource to download books and contains over 50,000 free eBooks and many literary classics. Personal PDFs and eBooks also remain a resource, but again, it is important to know the legal nature of your text before analyzing it. Downloading a pirated copy of, say, Harry Potter off the internet and publishing text analysis results might not be the best idea if you cannot explain where you got the text from! Similarly, text analysis on private text messages might not only annoy your friends but also could be infringing on privacy laws.

Fig 1.4 An example of a text dataset list here, it is of reviews datasets found on

So where else apart from downloading a structured data-set straight off the internet, do we get our textual data? Well, the internet, of course. Even if it isn't labelled, the sheer amount of text on the internet means that we can access large parts of it the [7] is one such example, and the media dump of all the content on Wikipedia, after unzipping, is about 58 GB (as of April 2018) more than enough text to play around with. The popular news aggregation website reddit.com [9] allows for easy web-scraping and is another great resource for text analysis.

Python again remains a great choice to use for any such web-scraping, and libraries such as BeautifulSoup [10], urllib [11] and scrapy [12] are designed particularly for this. It is important to remain careful about the legal side of things here, and make sure to check the terms and conditions of the website where you are scraping the data from a number of websites will not allow you to use the information on the website for commercial purposes.

Twitter is another website that is fast becoming a very important part of text analysis you even have academia taking this resource very seriously (What is Twitter, a social network or a news media? [13] has over 5000 citations!), with multiple papers being written on text analysis of tweets, and even full-fledged tools [15] to do sentiment analysis have been built! The Twitter-streaming API allows us to easily mine for textual data from Twitter as well, and the Python interface [16] is straightforward. Most world leaders are users of Twitter, as well as celebrities and major news corporations there is a lot of interesting insights Twitter can offer us.

Fig 1.5 An example of the rich text resource Twitter has become, with multiple structured datasets available [7]. These datasets, all mined from Twitter, have particular tasks, which can be used for and fall under the category of labeled datasets which we discussed before.

Other examples of textual information you can get off the internet include research articles, medical reports, restaurant reviews (the Yelp! dataset comes to mind), and other social media websites. Sentiment analysis is usually the prime objective in these cases. As the name suggests, sentiment analysis refers to the task of identifying sentiment in text. These sentiments can be basic, such as positive or negative sentiment, but we could have more complex sentiment analysis tasks where we analyze whether a sentence contains happy, sad, or angry sentiments.

It's clear that if we look hard enough, it's more than easy to find data to play around with. But let's take a small step back from downloading data off the internet where else can we try and find information?

Right in our hands, as it may seem we send and receive text messages and emails every day, and we can use this text for text analysis. Most text messaging applications have interfaces to download chats. WhatsApp, for example, will mail the data to you [18], with both media and text. Most mail clients have the same option, and the advantage in both these cases is that this kind of data is often well organized, allowing for easy cleaning and preprocessing before we dive into the data.

One aspect we've ignored so far whilst talking about data is the noise which is often in the text in tweets, for example, short forms and emoticons which are often used, and in some cases, we have multi-lingual data where a simple analysis might fail. This brings us to arguably the most important aspect of text analysis preprocessing.

Garbage in, garbage out

Garbage in, garbage out (or GIGO) is an adage of computer science which is even more important when dealing with machine learning and possibly even more so when dealing with textual data. Garbage in, garbage out means that if we have poorly formatted data, it is likely we will have poor results.

Fig 1.6 XKCD hits the hammer on the nail once again (https://xkcd.com/1838/)

While more data usually leads to a better prediction, it isn't always the same case with text analysis, where more data can result in nonsense results or results which we don't always want. An intuitive example: the part of speech, articles, such as the words a, or the tend to appear a lot in text, but not adding any information to the text, and is usually limited to grammar or structure.

Words such as these which don't provide useful information are called stop words, and these words are often removed from the text before applying text analysis techniques on them. Similarly, sometimes we remove words with very high frequency in the body of text, and words which only appear once or twice it is highly likely these words will not be useful to our analysis. That being said, this depends heavily on the kind of task being performed - if, for example, we would want to replicate human writing styles, stop words are important because humans many such words when writing. An example of how stop words can also include useful information is in this article, Pastiche detection based on stopword rankings. Exposing impersonators of a Romanian writer [20], is a study identified a certain author using frequency of stop words.

Let's consider another example where we might be dealing with useless data if searching for influential words or topics in the text, would it make sense to have both the words reading and read in the results? Here, shortening the word reading to read would not lead to any loss of information. But on a similar note, it would make sense to have the words information and inform exist separately in the same body of text, because they could mean different things based on the context. We would then need techniques to shorten words appropriately. Lemmatizing and stemming are two methods we use to tackle this problem and remain two of the core concepts in natural language processing. We will be exploring these two techniques in more detail in Chapter 3, spaCy's Language models.

Even after basic text-processing, our data is still a collection of words. Since machines do not inherently understand the concepts tied to words, we can instead use numbers that represent individual words. The next important step in text analysis is converting words into numbers, whether it is bag-of-words (BOW), or term frequency-inverse document frequency (TF-IDF), which are different ways to count the number of words in each document or sentence. There are also more advanced techniques to represent words such as Word2Vec and GloVe.

We will go into these details and techniques in more detail in the chapter on preprocessing techniques it is especially important to understand the motivation behind these techniques, and that a computer's output is only as good as the input you feed it.

Why should you do text analysis?

We've talked about what text analysis is, where we can find the data, and some of the things to keep in mind before diving into text analysis. But after all, what motivation do you, the reader, have to actually go about doing text analysis?

For starters, it's the sheer abundance of easily available data that we can use. In the big data age, there really is no excuse to not have a look at what all our data really means. In fact, apart from the massive data sets, we can download off the internet, we also have access to small data text messages, emails, a collection of poems are such examples. You could even do a meta-analysis and run an analysis on this very book! Textual data is even easier to get a hold-off, but far more importantly - it's easy to interpret and understand the results of the analysis. Numbers might not always make sense and are not always appealing to look at - but words are easier for us human beings to appreciate.

Text analysis remains exciting also because we can use data which directly involves the user- our own text conversations, our favorite childhood book, or tweets by our favorite celebrity. The personal nature of text data always adds an extra bit of motivation, and it also likely means we are aware of the nature of the data, and what kind of results to expect.

NLP techniques can also help us construct tools that can assist personal businesses or enterprises chatbots, for example, are becoming increasingly common in major websites, and with the right approach, it is possible to have a personal chat-bot. This is largely due to a subfield of machine learning, called Deep Learning, where we use algorithms and structures that are inspired by the structure of the human brain. These algorithms and structures are also referred to as neural networks. Advances in deep learning have introduced to powerful neural networks such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Now, even with minimal knowledge of the mathematical functioning of these algorithms, high-level APIs are allowing us to use these tools. Integrating this into our daily life is no longer reserved for computer science researchers or full-time engineers with the right collection of data and open source packages, this is well within our capabilities.

Open source packages have become industry standard Google has released and maintains TensorFlow [21], and packages such as scikit-learn [22] are used by Apple and Spotify, and spaCy [23], which we will extensively discuss throughout this book is used by Quora, a popular question-answer website.

We are no longer limited by either data or the tools the only two things we would need to do text analysis.

The programming language Python will be our friend throughout the book, and all the tools we will use will all be free open-source software. While we move towards open science, we also move towards open source code, and this will remain a key philosophy throughout the book. In the world of research, open source code means academic results are reproducible and available to all those interested. Python remains an easy-to-use and powerful language and serves as a great way to enter the world of natural language processing.

One could argue that the last thing needed was the knowledge of how to apply these tools and to wrangle with the data but that is precisely the purpose of the book and, hoping to let the reader build their own natural language processing pipelines and models at the end of the journey.

Summary

We've had a look at the incredible power of text analysis, and the kind of things we can do with it as well as the kind of tools we would be using to take advantage of this. Data has become increasingly easy for us to access, and with the growth of social media, we have continuous access to both new data, as well as standardized annotated datasets.

This book will aim at walking the reader through the tools and knowledge required to conduct textual analysis on their own personal data or own standardized datasets. We will discuss methods to access and clean data to make it ready for preprocessing, as well as how to explore and organize our textual data. Classification and clustering are two other commonly conducted text processing tasks, and we will figure out how to perform this as well, before finishing up with how to use deep learning for text.

In the next chapter, we will introduce how and why Python is the right choice for our purposes, as well as discuss some Python tricks and tips to help us with text analysis.

References

[1] A business intelligence system H. P. Lunn, October 1958 (https://dl.acm.org/citation.cfm?id=1662381)

[2] Retrospect and prospect in computer-based translation John Hutchins, September 1999 (http://www.mt-archive.info/90/MTS-1999-Hutchins.pdf)

[3] Introduction to Neural Machine Translation with GPUs:
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/

[4] Text Mining :
https://en.wikipedia.org/wiki/Text_mining

[5] Open American National Corpus:
http://www.anc.org

[6] British National Corpus:
http://www.natcorp.ox.ac.uk

[7] List of Text Corpora:
https://en.wikipedia.org/wiki/List_of_text_corpora

[8] Wikipedia Dataset:
https://en.wikipedia.org/wiki/Wikipedia:Database_download

[9] Reddit, news aggregation website:
https://www.reddit.com

[10] Beautiful Soup:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

[11] UrlLib:
https://docs.python.org/2/library/urllib.html

[12] Scrapy:
https://scrapy.org

[13] What is Twitter, a social network or a news media?:
https://dl.acm.org/citation.cfm?id=1772751

[14] Shakespeare and his co-authors:
https://www.upenn.edu/spotlights/shakespeare-and-his-co-authors-told-penn-engineers

[15] Tweet Sentiment Visualization:
https://www.csc2.ncsu.edu/faculty/healey/tweet_viz/tweet_app/

[16] Tweepy, twitter API:
http://www.tweepy.org

[17] TheyWorkForYou:
https://www.theyworkforyou.com

[18] Mailing WhatsApp chat history:
https://faq.whatsapp.com/en/android/23756533/

[19] Project Gutenburg:
https://www.gutenberg.org

[20] Pastiche detection based on stopword rankings. Exposing impersonators of a Romanian writer:
http://www.aclweb.org/anthology/W12-0411

[21] TensorFlow:
https://www.tensorflow.org

[22] Scikit-learn:
http://scikit-learn.org/stable/

[23] spaCy: https://spacy.io

Left arrow icon Right arrow icon
Download code icon Download Code

Key benefits

  • • Discover the open source Python text analysis ecosystem, using spaCy, Gensim, scikit-learn, and Keras
  • • Hands-on text analysis with Python, featuring natural language processing and computational linguistics algorithms
  • • Learn deep learning techniques for text analysis

Description

Modern text analysis is now very accessible using Python and open source tools, so discover how you can now perform modern text analysis in this era of textual data. This book shows you how to use natural language processing, and computational linguistics algorithms, to make inferences and gain insights about data you have. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now - with Python, and tools like Gensim and spaCy. You'll start by learning about data cleaning, and then how to perform computational linguistics from first concepts. You're then ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models, and when to work with Keras for deep learning. This book balances theory and practical hands-on examples, so you can learn about and conduct your own natural language processing projects and computational linguistics. You'll discover the rich ecosystem of Python tools you have available to conduct NLP - and enter the interesting world of modern text analysis.

Who is this book for?

This book is for you if you want to dive in, hands-first, into the interesting world of text analysis and NLP, and you're ready to work with the rich Python ecosystem of tools and datasets waiting for you!

What you will learn

  • • Why text analysis is important in our modern age
  • • Understand NLP terminology and get to know the Python tools and datasets
  • • Learn how to pre-process and clean textual data
  • • Convert textual data into vector space representations
  • • Using spaCy to process text
  • • Train your own NLP models for computational linguistics
  • • Use statistical learning and Topic Modeling algorithms for text, using Gensim and scikit-learn
  • • Employ deep learning techniques for text analysis using Keras
Estimated delivery fee Deliver to Malta

Premium delivery 7 - 10 business days

€32.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : Jun 29, 2018
Length: 306 pages
Edition : 1st
Language : English
ISBN-13 : 9781788838535
Category :
Languages :
Tools :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to Malta

Premium delivery 7 - 10 business days

€32.95
(Includes tracking information)

Product Details

Publication date : Jun 29, 2018
Length: 306 pages
Edition : 1st
Language : English
ISBN-13 : 9781788838535
Category :
Languages :
Tools :

Packt Subscriptions

See our plans and pricing
Modal Close icon
€18.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
€189.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts
€264.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just €5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total 98.97
Hands-On Natural Language Processing with Python
€32.99
Natural Language Processing with TensorFlow
€32.99
Natural Language Processing and Computational Linguistics
€32.99
Total 98.97 Stars icon
Banner background image

Table of Contents

16 Chapters
What is Text Analysis? Chevron down icon Chevron up icon
Python Tips for Text Analysis Chevron down icon Chevron up icon
spaCy's Language Models Chevron down icon Chevron up icon
Gensim – Vectorizing Text and Transformations and n-grams Chevron down icon Chevron up icon
POS-Tagging and Its Applications Chevron down icon Chevron up icon
NER-Tagging and Its Applications Chevron down icon Chevron up icon
Dependency Parsing Chevron down icon Chevron up icon
Topic Models Chevron down icon Chevron up icon
Advanced Topic Modeling Chevron down icon Chevron up icon
Clustering and Classifying Text Chevron down icon Chevron up icon
Similarity Queries and Summarization Chevron down icon Chevron up icon
Word2Vec, Doc2Vec, and Gensim Chevron down icon Chevron up icon
Deep Learning for Text Chevron down icon Chevron up icon
Keras and spaCy for Deep Learning Chevron down icon Chevron up icon
Sentiment Analysis and ChatBots Chevron down icon Chevron up icon
Other Books You May Enjoy Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.6
(7 Ratings)
5 star 42.9%
4 star 14.3%
3 star 14.3%
2 star 14.3%
1 star 14.3%
Filter icon Filter
Top Reviews

Filter reviews by




David Baron Aug 28, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
15 chapters of fast-paced learning - the author gives you the bootstrapped resources for success with NLP. Every concept is explained clearly and with powerful examples and analogies. Increasingly complex concepts are stacked upon each other to produce a comprehensive overview of everything in NLP, from linguistics-based methods to bag-of-words approaches, from k-means clustering and LDA models to the inevitable methods of deep learning. Highly recommended.
Amazon Verified review Amazon
TM Raghavan Aug 17, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
What I liked about this book is its style of delivery, an easy read one. As it states in the beginning, though fluency in Python or NLP concepts is better to make full use of the ideas, it still contains lots of code examples and analysis of their results, like any good book on this genre. The breadth and and the depth from the coverage of the book's focus are sufficient in my opinion.What I find missing in the book is a section on Glossary, encompassing statistics, vector /matrix algebra, NLP/IR, ANN (CNN/RNN), that are relevant to the readers' requirements. Perhaps this can be given in the beginning itself (like a Forward).Overall I find the book very useful for the students of final year Under-Graduate and Graduate programs besides researchers whose foci are outside of basic NLP but would want to analyse large texts for their research.
Amazon Verified review Amazon
Juan Oct 08, 2018
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Great book, up to date, engaging and covers a lot of topics clearly.
Amazon Verified review Amazon
Victor Aug 27, 2018
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
Although the introduction states that is necessary be fluent in Python, this book is feasible for anyone who has just a basic understanding of Python or any other programming language. The book describes profoundly all the steps of a NLP analysis, setting comprehensible examples that allows the reader to understand every aspect of NLP. I would consider this book as an advanced and complete introduction, recommended to anyone who wants to begin and follow further on the topic but getting a solid base from this book. I would recommend it.
Amazon Verified review Amazon
Itai Aug 14, 2018
Full star icon Full star icon Full star icon Empty star icon Empty star icon 3
Writing is a profession. Even excellent engineers may be bad writers, this is exactly what happens here.Super long sentences on trivial topics, and then just a taste from the really important stuff.For example: having several full pages on 1st grade stuff as "lower(), upper(), isNumeric()", having several pages on word2Vec King-man+woman = queen, and then have only a few lines about Doc2Vec which is the most important thing.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela