Natural Language Processing and Computational Linguistics

You're reading from Natural Language Processing and Computational Linguistics: A practical guide to text analysis with Python, Gensim, spaCy, and Keras

Product type Paperback
Published in Jun 2018
Publisher Packt
ISBN-13 9781788838535
Length 306 pages
Edition 1st Edition
Author (1):
Bhargav Srinivasa-Desikan

Table of Contents (17 chapters)

Preface
1. What is Text Analysis?
2. Python Tips for Text Analysis
3. spaCy's Language Models
4. Gensim – Vectorizing Text and Transformations and n-grams
5. POS-Tagging and Its Applications
6. NER-Tagging and Its Applications
7. Dependency Parsing
8. Topic Models
9. Advanced Topic Modeling
10. Clustering and Classifying Text
11. Similarity Queries and Summarization
12. Word2Vec, Doc2Vec, and Gensim
13. Deep Learning for Text
14. Keras and spaCy for Deep Learning
15. Sentiment Analysis and ChatBots
16. Other Books You May Enjoy

Why should you do text analysis?

We've talked about what text analysis is, where we can find the data, and some of the things to keep in mind before diving into text analysis. But after all, what motivation do you, the reader, have to actually go about doing text analysis?

For starters, there is the sheer abundance of easily available data that we can use. In the big data age, there really is no excuse not to look at what all our data really means. In fact, apart from the massive datasets we can download off the internet, we also have access to small data: text messages, emails, and a collection of poems are such examples. You could even do a meta-analysis and run an analysis on this very book! Textual data is not only easy to get hold of but, far more importantly, the results of analyzing it are easy to interpret and understand. Numbers might not always make sense and are not always appealing to look at, but words are easier for us human beings to appreciate.

Text analysis also remains exciting because we can use data that directly involves the user: our own text conversations, our favorite childhood book, or tweets by our favorite celebrity. The personal nature of text data always adds an extra bit of motivation, and it also likely means we are aware of the nature of the data and what kind of results to expect.

NLP techniques can also help us construct tools that assist personal businesses or enterprises. Chatbots, for example, are becoming increasingly common on major websites, and with the right approach, it is possible to have a personal chatbot. This is largely due to a subfield of machine learning called deep learning, where we use algorithms and structures that are inspired by the structure of the human brain. These algorithms and structures are also referred to as neural networks. Advances in deep learning have introduced us to powerful neural networks such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). Now, even with minimal knowledge of the mathematical functioning of these algorithms, high-level APIs allow us to use these tools. Integrating them into our daily lives is no longer reserved for computer science researchers or full-time engineers; with the right collection of data and open source packages, this is well within our capabilities.

Open source packages have become the industry standard. Google has released and maintains TensorFlow [21]; packages such as scikit-learn [22] are used by Apple and Spotify; and spaCy [23], which we will discuss extensively throughout this book, is used by Quora, a popular question-and-answer website.

We are no longer limited by either the data or the tools, the only two things we need to do text analysis.

The programming language Python will be our friend throughout the book, and all the tools we use will be free, open source software. As we move towards open science, we also move towards open source code, and this will remain a key philosophy throughout the book. In the world of research, open source code means academic results are reproducible and available to all those interested. Python remains an easy-to-use and powerful language and serves as a great way to enter the world of natural language processing.

One could argue that the last thing needed is the knowledge of how to apply these tools and wrangle the data, but providing that knowledge is precisely the purpose of this book, which hopes to let the reader build their own natural language processing pipelines and models by the end of the journey.
