Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications

Product type Paperback

Published in Jul 2018

Publisher Packt

ISBN-13 9781789139495

Length 312 pages

Edition 1st Edition

Languages

Processing

Tools

NLTK

Concepts

Deep Learning

Authors (5):

Rajalingappaa Shanmugamani

Chaitanya Joshi

Auguste Byiringiro

Rajesh Arumugam

Karthik Muthuswamy

Table of Contents (15) Chapters

Preface

1. Getting Started

2. Text Classification and POS Tagging Using NLTK

3. Deep Learning and TensorFlow

4. Semantic Embedding Using Shallow Models

5. Text Classification Using LSTM

6. Searching and DeDuplicating Using CNNs

7. Named Entity Recognition Using Character LSTM

8. Text Generation and Summarization Using GRUs

9. Question-Answering and Chatbots Using Memory Networks

10. Machine Translation Using the Attention-Based Model

11. Speech Recognition Using DeepSpeech

12. Text-to-Speech Using Tacotron

13. Deploying Trained Models

14. Other Books You May Enjoy

Data for text classification

Before diving into the machine learning (ML) problems in text classification, we will look at some of the open datasets available on the internet. Many classification tasks require large amounts of labeled text. These datasets can be broadly grouped by label type: binary (two classes), multi-class (more than two mutually exclusive classes), and multi-label (each example may carry several labels at once). The following are some popular datasets used for benchmarking in research and in competitions such as those hosted on Kaggle:

...

	Dataset name	Class type	Source
1	`IMDb movie Dataset`	Binary classes	http://ai.stanford.edu/~amaas/data/sentiment/
2	`Twitter Sentiment Analysis Dataset`	Binary classes	http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
3	`YouTube Spam Collection Dataset`	Binary classes	https://archive.ics.uci.edu/ml/datasets/YouTube+Spam+Collection
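
To make the three class types concrete, here is a minimal sketch in plain Python. The toy labels and the helper names `label_scheme` and `to_multi_hot` are hypothetical and are not taken from the book or from any of the datasets listed; the sketch only illustrates how binary, multi-class, and multi-label targets differ in shape.

```python
from typing import List, Sequence


def label_scheme(labels: Sequence) -> str:
    """Guess the class type of a list of targets (illustrative helper).

    Assumption: a multi-label target is stored as a list, tuple, or set of
    labels per example, while binary and multi-class targets are scalars.
    """
    if any(isinstance(y, (list, tuple, set, frozenset)) for y in labels):
        return "multi-label"
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct classes")
    return "binary" if len(distinct) == 2 else "multi-class"


def to_multi_hot(label_sets: Sequence[Sequence[str]],
                 classes: Sequence[str]) -> List[List[int]]:
    """Encode multi-label targets as 0/1 vectors, one column per class."""
    index = {name: i for i, name in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows


# Toy targets (made up for illustration only).
sentiment = ["pos", "neg", "pos"]                     # binary
topics = ["sports", "politics", "tech", "sports"]     # multi-class
tags = [["python", "nlp"], ["nlp"], []]               # multi-label

print(label_scheme(sentiment))
print(label_scheme(topics))
print(label_scheme(tags))
print(to_multi_hot(tags, ["nlp", "python"]))
```

A sentiment dataset such as the IMDb reviews fits the first shape, with one of two labels per review. The multi-hot encoding is one common way to feed multi-label targets to a classifier, with one independent 0/1 output per class.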

Authors (5)

Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore, and at various startups developing machine learning products. He has a master's degree from the Indian Institute of Technology Madras. He has published articles in peer-reviewed journals and conferences and has submitted applications for several patents in the area of machine learning. In his spare time, he teaches programming and machine learning to school students and engineers.

Arumugam

Rajesh Arumugam is an ML developer at SAP, Singapore. Previously, while working at the Centre for Social Innovation at Hitachi Asia, Singapore, he developed ML solutions for smart city development in areas such as passenger flow analysis in public transit systems and optimization of energy consumption in buildings. He has published papers in conferences and has pending patents in storage and ML. He holds a PhD in computer engineering from Nanyang Technological University, Singapore.

Byiringiro