You're reading from Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications

Product type Paperback

Published in Jul 2018

Publisher Packt

ISBN-13 9781789139495

Length 312 pages

Edition 1st Edition

Authors (5):

Rajalingappaa Shanmugamani

Chaitanya Joshi

Auguste Byiringiro

Rajesh Arumugam

Karthik Muthuswamy

Table of Contents (15) Chapters

Preface

1. Getting Started

2. Text Classification and POS Tagging Using NLTK

3. Deep Learning and TensorFlow

4. Semantic Embedding Using Shallow Models

5. Text Classification Using LSTM

6. Searching and DeDuplicating Using CNNs

7. Named Entity Recognition Using Character LSTM

8. Text Generation and Summarization Using GRUs

9. Question-Answering and Chatbots Using Memory Networks

10. Machine Translation Using the Attention-Based Model

11. Speech Recognition Using DeepSpeech

12. Text-to-Speech Using Tacotron

13. Deploying Trained Models

14. Other Books You May Enjoy

Building an RNN model for speech recognition

We will use the Free Spoken Digit Dataset, available at https://github.com/Jakobovski/free-spoken-digit-dataset/tree/master/recordings, for our basic model. Download the data to any directory on your system. In the example code, replace the path that refers to the .wav files with the path of the directory where you saved the data.
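As a rough illustration of reading one of the downloaded recordings, the following sketch loads a .wav file using Python's standard wave module and NumPy. The helper name read_wav is hypothetical, and the sketch assumes 16-bit mono PCM files; it is not the book's own loading code, which is not shown in this excerpt.

```python
import wave

import numpy as np


def read_wav(path):
    """Read a 16-bit mono PCM .wav file (an assumption about the data).

    Returns (sample_rate, samples), with samples scaled to [-1.0, 1.0).
    """
    with wave.open(str(path), "rb") as wav_file:
        if wav_file.getsampwidth() != 2 or wav_file.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav_file.getframerate()
        raw = wav_file.readframes(wav_file.getnframes())
    # Interpret the bytes as little-endian signed 16-bit integers.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return sample_rate, samples
```

Calling read_wav with the path of one recording in your download directory returns the sample rate and a one-dimensional array of samples.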

Note that we have split the data into a training set of 1,470 files and a test set of 30 files.
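One way to produce a split of this size is sketched below. The excerpt does not say how the 30 test files were chosen, so the random hold-out here is an assumption, as are the helper names digit_label and split_recordings. The sketch also assumes that file names follow the digit_speaker_index.wav pattern, so the label can be read from the file name.

```python
import random
from pathlib import Path


def digit_label(path):
    """Return the digit at the start of a name such as 7_jackson_32.wav."""
    return int(Path(path).name.split("_")[0])


def split_recordings(paths, n_test=30, seed=0):
    """Shuffle the paths reproducibly and hold out n_test of them."""
    paths = sorted(str(p) for p in paths)
    rng = random.Random(seed)
    rng.shuffle(paths)
    return paths[n_test:], paths[:n_test]


# Stand-in file names for illustration; with real data, list the .wav
# files in your download directory instead.
example_paths = [
    f"recordings/{digit}_speaker{speaker}_{index}.wav"
    for digit in range(10)
    for speaker in range(3)
    for index in range(50)
]
train_paths, test_paths = split_recordings(example_paths)
```

With 1,500 stand-in names, this yields 1,470 training paths and 30 test paths.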

Before we get into the details of the model itself, we will look at how to prepare the data for training. The most common preprocessing step used in practice is to transform the raw audio data into its frequency spectrum. The frequency spectrum, or power spectrum, is like a fingerprint for the data: the raw audio is broken down into its constituent frequencies. This representation helps in identifying which frequencies...
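To make the idea of a power spectrum concrete, here is a minimal NumPy sketch that transforms a signal into its frequency components. It is an illustration under stated assumptions rather than the chapter's preprocessing code: the function name power_spectrum, the Hann window, and the synthetic 440 Hz test tone sampled at 8,000 Hz are all choices made for this example.

```python
import numpy as np


def power_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, power at each frequency) for a 1-D signal."""
    signal = np.asarray(signal, dtype=np.float64)
    # A Hann window reduces leakage from the edges of the signal.
    windowed = signal * np.hanning(len(signal))
    # rfft keeps only the non-negative frequencies of a real-valued signal.
    spectrum = np.fft.rfft(windowed)
    power = (np.abs(spectrum) ** 2) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, power


# Synthetic one-second tone used in place of a real recording.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, power = power_spectrum(tone, sample_rate)
peak_hz = freqs[np.argmax(power)]  # frequency carrying the most power
```

For the synthetic tone, the strongest component sits at the tone's own frequency, which is the sense in which the spectrum acts as a fingerprint of the audio.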

Authors (5)

Shanmugamani

Rajalingappaa Shanmugamani is currently working as an Engineering Manager for a deep learning team at Kairos. Previously, he worked as a Senior Machine Learning Developer at SAP, Singapore, and at various startups developing machine learning products. He has a master's degree from the Indian Institute of Technology Madras. He has published articles in peer-reviewed journals and conferences and has submitted applications for several patents in the area of machine learning. In his spare time, he teaches programming and machine learning to school students and engineers.

Arumugam

Rajesh Arumugam is an ML developer at SAP, Singapore. Previously, while working with the Centre for Social Innovation at Hitachi Asia, Singapore, he developed ML solutions for smart city development in areas such as passenger flow analysis in public transit systems and optimization of energy consumption in buildings. He has published papers in conferences and has pending patents in storage and ML. He holds a PhD in computer engineering from Nanyang Technological University, Singapore.

Joshi

Vijay Joshi is a full stack web developer with more than a decade of experience working with PHP and JavaScript.