The Natural Language Processing Workshop

You're reading from The Natural Language Processing Workshop: Confidently design and build your own NLP projects with this easy-to-understand practical guide

Product type: Paperback

Published in: Aug 2020

Publisher: Packt

ISBN-13: 9781800208421

Length: 452 pages

Edition: 1st Edition

Authors (6):

Sohom Ghosh

Nipun Sadvilkar

Rohan Chopra

Muzaffar Bashir Shah

Dwight Gunning

Aniruddha M. Godbole

Table of Contents (10) Chapters

Preface

1. Introduction to Natural Language Processing

2. Feature Extraction Methods

3. Developing a Text Classifier

4. Collecting Text Data with Web Scraping and APIs

5. Topic Modeling

6. Vector Representation

7. Text Generation and Summarization

8. Sentiment Analysis

Appendix

5. Topic Modeling

Activity 5.01: Topic-Modeling Jeopardy Questions

Solution

Let's perform topic modeling on the dataset of Jeopardy questions:

Open a Jupyter Notebook.

Insert a new cell and add the following code to import pandas and other libraries:

```
import numpy as np
import spacy
nlp = spacy.load('en_core_web_sm')
import pandas as pd
pd.set_option('display.max_colwidth', 800)
```

After downloading the data, extract it and place it at the location shown below. Then load the Jeopardy CSV file into a pandas DataFrame and strip any surrounding whitespace from the column names. Insert a new cell and add the following code:
```
JEOPARDY_CSV = '../data/jeopardy/Jeopardy.csv'
questions = pd.read_csv(JEOPARDY_CSV)
questions.columns = [x.strip() for x in questions.columns]
```
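The column-stripping line is easiest to see on a small example. The following is only a sketch on a made-up, in-memory CSV (the sample rows and the `Show Number` and `Category` headers are assumptions for illustration, not taken from the real file); it shows how a header written with a space after each comma produces padded column names, and how the list comprehension cleans them:

```python
import io

import pandas as pd

# Hypothetical sample data; the header has a space after each comma.
sample_csv = io.StringIO(
    "Show Number, Category, Question\n"
    "1,HISTORY,Who was the first president?\n"
    "2,SCIENCE,What is H2O?\n"
)

sample = pd.read_csv(sample_csv)
raw_columns = list(sample.columns)  # names exactly as read, padding included

# Same cleanup as in the step above: strip whitespace from every column name.
sample.columns = [x.strip() for x in sample.columns]
clean_columns = list(sample.columns)

print(raw_columns)
print(clean_columns)
```

Without the cleanup, a lookup such as `sample['Question']` would raise a `KeyError` in this sketch, because the column would actually be named `' Question'`.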
The data in the DataFrame is not clean. To clean it, remove the records that have missing values in the Question column. Add the following code to do this:
```
questions = questions.dropna(subset=['Question'])
```
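To illustrate what `dropna(subset=['Question'])` keeps and discards, here is a minimal sketch on a toy DataFrame (the rows and the `Category` column are invented for this example). Only rows whose Question value is missing are dropped; a missing value in another column does not remove the row:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one missing Question, one missing Category.
toy = pd.DataFrame({
    "Category": ["HISTORY", "SCIENCE", None],
    "Question": ["Who was the first president?", np.nan, "What is H2O?"],
})

rows_before = len(toy)
toy_clean = toy.dropna(subset=["Question"])  # checks only the Question column
rows_after = len(toy_clean)

# Count the missing values that remain in each column after cleaning.
missing_questions = int(toy_clean["Question"].isna().sum())
missing_categories = int(toy_clean["Category"].isna().sum())

print(rows_before, rows_after, missing_questions, missing_categories)
```

Comparing the row count before and after the call is a quick way to confirm how many records were removed.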
Find...

Authors (6)

Rohan Chopra

Rohan Chopra graduated from Vellore Institute of Technology with a bachelor's degree in computer science. He has more than 2 years of experience in designing, implementing, and optimizing end-to-end deep neural network systems. His research centers on the use of deep learning to solve computer vision problems, and he has hands-on experience working on self-driving cars. He is a data scientist at Absolutdata.

Aniruddha M. Godbole

Aniruddha M. Godbole is a data science consultant with interdisciplinary expertise in computer science, applied statistics, and finance. He has a master's degree in data science from Indiana University, USA, and an MBA in finance from the National Institute of Bank Management, India. He has authored papers in computer science and finance and has been an occasional opinion-page contributor to Mint, a leading business newspaper in India. He has fifteen years of experience.

Nipun Sadvilkar

Nipun Sadvilkar is a senior data scientist at a US healthcare company, where he leads a team of data scientists and subject matter experts to design and build a clinical NLP engine that revamps medical coding workflows, enhances coder efficiency, and accelerates the revenue cycle. He has more than 3 years of experience in building NLP solutions and web-based data science platforms in the areas of healthcare, finance, media, and psychology. His interests lie at the intersection of machine learning and software engineering, with a fair understanding of the business domain. He is a member of the regional and national Python community. He is the author of pySBD, an open-source Python NLP library for sentence segmentation that is recognized by the ExplosionAI (spaCy) and AllenAI (scispaCy) organizations.

Muzaffar Bashir Shah

Muzaffar Bashir Shah is a software developer with vast experience in machine learning, natural language processing (NLP), text analytics, and data science. He holds a master's degree in computer science from the University of Kashmir and currently works at Datoin, a Bangalore-based startup.

Sohom Ghosh

Sohom Ghosh is a passionate data detective with expertise in natural language processing. He has worked extensively in the data science arena with a specialization in deep learning-based text analytics, NLP, and recommendation systems. He has publications in several international conferences and journals.

Dwight Gunning

Dwight Gunning is a data scientist at FINRA, a financial services regulator in the US. He has extensive experience in Python-based machine learning and hands-on experience with the most popular NLP tools, such as NLTK, gensim, and spaCy.
