About the Book
If Natural Language Processing (NLP) isn't really your forte, Natural Language Processing Fundamentals will make sure you get off to a steady start in the realm of NLP. This comprehensive guide will show you how to effectively use Python libraries and NLP concepts to solve various problems.
You'll be introduced to NLP and its applications through examples and exercises. This will be followed by an introduction to the initial stages of solving a problem, which include problem definition, getting text data, and preparing text data for modeling. With exposure to concepts such as advanced NLP algorithms and visualization techniques, you'll learn how to create applications that can extract information from unstructured data and present it as impactful visuals. Although you will continue to learn NLP-based techniques, the focus will gradually shift to developing useful applications. In those sections, you'll learn how to apply NLP techniques to question answering, as used in chatbots.
By the end of this book, you'll be able to handle a wide range of assignments, from identifying the most suitable type of NLP task for solving a problem to using a tool such as spaCy or Gensim to perform sentiment analysis. The book will equip you with the knowledge you need to build applications that interpret human language.
About the Authors
Sohom Ghosh is a passionate data detective with expertise in Natural Language Processing. He has publications in several international conferences and journals.
Dwight Gunning is a data scientist at FINRA, a financial services regulator in the US. He has extensive experience in Python-based machine learning and hands-on experience with the most popular NLP tools, such as NLTK, Gensim, and spaCy.
Learning Objectives
By the end of this book, you will be able to:
- Obtain, verify, and clean data before transforming it into a correct format for use
- Perform data analysis and machine learning tasks using Python
- Gain an understanding of the basics of computational linguistics
- Build models for general NLP tasks
- Evaluate the performance of a model with the right metrics
- Visualize, quantify, and perform exploratory analysis from any text data
Audience
Natural Language Processing Fundamentals is designed for novice and mid-level data scientists and machine learning developers who want to gather and analyze text data to build an NLP-powered product. Prior experience with coding in Python, including using data types, writing functions, and importing libraries, will help. Some experience with linguistics and probability is useful but not necessary.
Approach
This book starts with the very basics of reading text into Python code and progresses through the required pipeline of cleaning, stemming, and tokenizing text into a form suitable for NLP. The book then proceeds to the fundamentals of NLP statistical methods, vector representation, and building models, using the most common NLP libraries. Finally, the book gives students hands-on practice in using NLP models and code in applications.
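The cleaning, tokenizing, and stemming pipeline mentioned above can be sketched in plain Python. This is only a toy illustration, not the book's code: the function names and the three-suffix rule list are assumptions made for this example, and real stemmers such as those shipped with NLTK use far richer rules.

```python
import string

# Toy suffix list for illustration only (an assumption of this sketch);
# real stemmers apply many more rules.
SUFFIXES = ("ing", "es", "s")


def clean(text):
    """Lowercase the text and remove ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


def tokenize(text):
    """Split cleaned text into tokens on whitespace."""
    return text.split()


def toy_stem(token, min_stem_length=3):
    """Strip the first matching suffix, keeping at least min_stem_length characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem_length:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the whole pipeline: clean, tokenize, then stem each token."""
    return [toy_stem(token) for token in tokenize(clean(text))]


print(preprocess("The foxes were battling."))
# prints ['the', 'fox', 'were', 'battl']
```

Note that the stem "battl" is not a real word, which is the usual trade-off of rule-based stemming.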
Hardware Requirements
For the optimal student experience, we recommend the following hardware configuration:
- Any entry-level PC/Mac with Windows, Linux, or macOS is sufficient
- Processor: Dual core or equivalent
- Memory: 4 GB RAM
- Storage: 10 GB available space
Software Requirements
You'll also need the following software installed in advance:
- Operating system: Windows 7 SP1 32/64-bit, Windows 8.1 32/64-bit or Windows 10 32/64-bit, Ubuntu 14.04 or later, or macOS Sierra or later
- Browser: Google Chrome or Mozilla Firefox
- Anaconda
- Jupyter Notebook
- Python 3.x
Conventions
Code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Find out the index value of the word fox using the following code."
A block of code is set as follows:
words = sentence.split()
first_word = words[0]
last_word = words[len(words)-1]
concat_word = first_word + last_word
print(concat_word)
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Stemming leads to inappropriate results such as "battling" getting transformed into battl, which has no meaning."
Installation and Setup
Before you start this book, install Python 3.6, pip, scikit-learn, and the other libraries used in this book. The installation steps are as follows:
Installing Python
Install Python 3.6 by following the instructions at this link: https://realpython.com/installing-python/.
Installing pip
- To install pip, go to this link and download the get-pip.py file: https://pip.pypa.io/en/stable/installing/.
- Then, use the following command to install it:
python get-pip.py
You might need to use the python3 get-pip.py command instead, because a previous version of Python on your computer may already use the python command.
Installing libraries
Using the pip command, install the following libraries:
python -m pip install --user numpy scipy matplotlib pandas scikit-learn nltk
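Once the installation finishes, you may want to confirm that each library can be found by Python. The following is a minimal sketch using only the standard library; the PACKAGES mapping and the find_missing helper are names introduced for this example. Note that the scikit-learn package is imported under the name sklearn.

```python
import importlib.util

# pip package name -> import name (illustrative mapping for this sketch).
# The two differ for scikit-learn, which is imported as sklearn.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "nltk": "nltk",
}


def find_missing(packages):
    """Return the pip names whose import name cannot be located."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = find_missing(PACKAGES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, rerunning the pip command above for those packages is a reasonable first step.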
Working with the Jupyter Notebook
You'll be working on different exercises and activities in a Jupyter notebook. These exercises and activities can be downloaded from the associated GitHub repository:
- Download the repository from here: https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals.
You can either download it using GitHub or as a zipped folder by clicking on the green Clone or download button on the upper-right side.
- To open the Jupyter notebooks, you have to navigate to the lesson directory in your terminal. To do that, type:
cd Natural-Language-Processing-Fundamentals/<your current lesson>
For example:
cd Natural-Language-Processing-Fundamentals/Lesson_01/
- To reach each activity and exercise, you have to use cd once more to go into each folder, like so:
cd Activity01
- Once you are in the folder of your choice, simply call jupyter notebook.
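If you are unsure which notebooks a lesson contains, a short script can list them before you navigate with cd. This is a sketch under assumptions: the list_notebooks helper is hypothetical, and the path below assumes the repository was cloned or unzipped into the current working directory.

```python
from pathlib import Path


def list_notebooks(lesson_dir):
    """Return all .ipynb files below lesson_dir, sorted by path.

    Jupyter's auto-saved copies in .ipynb_checkpoints folders are skipped.
    """
    root = Path(lesson_dir)
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in path.parts
    )


# Assumed location: adjust to wherever you placed the repository.
lesson = Path("Natural-Language-Processing-Fundamentals") / "Lesson_01"
for notebook in list_notebooks(lesson):
    print(notebook)
```

The script prints nothing if the lesson folder is not found at the assumed location.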
Importing Python Libraries
Every exercise and activity in this book will make use of various libraries. Importing libraries into Python is very simple and here's how we do it:
- To import libraries such as NumPy and pandas, we have to run the following code. This will import the whole numpy library into our current file:
import numpy  # import numpy
- In the first cells of the exercises and activities in this book, you will see the following code. We can use np instead of numpy in our code to call functions from numpy:
import numpy as np  # import numpy and assign alias np
- In later chapters, partial imports will be present, as shown in the following code. This only loads the mean function from the library:
from numpy import mean  # only import the mean function of numpy
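The three import styles above all give access to the same underlying function. The sketch below shows this with the standard-library statistics module, used here only as a stand-in for numpy so that the snippet runs without any extra installation; the same pattern applies to numpy.

```python
# The three import styles, shown with the standard-library statistics module
# as a stand-in for numpy (a substitution made for this sketch).
import statistics            # whole module, accessed by its full name
import statistics as st      # whole module, accessed through an alias
from statistics import mean  # a single name, accessed directly

values = [1, 2, 3, 4]
results = [statistics.mean(values), st.mean(values), mean(values)]
print(results)  # prints [2.5, 2.5, 2.5]
```

The alias form is common because it keeps the origin of each function visible while shortening the name you have to type.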
Installing the Code Bundle
Copy the code bundle for the class to the C:/Code folder.
Additional Resources
The code bundle for this book is also hosted on GitHub at https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals.
The high-quality color images used in this book can be found at: https://github.com/TrainingByPackt/Natural-Language-Processing-Fundamentals/tree/master/Graphics.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!