Natural Language Processing with Python Quick Start Guide

Natural Language Processing with Python Quick Start Guide: Going from a Python developer to an effective Natural Language Processing Engineer

Profile Icon Kasliwal

Getting Started with Text Classification

There are several ways to learn new ideas and new skills. Imagine an art class where students study colors but aren't allowed to actually paint until college. Sound absurd?

Unfortunately, this is how most modern machine learning is taught. The experts are doing something similar. They tell you that you need to know linear algebra, calculus, and deep learning before they'll teach you how to use natural language processing (NLP).

In this book, I want us to learn by teaching the whole game. In every section, we see how to solve real-world problems and learn the tools along the way. Then, we will dig deeper and deeper into understanding how to make these tools. This learning and teaching style is very much inspired by Jeremy Howard of fast.ai fame.

The next focus is to have code examples wherever possible. This is to ensure that there is a clear and motivating purpose behind learning a topic. This helps us understand with intuition, beyond math formulae with algebraic notation.

In this opening chapter, we will start with an introduction to NLP and then jump into a text classification example with code.

This is what our journey will briefly look like:

  • What is NLP?
  • What does a good NLP workflow look like? This is to improve your success rate when working on any NLP project.
  • Text classification as a motivating example for a good NLP pipeline/workflow.

What is NLP?

Natural language processing is the use of machines to manipulate natural language. In this book, we will focus on written language, or in simpler words: text.

In effect, this is a practitioner's guide to text processing in English.

Humans are the only known species to have developed written languages. Yet, children don't learn to read and write on their own. This is to highlight the complexity of text processing and NLP.

The study of natural language processing has been around for more than 50 years. The famous Turing test for general artificial intelligence is built around a natural language conversation. This field has grown both in regard to linguistics and its computational techniques.

In the spirit of being able to build things first, we will learn how to build a simple text classification system using Python's scikit-learn and no other dependencies.

We will also address if this book is a good pick for you.

Let's get going!

Why learn about NLP?

The best way to get the most out of this book is to know what you want NLP to do for you.

A variety of reasons might draw you to NLP. It might be the higher earning potential. Maybe you've noticed and are excited by the potential of NLP, for example, in Uber's customer service bots. Yes, they mostly use bots to answer your complaints instead of humans.

It is useful to know your motivation and write it down. This will help you select problems and projects that excite you. It will also help you be selective when reading this book. This is not an NLP Made Easy or similar book. Let's be honest: this is a challenging topic. Writing down your motivations is a helpful reminder.

As a legal note, the accompanying code has a permissive MIT License. You can use it at your work without legal hassle. That being said, each dependent library is from a third party, and you should definitely check if they allow commercial use or not.

I don't expect you to be able to use all of the tools and techniques mentioned here. Cherry-pick things that make sense.

You have a problem in mind

You already have a problem in mind, such as an academic project or a problem at your work.

Are you looking for the best tools and techniques that you could use to get off the ground?

First, flip through to the book's index to check if I have covered your problem here. I have shared end-to-end solutions for some of the most common use cases here. If it is not shared, fret not—you are still covered. The underlying techniques for a lot of tasks are common. I have been careful to select methods that are useful to a wider audience.

Technical achievement

Is learning a mark of achievement for you?

NLP and, more generally, data science, are popular terms. You are someone who wants to keep up. You are someone who takes joy from learning new tools, techniques, and technologies. This is your next big challenge. This is your chance to prove your ability to self-teach and meet mastery.

If this sounds like you, you may be interested in using this as a reference book. I have dedicated sections where we give you enough understanding of a method. I show you how to use it without having to dive down into the latest papers. This is an invitation to learning more, and you are not encouraged to stop here. Try these code samples out for yourself!

Do something new

You have some domain expertise. Now, you want to do things in your domain that are not possible without these skills. One way to figure out new possibilities is to combine your domain expertise with what you learn here. There are several very large opportunities that I saw as I wrote this book, including the following:

  • NLP for non-English languages such as Hindi, Tamil, or Telugu.
  • Specialized NLP for your domain, for example, finance and Bollywood have different languages in their own ways. Your models that have been trained on Bollywood news are not expected to work for finance.

If this sounds like you, you want to pay attention to the text pre-processing sections in this book. These sections will help you understand how we make text ready for machine consumption.

Is this book for you?

This book has been written so that it keeps the preceding use cases and mindsets in mind. The methods, technologies, tools, and techniques selected here are a fine balance of industry-grade stability and academia-grade results quality. There are several tools, such as parfit, and Flashtext, and ideas such as LIME, that have never been written about in the context of NLP.

Lastly, I understand the importance and excitement of deep learning methods and have a dedicated chapter on deep learning for NLP methods.

NLP workflow template

Some of us would love to work on Natural Language Processing for its sheer intellectual challenges across research and engineering. To measure our progress, having a workflow with rough time estimates is really valuable. In this short section, we will briefly outline what a usual NLP or even most applied machine learning processes look like.

Most people I've learned from like to use a (roughly) five-step process:

  • Understanding the problem
  • Understanding and preparing data
  • Quick wins: proof of concepts
  • Iterating and improving the results
  • Evaluation and deployment

This is just a process template. It has a lot of room for customization regarding the engineering culture in your company. Any of these steps can be broken down further. For instance, data preparation and understanding can be split further into analysis and cleaning. Similarly, the proof of concept step may involve multiple experiments, and a demo or a report submission of best results from those.

Although this appears to be a strictly linear process, it is not so. More often than not, you will want to revisit a previous step and change a parameter or a particular data transform to see the effect on later performance.

In order to do so, it is important to factor in the cyclic nature of this process in your code. Write code with well-designed abstractions with each component being independently reusable.

If you are interested in how to write better NLP code, especially for research or experimentation, consider looking up the slide deck titled Writing Code for NLP Research, by Joel Grus of AllenAI.

Let's expand a little bit into each of these sections.

Understanding the problem

We will begin by understanding the requirements and constraints from a practical business viewpoint. This tends to answer the following questions:

  • What is the main problem? We will try to understand formally and informally the assumptions and expectations from our project.
  • How will I solve this problem? List some ideas that you might have seen earlier or in this book. This is the list that you will use to plan your work ahead.

Understanding and preparing the data

Text and language are inherently unstructured. We might want to clean the text in certain ways, such as expanding abbreviations and acronyms, removing punctuation, and so on. We also want to select a few samples that are the best representatives of the data we might see in the wild.

The other common practice is to prepare a gold dataset. A gold dataset is the best available data under reasonable conditions. This is not the best available data under ideal conditions. Creating the gold dataset often involves manual tagging and cleaning processes.

The next few sections are dedicated to text cleaning and text representations at this stage of the NLP workflow.

Quick wins – proof of concept

We want to quickly spot the types of algorithms and dataset combinations that sort of work for us. We can then focus on them and study them in greater detail.

The results from here will help you estimate the amount of work ahead of you. For instance, if you are going to develop a search system for documents based exclusively on keywords, your main effort will probably be deploying an open source solution such as ElasticSearch.

Let's say that you now want to add a similar documents feature. Depending on the expected quality of results, you will want to look into techniques such as doc2vec and word2vec, or even some convolutional neural network solution using Keras/Tensorflow or PyTorch.

This step is essential to get greater buy-in from others around you, such as your boss, to invest more energy and resources into this. In an engineering role, this demo should highlight parts of your work that off-the-shelf systems usually can't do. These are your unique strengths. These are usually insights, customization, and control that other systems can't provide.

Iterating and improving

At this point, we have a selected list of algorithms, data, and methods that have encouraging results for us.

Algorithms

If your algorithms are machine learning or statistical in nature, you will quite often have a lot of juice left.

There are quite often parameters for which you simply pick a good enough default during the earlier stage. Here, you might want to double down and check for the best value of those parameters. This idea is sometimes referred to as parameter search, or hyperparameter tuning in machine learning parlance.

You might want to combine the results of one technique with another in particular ways. For instance, some statistical methods might be very good for finding noun phrases in your text and using them to classify it, while a deep learning method (let's call it DL-LSTM) might be the best suited for text classification of the entire document. In that case, you might want to pass the extra information from both your noun phrase extraction and DL-LSTM to another model. This will allow it to use the best of both worlds. This idea is sometimes referred to as stacking in machine learning parlance. This was quite successful on the machine learning contest platform Kaggle until very recently.

Pre-processing

Simple changes in data pre-processing or the data cleaning stage can quite often give you dramatically better results. For instance, making sure that your entire corpus is in lowercase can help you reduce the number of unique words (your vocabulary size) by a significant fraction.
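To make the lowercasing claim concrete, here is a minimal sketch on a made-up three-sentence corpus. The sentences and the vocabulary() helper are illustrative assumptions, not part of the book's code, and the tokenizer is a plain whitespace split:

```python
# Toy corpus (hypothetical sentences, not taken from the 20 newsgroups data).
corpus = [
    "The cat sat on the mat",
    "the Cat chased The dog",
    "A dog sat on THE mat",
]


def vocabulary(docs, lowercase=False):
    """Return the set of unique whitespace-separated tokens in docs."""
    vocab = set()
    for doc in docs:
        text = doc.lower() if lowercase else doc
        vocab.update(text.split())
    return vocab


raw_vocab = vocabulary(corpus)
lower_vocab = vocabulary(corpus, lowercase=True)

print(f"Vocabulary size as-is: {len(raw_vocab)}")
print(f"Vocabulary size after lowercasing: {len(lower_vocab)}")
```

On this toy corpus, "The", "the", and "THE" collapse into a single entry, so the vocabulary shrinks from 11 words to 8. How much you save on real data depends on the corpus.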

If your numeric representation of words is skewed by the word frequency, sometimes it helps to normalize and/or scale the same. The laziest hack is to simply divide by the frequency.

Evaluation and deployment

Evaluation and deployment are critical components in making your work widely available. The quality of your evaluation determines how much other people can trust your work. Deployment varies widely, but quite often is abstracted out in single function calls or REST API calls.

Evaluation

Let's say you have a model with 99% accuracy in classifying brain tumors. Can you trust this model? No.

If your model had said that no-one has a brain tumor, it would still have 99%+ accuracy. Why?

Because luckily 99% or more of the population does not have a brain tumor!

To use our models for practical use, we need to look beyond accuracy. We need to understand what the model gets right or wrong in order to improve it. A minute spent understanding the confusion matrix will stop us from going ahead with such dangerous models.
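The brain tumor argument can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (1,000 patients, 10 of them with a tumor) and a hypothetical "model" that always predicts "no tumor". The confusion_counts() helper is an illustrative name, not a library function:

```python
# Hypothetical screening data: 1,000 patients, 10 of whom have a tumor (label 1).
y_true = [1] * 10 + [0] * 990
# A "model" that always answers "no tumor" (label 0).
y_pred = [0] * 1000


def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
# Recall: the share of actual tumors that the model found.
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"Accuracy: {accuracy:.2%}")
print(f"Recall:   {recall:.2%}")
print(f"Missed tumors (false negatives): {fn}")
```

The accuracy comes out at 99% while the recall is 0%: every one of the 10 tumors is missed. The four counts are exactly the cells of a two-class confusion matrix.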

Additionally, we will want to develop an intuition of what the model is doing underneath the black box optimization algorithms. Data visualization techniques such as t-SNE can assist us with this.

For continuously running NLP applications such as email spam classifiers or chatbots, we would want the evaluation of the model quality to happen continuously as well. This will help us ensure that the model's performance does not degrade with time.

Deployment

This book is written with a programmer-first mindset. We will learn how to deploy any machine learning or NLP application as a REST API which can then be used for the web and mobile. This architecture is quite prevalent in the industry. For instance, we know that this is how data science teams such as those at Amazon and LinkedIn deploy their work to the web.
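As a rough idea of what "a model behind a REST API" means, here is a minimal sketch that uses only Python's standard library (wsgiref) so that it has no extra dependencies. This is a swapped-in stand-in for illustration: the /predict route, the JSON shape, and the rule-based predict() function are all assumptions, not the book's deployment code.

```python
import json
from wsgiref.simple_server import make_server


def predict(text):
    """Hypothetical stand-in for a trained model's prediction function."""
    label = "question" if text.strip().endswith("?") else "statement"
    return {"text": text, "label": label}


def json_response(start_response, status, payload):
    """Serialize payload as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def app(environ, start_response):
    """WSGI app: POST a JSON body such as {"text": "..."} to /predict."""
    if (environ.get("REQUEST_METHOD") != "POST"
            or environ.get("PATH_INFO") != "/predict"):
        return json_response(start_response, "404 Not Found",
                             {"error": "use POST /predict"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        text = str(payload["text"])
    except (ValueError, KeyError, TypeError):
        return json_response(start_response, "400 Bad Request",
                             {"error": "expected a JSON object with a 'text' key"})
    return json_response(start_response, "200 OK", predict(text))


def serve(host="127.0.0.1", port=8000):
    """Run a development server; call this yourself, it blocks until interrupted."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling serve() starts a local development server. A web or mobile client would then send a POST request to /predict and read the label from the JSON response. In a real deployment, predict() would wrap a trained model.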

Example – text classification workflow

The preceding process is fairly generic. What would it look like for one of the most common natural language applications, text classification?

The following flow diagram was built by Microsoft Azure, and is used here to explain how their own technology fits directly into our workflow template. There are several new words that they have introduced to feature engineering, such as unigrams, TF-IDF, TF, n-grams, and so on:

The main steps in their flow diagram are as follows:

  1. Data preparation
  2. Text pre-processing
  3. Feature engineering:
    • Unigrams TF-IDF extraction
    • N-grams TF extraction
  4. Train and evaluate models
  5. Deploy trained models as web services

This means that it's time to stop talking and start programming. Let's quickly set up the environment first and then we will work on building our first text classification system in 30 lines of code or less.

Launchpad – programming environment setup

We will use the fast.ai machine learning setup for this exercise. Their setup environment is great for personal experimentation and industry-grade proof-of-concept projects. I have used the fast.ai environment on both Linux and Windows. We will use Python 3.6 here, since our code uses f-strings and will not run on earlier Python versions.

A quick search on their forums will also take you to the latest instructions on how to set up the same on most cloud computing solutions including AWS, Google Cloud Platform, and Paperspace.

This environment covers the tools that we will use across most of the major tasks that we will perform: text processing (including cleaning), feature extraction, machine learning and deep learning models, model evaluation, and deployment.

It includes spaCy out of the box. spaCy is an open source, industry-grade NLP toolkit. If someone recommends that you use NLTK for a task, use spaCy instead. The demo ahead works out of the box in their environment.

There are a few more packages that we will need for later tasks. We will install and set them up as and when required. We don't want to bloat your installation with unnecessary packages that you might not even use.

Text classification in 30 lines of code

Let's divide the classification problem into the following steps:

  1. Getting the data
  2. Text to numbers
  3. Running ML algorithms with sklearn

Getting the data

The 20 newsgroups dataset is a fairly well-known dataset among the NLP community. It is near-ideal for demonstration purposes. This dataset has a near-uniform distribution across 20 classes. This uniform distribution makes iterating rapidly on classification and clustering techniques easy.

We will use the famous 20 newsgroups dataset for our demonstrations as well:

from sklearn.datasets import fetch_20newsgroups  # import packages which help us download dataset 
twenty_train = fetch_20newsgroups(subset='train', shuffle=True, download_if_missing=True)
twenty_test = fetch_20newsgroups(subset='test', shuffle=True, download_if_missing=True)

Most modern NLP methods rely heavily on machine learning methods. These methods need words that are written as strings of text to be converted into a numerical representation. This numerical representation can range from something as simple as a unique integer ID per word to a more comprehensive vector of float values. In the case of the latter, this is sometimes referred to as vectorization.

Text to numbers

We will be using a bag of words model for our example. We simply count the number of times every word occurs per document. Therefore, each document is a bag and we count the frequency of each word in that bag. This also means that we lose any ordering information that's present in the text. Next, we assign each unique word an integer ID. All of these unique words become our vocabulary. Each word in our vocabulary is treated as a machine learning feature. Let's make our vocabulary first.

Scikit-learn has a high-level component that will create feature vectors for us. This is called CountVectorizer. We recommend reading more about it from the scikit-learn docs:

# Extracting features from text files
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(twenty_train.data)

print(f'Shape of Term Frequency Matrix: {X_train_counts.shape}')

By calling count_vect.fit_transform(twenty_train.data), we learn the vocabulary dictionary and get back a document-term matrix of shape [n_samples, n_features]. This means that we have n_samples documents, or bags, with n_features unique words across them.
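To see what a document-term matrix holds, it can help to build a tiny one by hand. The sketch below is a simplified illustration on a two-document toy corpus with whitespace tokenization; build_vocabulary() and count_vector() are hypothetical helper names, and CountVectorizer's real tokenization rules differ in the details:

```python
from collections import Counter

# Toy corpus (hypothetical), tokenized by a plain whitespace split.
docs = ["the cat sat on the mat", "the dog sat"]


def build_vocabulary(docs):
    """Assign an integer ID to every unique token, in sorted order."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vector(doc, vocab):
    """Return one row of the document-term matrix: a count per vocabulary ID."""
    row = [0] * len(vocab)
    for tok, count in Counter(doc.split()).items():
        if tok in vocab:
            row[vocab[tok]] = count
    return row


vocab = build_vocabulary(docs)
matrix = [count_vector(doc, vocab) for doc in docs]

print(vocab)
for row in matrix:
    print(row)
print(f"Shape: ({len(matrix)}, {len(vocab)})")
```

Here the matrix has 2 rows (documents) and 6 columns (unique words). Word order inside each document is gone; only the counts remain. The real 20 newsgroups matrix is stored in a sparse format because almost all of its entries are zero.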

We will now be able to extract a meaningful relationship between these words and the tags or classes they belong to. One of the simplest ways to do this is to count the number of times a word occurs in each class.

We have a small issue with this: long documents tend to influence the result a lot more. We can normalize this effect by dividing the word frequency by the total number of words in that document. We call this Term Frequency, or simply TF.
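A short sketch of this normalization, using two made-up documents that say the same thing at very different lengths. The term_frequency() helper is an illustrative name, not a scikit-learn function:

```python
from collections import Counter


def term_frequency(doc):
    """Return {token: count / total tokens} for one whitespace-tokenized document."""
    tokens = doc.split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {tok: n / len(tokens) for tok, n in counts.items()}


short_doc = "modi spoke"
long_doc = "modi spoke " * 50  # same content repeated 50 times

raw_short = Counter(short_doc.split())["modi"]
raw_long = Counter(long_doc.split())["modi"]
tf_short = term_frequency(short_doc)["modi"]
tf_long = term_frequency(long_doc)["modi"]

print(f"Raw counts: {raw_short} vs {raw_long}")
print(f"TF:         {tf_short} vs {tf_long}")
```

The raw counts differ by a factor of 50, but the TF of "modi" is 0.5 in both documents. Note that scikit-learn's TfidfTransformer handles document length differently by default: it L2-normalizes each row instead of dividing by the token count.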

Words like the, a, and of are common across all documents and don't really help us distinguish between document classes or separate them. We want to emphasize rarer words, such as Manmohan and Modi, over common words. One way to do this is to use inverse document frequency, or IDF. Inverse document frequency is a measure of whether the term is common or rare in all documents.

We multiply TF with IDF to get our TF-IDF metric, which is never negative. TF-IDF is calculated for a triplet of term t, document d, and document collection (corpus) D.
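The multiplication can be written out by hand. The sketch below assumes the common textbook definition idf(t, D) = ln(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. scikit-learn uses a smoothed variant by default, so its numbers will differ. The toy corpus and function names are illustrative assumptions:

```python
import math

# Toy corpus (hypothetical), already tokenized by whitespace.
docs = [
    "the prime minister modi spoke",
    "the minister resigned",
    "the match was a draw",
]
corpus = [doc.split() for doc in docs]


def tf(term, tokens):
    """Term frequency: share of the document's tokens that are `term`."""
    return tokens.count(term) / len(tokens)


def idf(term, corpus):
    """Textbook inverse document frequency: ln(N / df); 0.0 if the term is unseen."""
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus)


for term in ["the", "minister", "modi"]:
    print(f"{term:>8}: {tf_idf(term, corpus[0], corpus):.4f}")
```

In the first document, "the" appears in every document and scores 0, "minister" appears in two of three documents and gets a small score, and "modi" appears in only one and gets the highest score of the three.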

We can directly calculate TF-IDF using the following lines of code:

from sklearn.feature_extraction.text import TfidfTransformer

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

print(f'Shape of TFIDF Matrix: {X_train_tfidf.shape}')

The last line will output the dimension of the Document-Term matrix, which is (11314, 130107).

Please note that in the preceding example we used each word as a feature, so the TF-IDF was calculated for each word. When we use a single word as a feature, we call it a unigram. If we were to use two consecutive words as a feature instead, we'd call it a bigram. In general, for n-words, we would call it an n-gram.
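A minimal sketch of n-gram extraction over whitespace tokens follows. The ngrams() helper is an illustrative name; in scikit-learn, the same idea is exposed through the ngram_range parameter of CountVectorizer.

```python
def ngrams(text, n):
    """Return all runs of n consecutive whitespace-separated tokens in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "natural language processing with python"

print(ngrams(sentence, 1))  # unigrams
print(ngrams(sentence, 2))  # bigrams
print(ngrams(sentence, 3))  # trigrams
```

A five-word sentence yields five unigrams, four bigrams, and three trigrams. Higher-order n-grams keep some of the word order that a plain bag of words throws away, at the cost of a much larger vocabulary.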

Machine learning

Various algorithms can be used for text classification. You can build a classifier in scikit-learn using the following code:

from sklearn.linear_model import LogisticRegression as LR
from sklearn.pipeline import Pipeline

Let's dissect the preceding code, line by line.

The initial two lines are simple imports. We import the fairly well-known Logistic Regression model and rename the import LR. The next is a pipeline import:

"Sequentially apply a list of transforms and a final estimator. Intermediate steps of the pipeline must be "transforms", that is, they must implement fit and transform methods. The final estimator only needs to implement fit."
- from sklearn docs

Scikit-learn pipelines are, logically, lists of operations that are applied one after another. First, we apply the two operations we have already seen: CountVectorizer() and TfidfTransformer(). This is followed by LR(). The pipeline is created with Pipeline(...), but isn't executed yet. It is only executed when we call the fit() function from the Pipeline object:

text_lr_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf',LR())])
text_lr_clf = text_lr_clf.fit(twenty_train.data, twenty_train.target)

When fit() is called, the pipeline fits every object except the last one and applies its transform function to the data, passing the result on to the next step. For the last object, our Logistic Regression classifier, only its fit() function is called. These transforms and classifiers are also referred to as estimators:

"All estimators in a pipeline, except the last one, must be transformers (that is, they must have a transform method). The last estimator may be any type (transformer, classifier, and so on)."

Let's calculate the accuracy of this model on the test data. For calculating the means on a large number of values, we will be using a scientific library called numpy:

import numpy as np
lr_predicted = text_lr_clf.predict(twenty_test.data)
lr_clf_accuracy = np.mean(lr_predicted == twenty_test.target) * 100.

print(f'Test Accuracy is {lr_clf_accuracy}')

This prints out the following output:

Test Accuracy is 82.79341476367499

We used the LR default parameters here. We can later optimize these using grid search or random search (GridSearchCV or RandomizedSearchCV in scikit-learn) to improve the accuracy even more.
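As a rough sketch of what such a parameter search could look like, the following runs scikit-learn's GridSearchCV over the same three-step pipeline. To keep it quick and free of downloads, it uses a tiny made-up corpus instead of 20 newsgroups; the texts, labels, and parameter grid are illustrative assumptions, not tuned recommendations:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny hypothetical corpus: label 0 = sports, label 1 = space.
texts = [
    "the team won the match", "a great goal in the final",
    "the coach praised the players", "fans cheered at the stadium",
    "the striker scored twice", "the league season starts soon",
    "the rocket reached orbit", "astronauts aboard the station",
    "a new telescope was launched", "the probe landed on mars",
    "the satellite orbits the moon", "nasa delayed the launch",
]
labels = [0] * 6 + [1] * 6

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Keys use the "<step name>__<parameter>" convention of scikit-learn pipelines.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(f"Best parameters: {search.best_params_}")
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

On the real dataset, you would pass twenty_train.data and twenty_train.target to search.fit() instead, and expect the search to take noticeably longer, since every parameter combination is trained once per cross-validation fold.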

If you're going to remember only one thing from this section, remember to try a linear model such as logistic regression. They are often quite good for sparse high-dimensional data such as text, bag-of-words, or TF-IDF.

In addition to accuracy, it is useful to understand which categories of text are being confused for which other categories. The table that captures this is called a confusion matrix.

The following code uses the same variables we used to calculate the test accuracy for finding out the confusion matrix:

from sklearn.metrics import confusion_matrix
cf = confusion_matrix(y_true=twenty_test.target, y_pred=lr_predicted)
print(cf)

This prints a giant list of numbers which is not very interpretable. Let's try pretty printing this by using the print-json hack:

import json
print(json.dumps(cf.tolist(), indent=2))

This prints the following output:

[
[
236,
2,
0,
0,
1,
1,
3,
0,
3,
3,
1,
1,
2,
9,
2,
35,
3,
4,
1,
12
],
...
[
38,
4,
0,
0,
0,
0,
4,
0,
0,
2,
2,
0,
0,
8,
3,
48,
17,
2,
9,
114
]
]

This is slightly better. We now understand that this is a 20 × 20 grid of numbers. However, interpreting these numbers is a tedious task unless we can bring some visualization into this game. Let's do that next:

# this line ensures that the plot is rendered inside the Jupyter we used for testing this code
%matplotlib inline

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))
ax = sns.heatmap(cf, annot=True, fmt="d",linewidths=.5, center = 90, vmax = 200)
# plt.show() # optional, un-comment if the plot does not show

This gives us the following amazing plot:

This plot highlights information of interest to us in different colors. For instance, the light diagonal from the upper-left corner to the lower-right corner shows everything we got right. The other grids are darker-colored if we confused those more. For instance, 97 samples of one class got wrongly tagged, which is quickly visible by the dark black color in row 18 and column 16.
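The same reading can also be done in code by listing the largest off-diagonal cells. The sketch below works on a small made-up 3 × 3 matrix given as nested lists; top_confusions() is a hypothetical helper, and it assumes the scikit-learn layout where rows are true classes and columns are predicted classes:

```python
def top_confusions(matrix, k=3):
    """Return the k largest off-diagonal cells as (true_class, predicted_class, count)."""
    cells = [
        (row_idx, col_idx, count)
        for row_idx, row in enumerate(matrix)
        for col_idx, count in enumerate(row)
        if row_idx != col_idx  # skip the diagonal, which holds correct predictions
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]


# Toy confusion matrix (hypothetical): rows = true class, columns = predicted class.
toy_cf = [
    [50, 2, 8],
    [3, 45, 1],
    [10, 0, 40],
]

for true_cls, pred_cls, count in top_confusions(toy_cf):
    print(f"true class {true_cls} predicted as class {pred_cls}: {count} times")
```

For the matrix computed earlier, you could call top_confusions(cf.tolist()) and look up the class names in twenty_train.target_names to see which newsgroups get mixed up most often.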

We will dive deeper into both parts of this section, model interpretation and data visualization, later in this book.

Summary

In this chapter, you got a feel for the broader things we need to make the project work. We saw the steps that are involved in this process by using a text classification example. We saw how to prepare text for machine learning with scikit-learn. We saw Logistic Regression for ML. We also saw a confusion matrix, which is a quick and powerful tool for making sense of results in all machine learning, beyond NLP.

We are just getting started. From here on out, we will dive deeper into each of these steps and see what other methods exist out there. In the next chapter, we will look at some common methods for text cleaning and extraction. Since this is what we will spend up to 80% of our total time on, it's worth the time and energy learning it.


Key benefits

  • A no-math, code-driven programmer’s guide to text processing and NLP
  • Get state of the art results with modern tooling across linguistics, text vectors and machine learning
  • Fundamentals of NLP methods from spaCy, gensim, scikit-learn and PyTorch

Description

NLP in Python is among the most sought-after skills among data scientists. With code and relevant case studies, this book will show how you can use industry-grade tools to implement NLP programs capable of learning from relevant data. We will explore many modern methods ranging from spaCy to word vectors that have reinvented NLP. The book takes you from the basics of NLP to building text processing applications. We start with an introduction to the basic vocabulary along with a workflow for building NLP applications. We use industry-grade NLP tools for cleaning and pre-processing text, automatic question and answer generation using linguistics, text embedding, text classification, and building a chatbot. With each project, you will learn a new concept of NLP. You will learn about entity recognition, part of speech tagging, and dependency parsing for Q and A. We use text embedding for both clustering documents and making chatbots, and then build classifiers using scikit-learn. We conclude by deploying these models as REST APIs with Flask. By the end, you will be confident building NLP applications, and know exactly what to look for when approaching new challenges.

Who is this book for?

Programmers who wish to build systems that can interpret language. Exposure to Python programming is required. Familiarity with NLP or machine learning vocabulary will be helpful, but not mandatory.

What you will learn

  • Understand classical linguistics in using English grammar for automatically generating questions and answers from a free text corpus
  • Work with text embedding models for dense number representations of words, subwords and characters in the English language for exploring document clustering
  • Deep Learning in NLP using PyTorch with a code-driven introduction to PyTorch
  • Using an NLP project management Framework for estimating timelines and organizing your project into stages
  • Hack and build a simple chatbot application in 30 minutes
  • Deploy an NLP or machine learning application using Flask as RESTFUL APIs

Product Details

Publication date : Nov 30, 2018
Length: 182 pages
Edition : 1st
Language : English
ISBN-13 : 9781788994101


Table of Contents

9 Chapters
Getting Started with Text Classification
Tidying your Text
Leveraging Linguistics
Text Representations - Words to Numbers
Modern Methods for Classification
Deep Learning for NLP
Building your Own Chatbot
Web Deployments
Other Books You May Enjoy

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe Reader installed, clicking on the link will download and open the PDF file directly. If you don't, save the PDF file on your machine and download Adobe Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing

When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the eBook to be usable for you, the reader, with our need to protect our rights as publishers and those of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook), please follow the steps below:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title.
  5. Proceed with the checkout process (payment can be made using Credit Card, Debit Card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, then contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats does Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are priced lower than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply log in to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.