You're reading from Mastering Data Mining with Python - Find patterns hidden in your data

Product type Paperback

Published in Aug 2016

Publisher

ISBN-13 9781785889950

Length 268 pages

Edition 1st Edition

Languages

Python

Tools

NLTK

Concepts

Data Mining

Author (1):

Megan Squire


Table of Contents (11) Chapters

Preface

1. Expanding Your Data Mining Toolbox

2. Association Rule Mining

3. Entity Matching

4. Network Analysis

5. Sentiment Analysis in Text

6. Named Entity Recognition in Text

7. Automatic Text Summarization

8. Topic Modeling in Text

9. Mining for Data Anomalies

Index

Summary

In this chapter, we learned what it would take to expand our data mining toolbox to the master level. First, we took a long view of the field as a whole, starting with the history of data mining as one piece of the knowledge discovery in databases (KDD) process. We also compared data mining to similar fields such as data science, machine learning, and big data.

Next, we outlined the common tools and techniques that most experts consider most important to the KDD process, paying special attention to the techniques used most frequently in the mining and analysis steps. To really master data mining, it is important that we work on problems that are different from simple textbook examples. For this reason, we will be working on more exotic data mining techniques, such as generating summaries and finding outliers, and focusing on more unusual data types, such as text and networks.

Finally, in this chapter we put together a robust data mining system for ourselves. Our workspace centers on the powerful, general-purpose programming language Python and its many useful data mining packages, such as NLTK, Gensim, NumPy, NetworkX, and scikit-learn, complemented by an easy-to-use and free MySQL database.

Now, all this discussion of software packages has got me thinking: Have you ever wondered which packages are used together most frequently? Is the combination of NLTK and NetworkX a common thing to see, or is this a rather unusual pairing of libraries? In the next chapter, we will work on solving exactly that type of problem. In Chapter 2, Association Rule Mining, we will learn how to generate a list of frequently found pairs, triples, quadruples, and more, and then we will attempt to make predictions based on the patterns we found.
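To make the idea of frequently found pairs concrete, here is a minimal sketch that counts how often packages appear together. It is a brute-force count using only the standard library, not necessarily the approach the next chapter takes. The `projects` data, the `frequent_itemsets` helper, and the `min_support` threshold are all hypothetical names and values invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each set lists the packages used by one project.
projects = [
    {"nltk", "networkx", "numpy"},
    {"nltk", "gensim", "numpy"},
    {"numpy", "scikit-learn"},
    {"nltk", "networkx"},
]


def frequent_itemsets(transactions, size, min_support):
    """Count every itemset of the given size and keep those that
    appear in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each itemset a single canonical ordering,
        # so ("networkx", "nltk") is never also counted as ("nltk", "networkx").
        for combo in combinations(sorted(items), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Pairs that occur together in at least two of the four projects.
pairs = frequent_itemsets(projects, size=2, min_support=2)
print(pairs)
```

With this toy data, NLTK and NetworkX do turn up together in two projects, so the pair passes the threshold. Calling the same helper with `size=3` would look for triples instead, although enumerating every combination this way becomes expensive as the number of packages grows.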
