Python Data Mining Quick Start Guide

A beginner's guide to extracting valuable insights from your data

Product type Paperback
Published in Apr 2019
Publisher Packt
ISBN-13 9781789800265
Length 188 pages
Edition 1st Edition
Author: Nathan Greeneltch

Table of Contents

Preface
1. Data Mining and Getting Started with Python Tools
2. Basic Terminology and Our End-to-End Example
3. Collecting, Exploring, and Visualizing Data
4. Cleaning and Readying Data for Analysis
5. Grouping and Clustering Data
6. Prediction with Regression and Classification
7. Advanced Topics - Building a Data Processing Pipeline and Deploying It
8. Other Books You May Enjoy

What this book covers

The first three and a half chapters of the book are focused on the procedural nuts and bolts of a data mining project. This includes creating a data mining Python environment, loading data from a variety of sources, and munging the data for downstream analysis. The remaining content in the book is mostly conceptual, and delivered in a conversational style very close to how I would train a new hire at my company.

Chapter 1, Data Mining and Getting Started with Python Tools, covers setting up your software environment, including how to download and install high-speed Python and popular libraries such as pandas, scikit-learn, and seaborn. After reading this chapter and setting up your environment, you should be ready to follow along with the demonstrations throughout the rest of the book.

Chapter 2, Basic Terminology and Our End-to-End Example, covers the basic statistics and data terminology required for working in data mining. The final portion of the chapter is dedicated to a full working example, which combines the types of techniques that will be introduced later in this book. After reading it, you will have a better understanding of the thought processes behind analysis and the common steps taken to address a problem statement that you may encounter in the field.

Chapter 3, Collecting, Exploring, and Visualizing Data, covers the basics of loading data from databases, disks, and web sources. It also covers basic SQL queries and pandas' access and search functions. The last sections of the chapter introduce the common types of plots using seaborn.

Chapter 4, Cleaning and Readying Data for Analysis, covers the basics of data cleanup and dimensionality reduction. After reading it, you will understand how to work with missing values, rescale input data, and handle categorical variables. You will also understand the troubles of high-dimensional data and how to combat them with feature reduction techniques, including filter, wrapper, and transformation methods.

Chapter 5, Grouping and Clustering Data, introduces the background and thought processes that go into designing a clustering algorithm for data mining work. It then introduces common clustering methods in the field and compares them on toy datasets. After reading this chapter, you will know the difference between algorithms that cluster based on means separation, density, and connectivity. You will also be able to look at a plot of incoming data and have some intuition on whether clustering will fit your mining project.

Chapter 6, Prediction with Regression and Classification, covers the basics behind using a computer to learn prediction models by introducing the loss function and gradient descent. It then introduces the concepts of overfitting, underfitting, and the penalty approach to regularize your model during fits. It also covers common regression and classification techniques, and the regularized versions of each of these where appropriate. The chapter finishes with a discussion of best practices for model tuning, including cross-validation and grid search.
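As a taste of the loss function and gradient descent ideas mentioned for Chapter 6, here is a minimal sketch in plain Python. It is not taken from the book: the toy data, the learning rate, the step count, and the helper names (mse_loss, gradient_step, fit_line) are all assumptions chosen for illustration.

```python
# Minimal sketch: fit the line y = w*x + b by gradient descent on a
# mean squared error (MSE) loss. All names and values are illustrative.

def mse_loss(w, b, xs, ys):
    """Mean squared error of the line w*x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_step(w, b, xs, ys, lr):
    """Move w and b one step against the gradient of the MSE loss."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return w - lr * grad_w, b - lr * grad_b


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Start from w = b = 0 and repeat the gradient step."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        w, b = gradient_step(w, b, xs, ys, lr)
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"w={w:.4f}, b={b:.4f}, loss={mse_loss(w, b, xs, ys):.2e}")
```

On this noise-free data the fitted slope and intercept should approach 2 and 1. A learning rate that is too large would make the updates diverge, which is one reason tuning matters.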

Chapter 7, Advanced Topics - Building a Data Processing Pipeline and Deploying It, covers a strategy for pipelining and deploying using built-in scikit-learn methods. It also introduces the pickle module for model persistence and storage, and discusses Python-specific concerns at deployment time.
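To give a rough idea of what model persistence with the pickle module looks like, here is a small sketch using only the standard library. It is not the book's code: the "model" is a hypothetical stand-in (a plain dict of parameters) rather than a fitted scikit-learn pipeline, and the predict helper is likewise an assumption.

```python
import pickle

# Hypothetical stand-in for a fitted model: a plain dict of learned
# parameters. In practice this would be a fitted estimator or pipeline.
model = {"kind": "linear", "coef": [2.0], "intercept": 1.0}


def predict(params, x):
    """Apply the stored linear parameters to a single input value."""
    return params["coef"][0] * x + params["intercept"]


# Serialize the object to bytes, then rebuild an equal object from them.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

print(predict(restored, 3.0))
```

pickle.dump and pickle.load do the same thing with an open binary file instead of a bytes object. Unpickling can execute arbitrary code, so only load pickle data from sources you trust.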
