Mastering Predictive Analytics with Python

Exploit the power of data in your business by building advanced predictive modeling applications with Python

Product type: Paperback
Published: Aug 2016
ISBN-13: 9781785882715
Length: 334 pages
Edition: 1st Edition
Author: Joseph Babcock
Table of Contents

Preface
1. From Data to Decisions – Getting Started with Analytic Applications
2. Exploratory Data Analysis and Visualization in Python
3. Finding Patterns in the Noise – Clustering and Unsupervised Learning
4. Connecting the Dots with Models – Regression Methods
5. Putting Data in its Place – Classification Methods and Analysis
6. Words and Pixels – Working with Unstructured Data
7. Learning from the Bottom Up – Deep Networks and Unsupervised Features
8. Sharing Models with Prediction Services
9. Reporting and Testing – Iterating on Analytic Systems
Index

What this book covers

Chapter 1, From Data to Decisions – Getting Started with Analytic Applications, teaches you to describe the core components of an analytic pipeline and the ways in which they interact. We also examine the differences between batch and streaming processes, and some use cases for which each type of application is well suited. We walk through basic example applications in both paradigms and the design decisions needed at each step.
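
As a rough illustration of the batch versus streaming distinction, the sketch below computes the same statistic both ways. The function names and the sample readings are hypothetical and are not taken from the book.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch: the whole dataset is available before we compute."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming: update the estimate one record at a time."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update; no history of earlier records is kept.
        mean += (value - mean) / count
        yield mean


readings = [2.0, 4.0, 6.0, 8.0]  # made-up sample data
batch_result = batch_mean(readings)
stream_results = list(streaming_mean(readings))
```

The batch version needs all the records up front, while the streaming version yields an updated estimate after every record and ends at the same value.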

Chapter 2, Exploratory Data Analysis and Visualization in Python, examines many of the tasks needed to start building analytical applications. Using the IPython notebook, we'll cover how to load data from a file into a pandas data frame, rename columns in the dataset, filter unwanted rows, convert types, and create new columns. In addition, we'll join data from different sources and perform some basic statistical analyses using aggregations and pivots.
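
The operations listed above might look roughly like this in pandas. The column names and data are invented for illustration, and an in-memory string stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a path to a file.
orders_csv = io.StringIO(
    "id,cust,amt,region\n"
    "1,a,10.5,east\n"
    "2,b,-1,west\n"
    "3,a,4.5,east\n"
    "4,c,20,west\n"
)

orders = pd.read_csv(orders_csv)                                       # load
orders = orders.rename(columns={"cust": "customer", "amt": "amount"})  # rename columns
orders = orders[orders["amount"] > 0]                                  # filter unwanted rows
orders["id"] = orders["id"].astype(str)                                # convert a type
orders["amount_cents"] = (orders["amount"] * 100).round().astype(int)  # create a new column

# Join with a second (also hypothetical) source.
customers = pd.DataFrame(
    {"customer": ["a", "b", "c"], "tier": ["gold", "silver", "silver"]}
)
joined = orders.merge(customers, on="customer", how="left")

# Aggregation and pivot.
totals = joined.groupby("tier")["amount"].sum()
pivot = joined.pivot_table(
    index="region", columns="tier", values="amount", aggfunc="sum"
)
```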

Chapter 3, Finding Patterns in the Noise – Clustering and Unsupervised Learning, shows you how to identify groups of similar items in a dataset. Clustering is an exploratory analysis that we often use as a first step in deciphering new datasets. We explore different ways of calculating the similarity between data points and describe what kinds of data these metrics might best apply to. We examine both divisive clustering algorithms, which split the data into smaller components starting from a single group, and agglomerative methods, where every data point starts as its own cluster. Using a number of datasets, we show examples where these algorithms will perform better or worse, and some ways to optimize them. We also see our first (small) data pipeline, a clustering application in PySpark using streaming data.
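
To make the agglomerative idea concrete, here is a minimal pure-Python sketch in which every point starts as its own cluster and the two closest clusters are merged repeatedly. It assumes one-dimensional points and single-linkage distance, both chosen only to keep the example short; it is not the book's implementation.

```python
from typing import List


def single_linkage_distance(a: List[float], b: List[float]) -> float:
    """Smallest distance between any pair of points drawn from two clusters."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points: List[float], n_clusters: int) -> List[List[float]]:
    """Start with one cluster per point and merge the closest pair repeatedly."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters


groups = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

A divisive method would run in the opposite direction, starting from one group containing all five points and splitting it.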

Chapter 4, Connecting the Dots with Models – Regression Methods, examines the fitting of several regression models, including transforming input variables to the correct scale and accounting for categorical features correctly. We fit and evaluate a linear regression, as well as regularized regression models. We also examine the use of tree-based regression models, and how to optimize parameter choices in fitting them. Finally, we will look at an example of random forest modeling using PySpark, which can be applied to larger datasets.
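
The effect of regularization can be seen in a deliberately small case. The sketch below assumes a single feature, no intercept, and an L2 (ridge) penalty, for which the slope has a closed form; the helper name and data are hypothetical.

```python
from typing import Sequence


def ridge_slope(x: Sequence[float], y: Sequence[float], alpha: float) -> float:
    """Slope w minimizing sum((y - w*x)**2) + alpha * w**2 (no intercept)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Setting the derivative to zero gives w = sxy / (sxx + alpha).
    return sxy / (sxx + alpha)


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # made-up data lying exactly on y = 2x

ols_slope = ridge_slope(x, y, alpha=0.0)       # ordinary least squares
shrunk_slope = ridge_slope(x, y, alpha=14.0)   # penalty pulls the slope toward zero
```

With `alpha=0.0` the fit recovers the true slope, and a larger penalty shrinks the estimate toward zero.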

Chapter 5, Putting Data in its Place – Classification Methods and Analysis, explains how to use classification models and some of the strategies for improving model performance. In addition to transforming categorical features, we look at the interpretation of logistic regression accuracy using the ROC curve. In an attempt to improve model performance, we demonstrate the use of SVMs. Finally, we will achieve good performance on the test set through Gradient-Boosted Decision Trees.
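
As a sketch of what an ROC curve measures, the code below sweeps a decision threshold over a classifier's scores and records the false and true positive rates at each step. The labels and scores are invented, and the code assumes there are no tied scores.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one example at a time, from highest score to lowest.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]              # hypothetical true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical model scores
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

An area of 1.0 would mean every positive example is scored above every negative one, while 0.5 corresponds to random ordering.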

Chapter 6, Words and Pixels – Working with Unstructured Data, examines complex, unstructured data. Then we cover dimensionality reduction techniques such as the HashingVectorizer; matrix decompositions such as PCA, CUR, and NMF; and probabilistic models such as LDA. We also examine image data, including normalization and thresholding operations, and see how we can use dimensionality reduction techniques to find common patterns among images.
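
The idea behind a hashing vectorizer can be sketched in a few lines of plain Python: each token is hashed into one of a fixed number of buckets, so no vocabulary needs to be stored. This is a simplified illustration of the general technique, not scikit-learn's HashingVectorizer; the function name, the use of MD5, and the bucket count are all assumptions made for the example.

```python
import hashlib
from typing import List


def hash_vectorize(text: str, n_features: int = 8) -> List[int]:
    """Map a document to a fixed-length count vector without a vocabulary."""
    vector = [0] * n_features
    for token in text.lower().split():
        # A stable hash keeps the token-to-bucket mapping identical across runs.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % n_features] += 1
    return vector


doc_a = hash_vectorize("the quick brown fox")
doc_b = hash_vectorize("The quick brown fox")
```

Different tokens can collide in the same bucket, which is the price paid for a fixed output size.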

Chapter 7, Learning from the Bottom Up – Deep Networks and Unsupervised Features, introduces deep neural networks as a way to generate models for complex data types where features are difficult to engineer. We'll examine how neural networks are trained through back-propagation, and why additional layers make this optimization more difficult.
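
One commonly cited reason that depth complicates back-propagation is that gradients can shrink as they pass back through many layers. The sketch below assumes a chain of single-unit sigmoid layers with a shared weight, a toy setup chosen for illustration rather than anything taken from the chapter, and applies the chain rule to get the gradient of the output with respect to the input.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, weight: float = 1.0, x: float = 0.5) -> float:
    """d(output)/d(input) for a chain of `depth` one-unit sigmoid layers."""
    activation = x
    gradient = 1.0
    for _ in range(depth):
        activation = sigmoid(weight * activation)
        # Chain rule: sigmoid'(z) = s * (1 - s), multiplied by the layer weight.
        gradient *= activation * (1.0 - activation) * weight
    return gradient


shallow = input_gradient(depth=2)
deep = input_gradient(depth=10)
```

Because the sigmoid's derivative never exceeds 0.25, each extra layer with a weight of 1.0 multiplies the gradient by at most that factor, so the signal reaching early layers fades quickly as depth grows.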

Chapter 8, Sharing Models with Prediction Services, describes the three components of a basic prediction service, and discusses how this design will allow us to share the results of predictive modeling with other users or software systems.

Chapter 9, Reporting and Testing – Iterating on Analytic Systems, teaches several strategies for monitoring the performance of predictive models after their initial design. We also look at a number of scenarios in which the performance or components of the model change over time.
