Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Arrow up icon
GO TO TOP
Learning Predictive Analytics with Python

You're reading from   Learning Predictive Analytics with Python Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python

Arrow left icon
Product type Paperback
Published in Feb 2016
Publisher
ISBN-13 9781783983261
Length 354 pages
Edition 1st Edition
Languages
Arrow right icon
Authors (2):
Arrow left icon
Ashish Kumar Ashish Kumar
Author Profile Icon Ashish Kumar
Ashish Kumar
Gary Dougan Gary Dougan
Author Profile Icon Gary Dougan
Gary Dougan
Arrow right icon
View More author details
Toc

Table of Contents (12) Chapters Close

Preface 1. Getting Started with Predictive Modelling FREE CHAPTER 2. Data Cleaning 3. Data Wrangling 4. Statistical Concepts for Predictive Modelling 5. Linear Regression with Python 6. Logistic Regression with Python 7. Clustering with Python 8. Trees and Random Forests with Python 9. Best Practices for Predictive Modelling A. A List of Links
Index

What this book covers

Chapter 1, Getting Started with Predictive Modelling, talks about aspects, scope, and applications of predictive modelling. It also discusses various Python packages commonly used in data science, Python IDEs, and the methods to install these on systems.

Chapter 2, Data Cleaning, describes the process of reading a dataset, getting a bird's eye view of the dataset, handling the missing values in the dataset, and exploring the dataset with basic plotting using the pandas and matplotlib packages in Python. The data cleaning and wrangling together constitutes around 80% of the modelling time.

Chapter 3, Data Wrangling, describes the methods to subset a dataset, concatenate or merge two or more datasets, group the dataset by categorical variables, split the dataset into training and testing sets, generate dummy datasets using random numbers, and create simulations using random numbers.

Chapter 4, Statistical Concepts for Predictive Modelling, explains the basic statistics needed to make sense of the model parameters resulting from the predictive models. This chapter deals with concepts like hypothesis testing, z-tests, t-tests, chi-square tests, p-values, and so on followed by a discussion on correlation.

Chapter 5, Linear Regression with Python, starts with a discussion on the mathematics behind the linear regression validating the mathematics behind it using a simulated dataset. It is then followed by a summary of implications and interpretations of various model parameters. The chapter also describes methods to implement linear regression using the stasmodel.api and scikit-learn packages and handling various related contingencies, such as multiple regression, multi-collinearity, handling categorical variables, non-linear relationships between predictor and target variables, handling outliers, and so on.

Chapter 6, Logistic Regression with Python, explains the concepts, such as odds ratio, conditional probability, and contingency tables leading ultimately to detailed discussion on mathematics behind the logistic regression model (using a code that implements the entire model from scratch) and various tests to check the efficiency of the model. The chapter also describes the methods to implement logistic regression in Python and drawing and understanding an ROC curve.

Chapter 7, Clustering with Python, discusses the concepts, such as distances, the distance matrix, and linkage methods to understand the mathematics and logic behind both hierarchical and k-means clustering. The chapter also describes the methods to implement both the types of clustering in Python and methods to fine tune the number of clusters.

Chapter 8, Trees and Random Forests with Python, starts with a discussion on topics, such as entropy, information gain, gini index, and so on. To illustrate the mathematics behind creating a decision tree followed by a discussion on methods to handle variations, such as a continuous numerical variable as a predictor variable and handling a missing value. This is followed by methods to implement the decision tree in Python. The chapter also gives a glimpse into understanding and implementing the regression tree and random forests.

Chapter 9, Best Practices for Predictive Modelling, entails the best practices to be followed in terms of coding, data handling, algorithms, statistics, and business context for getting good results in predictive modelling.

Appendix, A List of Links, contains a list of sources which have been directly or indirectly consulted or used in the book. It also contains the link to the folder which contains datasets used in the book.

lock icon The rest of the chapter is locked
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $19.99/month. Cancel anytime
Banner background image