Learning Predictive Analytics with Python

Product type Book
Published in Feb 2016
Publisher Packt
ISBN-13 9781783983261
Pages 354 pages
Edition 1st Edition
Authors (2): Ashish Kumar, Gary Dougan

Table of Contents (19) Chapters

Learning Predictive Analytics with Python
Credits
Foreword
About the Author
Acknowledgments
About the Reviewer
www.PacktPub.com
Preface
1. Getting Started with Predictive Modelling
2. Data Cleaning
3. Data Wrangling
4. Statistical Concepts for Predictive Modelling
5. Linear Regression with Python
6. Logistic Regression with Python
7. Clustering with Python
8. Trees and Random Forests with Python
9. Best Practices for Predictive Modelling
A List of Links
Index

Index

A

  • algorithms
    • best practices / Best practices for algorithms
  • Anaconda
    • about / Anaconda
  • ANOVA / Best practices for statistics
  • applications and examples, predictive modelling
    • about / Applications and examples of predictive modelling
    • People also viewed feature, LinkedIn / LinkedIn's "People also viewed" feature, What it does?
    • online ads, correct targeting / Correct targeting of online ads, How is it done?
    • Santa Cruz predictive policing / Santa Cruz predictive policing
    • smartphone user activity, determining / Determining the activity of a smartphone user using accelerometer data
    • sport and fantasy leagues / Sport and fantasy leagues

B

  • Bagging
    • about / Understanding and implementing random forests
  • Bell Curve
    • about / Cumulative density function
  • best practices
    • for coding / Best practices for coding
    • for data handling / Best practices for data handling
    • for algorithms / Best practices for algorithms
    • for statistics / Best practices for statistics
    • for business context / Best practices for business contexts
  • best practices, for coding
    • about / Best practices for coding
    • codes, commenting / Commenting the codes
    • functions, defining for substantial individual tasks / Defining functions for substantial individual tasks
    • examples, of functions / Defining functions for substantial individual tasks, Example 3
    • hard-coding of variables, avoiding / Avoid hard-coding of variables as much as possible
    • version control / Version control
    • standard libraries / Using standard libraries, methods, and formulas
    • methods / Using standard libraries, methods, and formulas
    • formulas / Using standard libraries, methods, and formulas
  • boxplots
    • about / Boxplots
    • plotting / Boxplots
  • business context
    • best practices / Best practices for business contexts

C

  • chi-square test
    • about / Chi-square tests, Chi-square test
    • usage / Chi-square tests
    / Best practices for statistics
  • clustering
    • about / What is clustering?
    • using / How is clustering used?
    • cases / Why do we do clustering?
  • clustering, fine-tuning
    • about / Fine-tuning the clustering
    • elbow method / The elbow method
    • Silhouette Coefficient / Silhouette Coefficient
  • clustering, implementing with Python
    • about / Implementing clustering using Python
    • dataset, importing / Importing and exploring the dataset
    • dataset, exporting / Importing and exploring the dataset
    • values in dataset, normalizing / Normalizing the values in the dataset
    • hierarchical clustering, using scikit-learn / Hierarchical clustering using scikit-learn
    • k-Means clustering, using scikit-learn / K-Means clustering using scikit-learn
    • cluster, interpreting / Interpreting the cluster
  • coding
    • best practices / Best practices for coding
  • contingency table
    • about / Contingency tables
    • creating / Contingency tables
  • correlation
    • about / Correlation
  • correlation coefficient
    • about / Correlation
  • Correlation Matrix
    • about / Correlation
  • Cumulative Density Function
    • about / Cumulative density function
  • Customer Churn Model
    • using / Method 1 – using the Customer Churn Model

D

  • data
    • versus oil / Introducing predictive modelling
    • reading / Reading the data – variations and examples
    • summary / Basics – summary, dimensions, and structure
    • structure / Basics – summary, dimensions, and structure
    • dimensions / Basics – summary, dimensions, and structure
    • concatenating / Concatenating and appending data
    • appending / Concatenating and appending data
  • data collection
    • about / How missing values are generated and propagated
  • data extraction
    • about / How missing values are generated and propagated
  • Data frame
    • about / Data frames
  • data grouping
    • about / Grouping the data – aggregation, filtering, and transformation
    • illustration / Grouping the data – aggregation, filtering, and transformation
    • aggregation / Aggregation
    • filtering / Filtering
    • transformation / Transformation
    • miscellaneous operations / Miscellaneous operations
  • data handling
    • best practices / Best practices for data handling
  • data importing, in Python
    • about / Various methods of importing data in Python
    • dataset, reading with read_csv method / Case 1 – reading a dataset using the read_csv method
    • dataset, reading with open method / Case 2 – reading a dataset using the open method of Python
    • dataset, reading from URL / Case 3 – reading data from a URL
    • miscellaneous cases / Case 4 – miscellaneous cases
  • dataset
    • visualizing, by basic plotting / Visualizing a dataset by basic plotting
    • sub-setting / Subsetting a dataset
    • columns, selecting / Selecting columns
    • rows, selecting / Selecting rows
    • combination of rows and columns, selecting / Selecting a combination of rows and columns
    • new columns, creating / Creating new columns
    • merging/joining / Merging/joining datasets
  • dataset, reading with open method
    • about / Case 2 – reading a dataset using the open method of Python
    • reading line by line / Reading a dataset line by line
    • delimiter, changing / Changing the delimiter of a dataset
  • decision tree
    • about / Introducing decision trees, A decision tree
    • using / A decision tree
    • mathematics / Understanding the mathematics behind decision trees
  • decision tree, implementing with scikit-learn
    • about / Implementing a decision tree with scikit-learn
    • tree, visualizing / Visualizing the tree
    • decision tree, cross-validating / Cross-validating and pruning the decision tree
    • decision tree, pruning / Cross-validating and pruning the decision tree
  • delimiter
    • about / Delimiters
  • distance matrix
    • about / The distance matrix
  • distances, between two observations
    • Euclidean distance / Euclidean distance
    • Manhattan distance / Manhattan distance
    • Minkowski distance / Minkowski distance
  • dummy data frame
    • generating / Generating a dummy data frame
  • dummy variables
    • creating / Creating dummy variables

E

  • elbow method / The elbow method
  • Euclidean distance
    • about / Euclidean distance

F

  • F-statistics
    • about / F-statistics
    • significance / F-statistics

G

  • guidelines, for selecting predictor variables
    • R² / Summary of models
    • p-values / Summary of models
    • F-statistic / Summary of models
    • RSE / Summary of models
    • VIF / Summary of models

H

  • Harvard Business Review (HBR)
    • about / Introducing predictive modelling
  • heteroscedasticity / Other considerations and assumptions for linear regression
  • hierarchical clustering
    • about / Hierarchical clustering
  • histograms
    • about / Histograms
    • plotting / Histograms
  • hypothesis testing
    • about / Hypothesis testing
    • null hypothesis, versus alternate hypothesis / Null versus alternate hypothesis
    • Z-statistic / Z-statistic and t-statistic
    • t-statistic / Z-statistic and t-statistic
    • confidence intervals / Confidence intervals, significance levels, and p-values
    • significance levels / Confidence intervals, significance levels, and p-values
    • p-values / Confidence intervals, significance levels, and p-values
    • types / Different kinds of hypothesis test
    • step-by-step guide / A step-by-step guide to do a hypothesis test
    • example / An example of a hypothesis test
  • hypothesis tests
    • left-tailed / Different kinds of hypothesis test
    • right-tailed / Different kinds of hypothesis test
    • two-tailed / Different kinds of hypothesis test

I

  • IDEs, for Python
    • about / IDEs for Python
    • IDLE / IDEs for Python
    • IPython Notebook / IDEs for Python
    • Spyder / IDEs for Python
  • IDLE
    • about / IDEs for Python
    • features / IDEs for Python
  • Inner Join
    • characteristics / Inner Join
    • about / Inner Join
    • example / An example of the Inner Join
  • Inter Quartile Range (IQR) / Handling outliers
  • intra-cluster distance / The elbow method
  • IPython
    • about / Python and its packages for predictive modelling
    • URL / Python and its packages for predictive modelling
  • IPython Notebook
    • about / IDEs for Python
    • features / IDEs for Python
  • issues handling, in linear regression
    • about / Handling other issues in linear regression
    • categorical variables, handling / Handling categorical variables
    • variable, transforming to fit non-linear relations / Transforming a variable to fit non-linear relations
    • outliers, handling / Handling outliers

J

  • joins
    • summarizing / Summary of Joins in terms of their length

K

  • k-Means clustering
    • about / K-means clustering
  • knowledge matrix, predictive modelling
    • about / Knowledge matrix for predictive modelling

L

  • left-tailed test
    • about / Different kinds of hypothesis test
  • Left Join
    • characteristics / Left Join
    • about / Left Join
    • example / An example of the Left Join
  • Likelihood Ratio Test statistic
    • about / Likelihood Ratio Test statistic
  • linear regression
    • issues, handling / Handling other issues in linear regression
    • considerations / Other considerations and assumptions for linear regression
    • assumptions / Other considerations and assumptions for linear regression
    • versus logistic regression / Linear regression versus logistic regression
  • linear regression, implementing with Python
    • about / Implementing linear regression with Python
    • statsmodel library, using / Linear regression using the statsmodel library
    • multiple linear regression / Multiple linear regression
    • multi-collinearity / Multi-collinearity
    • Variance Inflation Factor (VIF) / Variance Inflation Factor
  • linkage methods
    • about / Linkage methods
    • single linkage / Single linkage
    • complete linkage / Complete linkage
    • average linkage / Average linkage
    • centroid linkage / Centroid linkage
    • Ward's method / Ward's method
  • logistic regression
    • scenarios / Linear regression versus logistic regression
    • math / Understanding the math behind logistic regression
  • logistic regression, with Python
    • implementing / Implementing logistic regression with Python
    • data, processing / Processing the data
    • data exploration / Data exploration
    • data visualization / Data visualization
    • dummy variables, creating for categorical variables / Creating dummy variables for categorical variables
    • feature selection / Feature selection
    • model, implementing / Implementing the model
  • logistic regression model
    • validation / Model validation and evaluation
    • evaluation / Model validation and evaluation
    • cross validation / Cross validation
  • logistic regression parameters
    • about / Making sense of logistic regression parameters
    • Wald test / Wald test
    • Likelihood Ratio Test statistic / Likelihood Ratio Test statistic
    • chi-square test / Chi-square test

M

  • Manhattan distance
    • about / Manhattan distance
  • math, behind logistic regression
    • about / Understanding the math behind logistic regression
    • contingency tables / Contingency tables
    • conditional probability / Conditional probability
    • odds ratio / Odds ratio
    • moving to logistic regression / Moving on to logistic regression from linear regression
    • estimation, using Maximum Likelihood Method / Estimation using the Maximum Likelihood Method, Log likelihood function
    • logistic regression model, building from scratch / Building the logistic regression model from scratch
  • mathematics, behind clustering
    • about / Mathematics behind clustering
    • distances, between two observations / Distances between two observations
    • distance matrix / The distance matrix
    • distances, normalizing / Normalizing the distances
    • linkage methods / Linkage methods
    • hierarchical clustering / Hierarchical clustering
    • k-Means clustering / K-means clustering
  • mathematics, decision tree
    • homogeneity / Homogeneity
    • entropy / Entropy
    • information gain / Information gain
    • ID3 algorithm / ID3 algorithm to create a decision tree
    • Gini index / Gini index
    • Reduction in Variance / Reduction in Variance
    • tree, pruning / Pruning a tree
    • continuous numerical variable, handling / Handling a continuous numerical variable
    • missing value of attribute, handling / Handling a missing value of an attribute
  • maths, behind linear regression
    • about / Understanding the maths behind linear regression
    • simulated data, using / Linear regression using simulated data
    • linear regression model, fitting / Fitting a linear regression model and checking its efficacy
    • linear regression model efficacy, checking / Fitting a linear regression model and checking its efficacy
    • optimum value of variable coefficients, finding / Finding the optimum value of variable coefficients
  • matplotlib
    • about / Python and its packages for predictive modelling
    • URL / Python and its packages for predictive modelling
  • miles per gallon (mpg) / Transforming a variable to fit non-linear relations
  • Minkowski distance
    • about / Minkowski distance
  • miscellaneous cases, data reading
    • reading, from .xls or .xlsx file / Reading from an .xls or .xlsx file
    • CSV or Excel file, writing to / Writing to a CSV or Excel file
  • missing values
    • handling / Handling missing values
    • checking for / Checking for missing values
    • about / What constitutes missing data?
    • generating / How missing values are generated and propagated
    • propagating / How missing values are generated and propagated
    • treating / Treating missing values
    • deletion / Deletion
    • imputation / Imputation
  • model validation
    • about / Model validation, Model validation
    • data split, training / Training and testing data split
    • data split, testing / Training and testing data split
    • models, summarizing / Summary of models
    • guidelines, for selecting variables / Summary of models
    • linear regression with scikit-learn / Linear regression with scikit-learn
    • feature selection, with scikit-learn / Feature selection with scikit-learn
  • Monte-Carlo simulation
    • for finding value of pi / Using the Monte-Carlo simulation to find the value of pi
  • multi-collinearity
    • about / Multi-collinearity

N

  • normal distribution
    • about / Normal distribution
  • null hypothesis
    • versus alternate hypothesis / Null versus alternate hypothesis
  • NumPy
    • about / Python and its packages for predictive modelling
    • URL / Python and its packages for predictive modelling

O

  • outliers
    • about / Handling outliers
    • handling / Handling outliers

P

  • p-values
    • about / p-values
  • pandas
    • about / Python and its packages for predictive modelling
    • URL / Python and its packages for predictive modelling
  • parameters, random forest
    • node size / Important parameters for random forests
    • number of trees / Important parameters for random forests
    • number of predictors sampled / Important parameters for random forests
  • pip
    • installing / Installing pip
  • predictive analytics
    • about / Introducing predictive modelling
  • predictive modelling
    • about / Introducing predictive modelling
    • scope / Scope of predictive modelling
    • statistical algorithms / Ensemble of statistical algorithms
    • statistical tools / Statistical tools
    • historical data / Historical data
    • mathematical function / Mathematical function
    • business context / Business context
    • knowledge matrix / Knowledge matrix for predictive modelling
    • task matrix / Task matrix for predictive modelling
    • applications and examples / Applications and examples of predictive modelling
  • predictor variables
    • about / Multiple linear regression
    • forward selection approach / Multiple linear regression
    • backward selection approach / Multiple linear regression
  • Probability Density Function
    • about / Probability density function
  • probability distributions
    • about / Generating random numbers following probability distributions
    • Probability Density Function / Probability density function
    • Cumulative Density Function / Cumulative density function
  • Python packages
    • about / Python and its packages – download and installation
    • Anaconda / Anaconda
    • Standalone Python / Standalone Python
    • installing / Installing a Python package
    • installing, with pip / Installing Python packages with pip
  • Python packages, for predictive modelling
    • about / Python and its packages for predictive modelling
    • pandas / Python and its packages for predictive modelling
    • NumPy / Python and its packages for predictive modelling
    • matplotlib / Python and its packages for predictive modelling
    • IPython / Python and its packages for predictive modelling
    • scikit-learn / Python and its packages for predictive modelling

R

  • random forest
    • implementing, using Python / Implementing a random forest using Python
    • features / Why do random forests work?
    • parameters / Important parameters for random forests
  • random forest algorithm
    • about / The random forest algorithm
  • random forests
    • about / Understanding and implementing random forests
  • random numbers
    • about / Generating random numbers and their usage
    • generating / Generating random numbers and their usage
    • usage / Generating random numbers and their usage
    • methods, for generating / Various methods for generating random numbers
    • seeding / Seeding a random number
    • generating, following probability distributions / Generating random numbers following probability distributions
  • random sampling
    • about / Random sampling – splitting a dataset in training and testing datasets
    • dataset, testing / Random sampling – splitting a dataset in training and testing datasets
    • dataset, splitting / Random sampling – splitting a dataset in training and testing datasets
    • Customer Churn Model, using / Method 1 – using the Customer Churn Model
    • sklearn, using / Method 2 – using sklearn
    • shuffle function, using / Method 3 – using the shuffle function
    • and central limit theorem / Random sampling and the central limit theorem
  • read_csv method
    • about / Case 1 – reading a dataset using the read_csv method, The read_csv method
    • filepath / The read_csv method
    • sep / The read_csv method
    • dtype / The read_csv method
    • header / The read_csv method
    • names / The read_csv method
    • skiprows / The read_csv method
    • index_col / The read_csv method
    • skip_blank_lines / The read_csv method
    • na_filter / The read_csv method
    • use cases / Use cases of the read_csv method
  • Receiver Operating Characteristic (ROC) curve
    • about / Model validation
  • Recursive Feature Elimination (RFE) / Feature selection with scikit-learn
  • regression tree algorithm
    • about / Regression tree algorithm
  • regression trees
    • about / Understanding and implementing regression trees
    • advantages / Regression tree algorithm
    • implementing, with Python / Implementing a regression tree using Python
  • Residual Standard Error (RSE)
    • about / Residual Standard Error
  • result parameters
    • about / Making sense of result parameters
    • p-values / p-values
    • F-statistics / F-statistics
    • Residual Standard Error (RSE) / Residual Standard Error
  • retrospective analytics
    • about / Introducing predictive modelling
  • right-tailed test
    • about / Different kinds of hypothesis test
  • Right Join
    • about / Right Join
    • characteristics / Right Join
    • example / An example of the Right Join
  • ROC curve
    • about / The ROC curve
    • confusion matrix / Confusion matrix

S

  • scatter plot
    • about / Scatter plots
    • plotting / Scatter plots
  • scikit-learn
    • about / Python and its packages for predictive modelling
    • features / Python and its packages for predictive modelling
    • URL / Python and its packages for predictive modelling
  • Sensitivity (True Positive Rate) / The ROC curve
  • shuffle function
    • using / Method 3 – using the shuffle function
  • Silhouette Coefficient / Silhouette Coefficient
  • sklearn
    • using / Method 2 – using sklearn
  • Specificity (True Negative Rate) / The ROC curve
  • Spyder
    • about / IDEs for Python
    • features / IDEs for Python
  • Standalone Python
    • about / Standalone Python
  • statistical algorithms, predictive modelling
    • about / Ensemble of statistical algorithms
    • supervised algorithms / Ensemble of statistical algorithms
    • un-supervised algorithms / Ensemble of statistical algorithms
  • statistics
    • best practices / Best practices for statistics

T

  • t-statistic
    • about / Z-statistic and t-statistic
  • t-test / Best practices for statistics
  • t-test (Student-t distribution)
    • about / Z-statistic and t-statistic
  • task matrix, predictive modelling
    • about / Task matrix for predictive modelling
  • two-tailed test
    • about / Different kinds of hypothesis test

U

  • uniform distribution
    • about / Uniform distribution
  • use cases, read_csv method
    • about / Use cases of the read_csv method
    • directory address and filename, passing as variables / Passing the directory address and filename as variables
    • .txt dataset, reading with comma delimiter / Reading a .txt dataset with a comma delimiter
    • dataset column names, specifying from list / Specifying the column names of a dataset from a list

V

  • value of pi
    • calculating / Geometry and mathematics behind the calculation of pi
  • Variance Inflation Factor (VIF)
    • about / Variance Inflation Factor

W

  • Wald test / Wald test

Z

  • Z-statistic
    • about / Z-statistic and t-statistic
  • Z-test / Best practices for statistics
  • Z-test (normal distribution)
    • about / Z-statistic and t-statistic