You're reading from Learning Predictive Analytics with Python Gain practical insights into predictive modelling by implementing Predictive Analytics algorithms on public datasets with Python

Product type Paperback

Published in Feb 2016

Publisher

ISBN-13 9781783983261

Length 354 pages

Edition 1st Edition

Languages

Python

Concepts

Predictive Analytics

Authors (2):

Ashish Kumar

Gary Dougan


Table of Contents (12) Chapters

Preface

1. Getting Started with Predictive Modelling

2. Data Cleaning

3. Data Wrangling

4. Statistical Concepts for Predictive Modelling

5. Linear Regression with Python

6. Logistic Regression with Python

7. Clustering with Python

8. Trees and Random Forests with Python

9. Best Practices for Predictive Modelling

A. A List of Links

Index

Python and its packages for predictive modelling

In this section, we will discuss some commonly used packages for predictive modelling.

pandas: pandas is the most important and versatile package in the data science domain, so it is no wonder that import pandas appears at the beginning of almost any data science code snippet, in this book and elsewhere. Among other things, the pandas package facilitates:

Reading a dataset into a usable format (a data frame, in the case of Python)
Calculating basic statistics
Running basic operations such as subsetting a dataset, merging or concatenating two datasets, handling missing data, and so on

The various methods in pandas will be explained in this book as and when we use them.
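As a first taste, the operations listed above can be sketched in a few lines. This is only an illustrative sketch: the customers and orders tables, their column names, and their values are made up for the example, and the CSV text is read from an in-memory string rather than a real file.

```python
import io

import pandas as pd

# Reading a dataset: the CSV text below is hypothetical, made-up data.
# The third row has a missing age, which pandas reads as NaN.
csv_text = "customer_id,age\n1,34\n2,45\n3,\n4,23\n"
customers = pd.read_csv(io.StringIO(csv_text))

# A second made-up table, built directly from a dictionary
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [120.0, 80.0, 200.0, 50.0],
})

# Basic statistics: mean of a column (missing values are skipped by default)
mean_age = customers["age"].mean()

# Sub-setting: keep only the rows where age is above 30
over_30 = customers[customers["age"] > 30]

# Handling missing data: fill the missing age with the column mean
customers_filled = customers.fillna({"age": mean_age})

# Merging: attach each order to its customer record
merged = pd.merge(customers_filled, orders, on="customer_id", how="inner")

# Concatenating: stack two data frames that share the same columns
stacked = pd.concat([orders, orders], ignore_index=True)
```

To read a real file, the same pd.read_csv call can be given a file path instead of the in-memory string.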

Note

To get an overview, navigate to the official page of pandas here: http://pandas.pydata.org/index.html

NumPy: NumPy, in many ways, is a MATLAB equivalent in the Python environment. It has powerful methods to do mathematical calculations and simulations. The following are some of its features:

A powerful and widely used N-d array object
An ensemble of powerful mathematical functions used in linear algebra, Fourier transforms, and random number generation
A combination of random number generators and N-d arrays that can be used to generate dummy datasets to demonstrate various procedures, a practice we will follow extensively in this book
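The last point can be sketched as follows. This is a minimal example under stated assumptions: the seed, the array sizes, and the coefficients in true_coef are arbitrary choices made for illustration, and np.random.default_rng requires a reasonably recent NumPy release (1.17 or later).

```python
import numpy as np

# Seeded generator so the dummy data is reproducible (the seed is arbitrary)
rng = np.random.default_rng(42)

# N-d array: 100 rows and 3 columns of standard-normal values
X = rng.normal(size=(100, 3))

# Build a target from known (made-up) coefficients plus a little noise
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=100)

# Linear algebra: estimate the coefficients back by least squares
est_coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
```

Because the noise is small relative to the signal, est_coef should land close to true_coef, which makes this kind of dummy dataset handy for checking that a procedure behaves as expected.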

Note

To get an overview, navigate to the official page of NumPy at http://www.NumPy.org/

matplotlib: matplotlib is a Python library that easily generates high-quality 2-D plots. Again, it is very similar to MATLAB.

It can be used to plot all kinds of common plots, such as histograms, stacked and unstacked bar charts, scatterplots, heat diagrams, box plots, power spectra, error charts, and so on
It can be used to edit and manipulate all the plot properties such as title, axes properties, color, scale, and so on
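Both points can be illustrated with a short sketch that draws a histogram and then adjusts a few plot properties. The data, the seed, the labels, and the colour are made up for the example, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs

import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 500 values drawn from a normal distribution
rng = np.random.default_rng(0)
values = rng.normal(loc=50, scale=10, size=500)

# Plot a histogram with 20 bins
fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(values, bins=20, color="steelblue")

# Edit plot properties: title, axis labels, and axis range
ax.set_title("Histogram of dummy values")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_xlim(0, 100)
```

Calling fig.savefig with a file name writes the figure to disk; in an interactive session, plt.show() displays it instead.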

Note

To get an overview, navigate to the official page of matplotlib at: http://matplotlib.org

IPython: IPython provides an environment for interactive computing.

It provides a browser-based notebook that serves as both an IDE and a development environment, supporting code, rich media, inline plots, and model summaries. These notebooks and their contents can be saved and used later to demonstrate the results as they are, or the code can be saved separately and executed. It has emerged as a powerful tool for web-based tutorials, as the code and the results flow smoothly one after the other in this environment. At many places in this book, we will be using this environment.

Note

To get an overview, navigate to the official page of IPython here: http://ipython.org/

scikit-learn: scikit-learn is the mainstay of predictive modelling in Python. It is a robust collection of data science algorithms and the methods to implement them. Some of the features of scikit-learn are as follows:

It is built on Python packages such as NumPy, SciPy, and matplotlib
It is very simple and efficient to use
It has methods to implement most of the predictive modelling techniques, such as linear regression, logistic regression, clustering, and decision trees
It gives a very concise method to predict the outcome based on the model and measure the accuracy of the outcomes
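The last point, predicting from a fitted model and measuring its accuracy, can be sketched with a linear regression on a dummy dataset. The data, the seeds, and the true slope and intercept (3.0 and 5.0) are assumptions made for illustration; the accuracy measure used here is the R-squared value that LinearRegression.score returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: y depends linearly on one feature, plus random noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(scale=1.0, size=200)

# Hold out a quarter of the rows to measure accuracy on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model, predict the outcome, and score the predictions
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = model.score(X_test, y_test)  # R-squared on the held-out rows
```

The same fit, predict, and score pattern applies to most other scikit-learn estimators, although score returns a different measure for classifiers (mean accuracy) than for regressors.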

Note

To get an overview, navigate to the official page of scikit-learn here: http://scikit-learn.org/stable/index.html

Any other Python packages used in this book will be introduced as the situation requires and can be installed using the method described earlier in this section.
