You're reading from Python: Real-World Data Science

Product type Course

Published in Jun 2016

Publisher

ISBN-13 9781786465160

Length 1255 pages

Edition 1st Edition

Languages

Python

Concepts

Machine Learning

Authors (6):

Fabrizio Romano

Phuong Vo.T.H

Robert Layton

Sebastian Raschka

Martin Czygan

+1 more

Table of Contents (12) Chapters

Python: Real-World Data Science

Meet Your Course Guide

What's so cool about Data Science?

Course Structure

Course Journey

The Course Roadmap and Timeline

1. Course Module 1: Python Fundamentals

2. Course Module 2: Data Analysis

3. Course Module 3: Data Mining

4. Course Module 4: Machine Learning

Index

1. Course Module 1: Python Fundamentals

1. Introduction and First Steps – Take a Deep Breath

A proper introduction

Enter the Python

About Python

Portability

Coherence

Developer productivity

An extensive library

Software quality

Software integration

Satisfaction and enjoyment

What are the drawbacks?

Who is using Python today?

Setting up the environment

Python 2 versus Python 3 – the great debate

What you need for this course

Installing Python

Installing IPython

Installing additional packages

How you can run a Python program

Running Python scripts

Running the Python interactive shell

Running Python as a service

Running Python as a GUI application

How is Python code organized

How do we use modules and packages

Python's execution model

Names and namespaces

Scopes

Guidelines on how to write good code

The Python culture

A note on the IDEs

2. Object-oriented Design

Introducing object-oriented

Objects and classes

Specifying attributes and behaviors

Data describes objects

Behaviors are actions

Hiding details and creating the public interface

Composition

Inheritance

Inheritance provides abstraction

Multiple inheritance

Case study

3. Objects in Python

Creating Python classes

Adding attributes

Making it do something

Talking to yourself

More arguments

Initializing the object

Explaining yourself

Modules and packages

Organizing the modules

Absolute imports

Relative imports

Organizing module contents

Who can access my data?

Third-party libraries

Case study

4. When Objects Are Alike

Basic inheritance

Extending built-ins

Overriding and super

Multiple inheritance

The diamond problem

Different sets of arguments

Polymorphism

Abstract base classes

Using an abstract base class

Creating an abstract base class

Demystifying the magic

Case study

5. Expecting the Unexpected

Raising exceptions

Raising an exception

The effects of an exception

Handling exceptions

The exception hierarchy

Defining our own exceptions

Case study

6. When to Use Object-oriented Programming

Treat objects as objects

Adding behavior to class data with properties

Properties in detail

Decorators – another way to create properties

Deciding when to use properties

Manager objects

Removing duplicate code

In practice

Case study

7. Python Data Structures

Empty objects

Tuples and named tuples

Named tuples

Dictionaries

Dictionary use cases

Using defaultdict

Counter

Lists

Sorting lists

Sets

Extending built-ins

Queues

FIFO queues

LIFO queues

Priority queues

Case study

8. Python Object-oriented Shortcuts

Python built-in functions

The len() function

Reversed

Enumerate

File I/O

Placing it in context

An alternative to method overloading

Default arguments

Variable argument lists

Unpacking arguments

Functions are objects too

Using functions as attributes

Callable objects

Case study

9. Strings and Serialization

Strings

String manipulation

String formatting

Escaping braces

Keyword arguments

Container lookups

Object lookups

Making it look right

Strings are Unicode

Converting bytes to text

Converting text to bytes

Mutable byte strings

Regular expressions

Matching patterns

Matching a selection of characters

Escaping characters

Matching multiple characters

Grouping patterns together

Getting information from regular expressions

Making repeated regular expressions efficient

Serializing objects

Customizing pickles

Serializing web objects

Case study

10. The Iterator Pattern

Design patterns in brief

Iterators

The iterator protocol

Comprehensions

List comprehensions

Set and dictionary comprehensions

Generator expressions

Generators

Yield items from another iterable

Coroutines

Back to log parsing

Closing coroutines and throwing exceptions

The relationship between coroutines, generators, and functions

Case study

11. Python Design Patterns I

The decorator pattern

A decorator example

Decorators in Python

The observer pattern

An observer example

The strategy pattern

A strategy example

Strategy in Python

The state pattern

A state example

State versus strategy

State transition as coroutines

The singleton pattern

Singleton implementation

The template pattern

A template example

12. Python Design Patterns II

The adapter pattern

The facade pattern

The flyweight pattern

The command pattern

The abstract factory pattern

The composite pattern

13. Testing Object-oriented Programs

Why test?

Test-driven development

Unit testing

Assertion methods

Reducing boilerplate and cleaning up

Organizing and running tests

Ignoring broken tests

Testing with py.test

One way to do setup and cleanup

A completely different way to set up variables

Skipping tests with py.test

Imitating expensive objects

How much testing is enough?

Case study

Implementing it

14. Concurrency

Threads

The many problems with threads

Shared memory

The global interpreter lock

Thread overhead

Multiprocessing

Multiprocessing pools

Queues

The problems with multiprocessing

Futures

AsyncIO

AsyncIO in action

Reading an AsyncIO future

AsyncIO for networking

Using executors to wrap blocking code

Streams

Executors

Case study

2. Course Module 2: Data Analysis

1. Introducing Data Analysis and Libraries

Data analysis and processing

An overview of the libraries in data analysis

Python libraries in data analysis

NumPy

pandas

Matplotlib

PyMongo

The scikit-learn library

2. NumPy Arrays and Vectorized Computation

NumPy arrays

Data types

Array creation

Indexing and slicing

Fancy indexing

Numerical operations on arrays

Array functions

Data processing using arrays

Loading and saving data

Saving an array

Loading an array

Linear algebra with NumPy

NumPy random numbers

3. Data Analysis with pandas

An overview of the pandas package

The pandas data structure

Series

The DataFrame

The essential basic functionality

Reindexing and altering labels

Head and tail

Binary operations

Functional statistics

Function application

Sorting

Indexing and selecting data

Computational tools

Working with missing data

Advanced uses of pandas for data analysis

Hierarchical indexing

The Panel data

4. Data Visualization

The matplotlib API primer

Line properties

Figures and subplots

Exploring plot types

Scatter plots

Bar plots

Contour plots

Histogram plots

Legends and annotations

Plotting functions with pandas

Additional Python data visualization tools

Bokeh

MayaVi

5. Time Series

Time series primer

Working with date and time objects

Resampling time series

Downsampling time series data

Upsampling time series data

Timedeltas

Time series plotting

6. Interacting with Databases

Interacting with data in text format

Reading data from text format

Writing data to text format

Interacting with data in binary format

HDF5

Interacting with data in MongoDB

Interacting with data in Redis

The simple value

List

Set

Ordered set

7. Data Analysis Application Examples

Data munging

Cleaning data

Filtering

Merging data

Reshaping data

Data aggregation

Grouping data

3. Course Module 3: Data Mining

1. Getting Started with Data Mining

Introducing data mining

A simple affinity analysis example

What is affinity analysis?

Product recommendations

Loading the dataset with NumPy

Implementing a simple ranking of rules

Ranking to find the best rules

A simple classification example

What is classification?

Loading and preparing the dataset

Implementing the OneR algorithm

Testing the algorithm

2. Classifying with scikit-learn Estimators

scikit-learn estimators

Nearest neighbors

Distance metrics

Loading the dataset

Moving towards a standard workflow

Running the algorithm

Setting parameters

Preprocessing using pipelines

An example

Standard preprocessing

Putting it all together

Pipelines

3. Predicting Sports Winners with Decision Trees

Loading the dataset

Collecting the data

Using pandas to load the dataset

Cleaning up the dataset

Extracting new features

Decision trees

Parameters in decision trees

Using decision trees

Sports outcome prediction

Putting it all together

Random forests

How do ensembles work?

Parameters in Random forests

Applying Random forests

Engineering new features

4. Recommending Movies Using Affinity Analysis

Affinity analysis

Algorithms for affinity analysis

Choosing parameters

The movie recommendation problem

Obtaining the dataset

Loading with pandas

Sparse data formats

The Apriori implementation

The Apriori algorithm

Implementation

Extracting association rules

Evaluation

5. Extracting Features with Transformers

Feature extraction

Representing reality in models

Common feature patterns

Creating good features

Feature selection

Selecting the best individual features

Feature creation

Creating your own transformer

The transformer API

Implementation details

Unit testing

Putting it all together

6. Social Media Insight Using Naive Bayes

Disambiguation

Downloading data from a social network

Loading and classifying the dataset

Creating a replicable dataset from Twitter

Text transformers

Bag-of-words

N-grams

Other features

Naive Bayes

Bayes' theorem

Naive Bayes algorithm

How it works

Application

Extracting word counts

Converting dictionaries to a matrix

Training the Naive Bayes classifier

Putting it all together

Evaluation using the F1-score

Getting useful features from models

7. Discovering Accounts to Follow Using Graph Mining

Loading the dataset

Classifying with an existing model

Getting follower information from Twitter

Building the network

Creating a graph

Creating a similarity graph

Finding subgraphs

Connected components

Optimizing criteria

8. Beating CAPTCHAs with Neural Networks

Artificial neural networks

An introduction to neural networks

Creating the dataset

Drawing basic CAPTCHAs

Splitting the image into individual letters

Creating a training dataset

Adjusting our training dataset to our methodology

Training and classifying

Back propagation

Predicting words

Improving accuracy using a dictionary

Ranking mechanisms for words

Putting it all together

9. Authorship Attribution

Attributing documents to authors

Applications and use cases

Attributing authorship

Getting the data

Function words

Counting function words

Classifying with function words

Support vector machines

Classifying with SVMs

Kernels

Character n-grams

Extracting character n-grams

Using the Enron dataset

Accessing the Enron dataset

Creating a dataset loader

Putting it all together

Evaluation

10. Clustering News Articles

Obtaining news articles

Using a Web API to get data

Reddit as a data source

Getting the data

Extracting text from arbitrary websites

Finding the stories in arbitrary websites

Putting it all together

Grouping news articles

The k-means algorithm

Evaluating the results

Extracting topic information from clusters

Using clustering algorithms as transformers

Clustering ensembles

Evidence accumulation

How it works

Implementation

Online learning

An introduction to online learning

Implementation

11. Classifying Objects in Images Using Deep Learning

Object classification

Application scenario and goals

Use cases

Deep neural networks

Intuition

Implementation

An introduction to Theano

An introduction to Lasagne

Implementing neural networks with nolearn

GPU optimization

When to use GPUs for computation

Running our code on a GPU

Setting up the environment

Application

Getting the data

Creating the neural network

Putting it all together

12. Working with Big Data

Big data

Application scenario and goals

MapReduce

Intuition

A word count example

Hadoop MapReduce

Application

Getting the data

Naive Bayes prediction

The mrjob package

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

13. Next Steps…

Chapter 1 – Getting Started with Data Mining

Scikit-learn tutorials

Extending the IPython Notebook

Chapter 2 – Classifying with scikit-learn Estimators

More complex pipelines

Comparing classifiers

Chapter 3 – Predicting Sports Winners with Decision Trees
