You're reading from Python Data Science Essentials

Product type: Book

Published in: Apr 2015

Publisher: Packt

ISBN-13: 9781785280429

Pages: 258

Edition: 1st Edition

Languages: Python

Concepts: Data Analysis

Table of Contents (13 chapters)

Python Data Science Essentials

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

1. First Steps

2. Data Munging

3. The Data Science Pipeline

4. Machine Learning

5. Social Network Analysis

6. Visualization

Index

A

AdaBoost / Sequences of models – AdaBoost
Additive White Gaussian Noise (AWGN)
- about / Dimensionality reduction
advanced nonlinear algorithms
- about / Advanced nonlinear algorithms
- SVM, used for classification / SVM for classification
- SVM, used for regression / SVM for regression
- SVM, tuning / Tuning SVM
Anaconda
- about / Anaconda
- URL / Anaconda
Arbitrary waveform generator (AWG)
- about / Latent Factor Analysis (LFA)
Area under a curve (AUC) / Binary classification
arrays
- resizing / Resizing arrays
- deriving, from NumPy functions / Arrays derived from NumPy functions
- obtaining, from file / Getting an array directly from a file

B

bar graphs / Bar graphs
Beautiful Soup
- URL / Beautiful Soup
- about / Beautiful Soup
betweenness centrality / Graph algorithms
big data
- dealing with / Dealing with big data
- datasets, creating as examples / Creating some big datasets as examples
- scalability, with volume / Scalability with volume
- velocity / Keeping up with velocity
- variety / Dealing with variety
- Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
big datasets
- dealing with / Dealing with big datasets
binary classification / Binary classification
boxplots / Boxplots and histograms

C

categorical data
- working with / Working with categorical and textual data
Chi2 object / Univariate selection
closeness centrality / Graph algorithms
collection of edges
- about / Introduction to graph theory
covariance.EllipticEnvelope class
- about / Univariate outlier detection
covariance matrix / The covariance matrix
cross-validation
- about / Cross-validation
- working / Cross-validation
- iterators, using / Using cross-validation iterators
- sampling / Sampling and bootstrapping
- bootstrapping / Sampling and bootstrapping
curve plotting / Curve plotting

D

data
- loading, from CSV / Loading data directly from CSV or text files
- loading, from text files / Loading data directly from CSV or text files
- Scikit-learn sample generators / Scikit-learn sample generators
- preprocessing, with pandas / Data loading and preprocessing with pandas, Data preprocessing
- loading, with pandas / Fast and easy data loading
- processing, with NumPy / Data processing with NumPy
- extracting, from pandas / Extracting data from pandas
data-text
- about / A special type of data – text
data formats
- accessing / Accessing other data formats
data learning representation
- about / Advanced data learning representation
- learning curve / Learning curves
- validation curves / Validation curves
- variables, selecting / Feature importance
- GBT partial dependence plot / GBT partial dependence plot
data munging phase
- about / The data science process
data repository
- URL / The MLdata.org public repository
data science
- about / Introducing data science and Python
data science process
- about / The data science process
data selection
- about / Data selection
datasets
- about / Datasets and code used in the book
- Scikit-learn toy datasets / Scikit-learn toy datasets
- MLdata.org public repository / The MLdata.org public repository
- LIBSVM data examples / LIBSVM data examples
DBSCAN
- about / An overview of unsupervised learning
degree centrality / Graph algorithms
dimensionality reduction
- about / Dimensionality reduction
- covariance matrix / The covariance matrix
- PCA / Principal Component Analysis (PCA)
- PCA variation, for big data-randomized PCA / A variation of PCA for big data – RandomizedPCA
- LFA / Latent Factor Analysis (LFA)
- LDA / Linear Discriminant Analysis (LDA)
- LSA / Latent Semantical Analysis (LSA)
- ICA / Independent Component Analysis (ICA)
- kernel PCA / Kernel PCA
- RBM / Restricted Boltzmann Machine (RBM)

E

EDA
- about / Introducing EDA
eigenvector centrality / Graph algorithms
EllipticEnvelope function / EllipticEnvelope
ensemble strategies
- about / Ensemble strategies
- averaging algorithms / Ensemble strategies
- boosting algorithms / Ensemble strategies
- random samples, pasting by / Pasting by random samples
- weak ensembles, bagging with / Bagging with weak ensembles
- Random Subspaces / Random Subspaces and Random Patches
- Random Patches / Random Subspaces and Random Patches
- AdaBoost / Sequences of models – AdaBoost
- GTB / Gradient tree boosting (GTB)
- big data / Dealing with big data
Enthought Canopy
- URL / Enthought Canopy
- about / Enthought Canopy
Exploratory Data Analysis (EDA)
- about / Selected graphical examples with pandas

F

feature creation
- about / Feature creation
feature selection
- about / Feature selection
- univariate selection / Univariate selection
- recursive elimination / Recursive elimination
- stability / Stability and L1-based selection
- L1-based selection / Stability and L1-based selection
file
- arrays, obtaining from / Getting an array directly from a file
functions
- scoring / Scoring functions
- multilabel classification / Multilabel classification
- binary classification / Binary classification
- regression / Regression
f_classif object / Univariate selection
f_regression object / Univariate selection

G

GBT partial dependence plot / GBT partial dependence plot
Gensim
- about / Gensim
- URL / Gensim
Gephi
- URL / Graph loading, dumping, and sampling
graph
- loading / Graph loading, dumping, and sampling
- dumping / Graph loading, dumping, and sampling
- sampling / Graph loading, dumping, and sampling
graph algorithms
- about / Graph algorithms
- betweenness centrality / Graph algorithms
- degree centrality / Graph algorithms
- closeness centrality / Graph algorithms
- eigenvector centrality / Graph algorithms
graphical examples, with pandas
- about / Selected graphical examples with pandas
- histograms / Boxplots and histograms
- boxplots / Boxplots and histograms
- scatterplots / Scatterplots
- parallel coordinates / Parallel coordinates
Graph Modeling Language (GML)
- about / Graph loading, dumping, and sampling
graph theory
- about / Introduction to graph theory
GTB / Gradient tree boosting (GTB)

H

hashing trick
- about / Dealing with variety
heterogeneous lists, NumPy arrays / Heterogeneous lists
histograms / Histograms, Boxplots and histograms
hyper-parameters
- max_features / Random Subspaces and Random Patches
- min_samples_leaf / Random Subspaces and Random Patches
- bootstrap / Random Subspaces and Random Patches
- n_estimators / Random Subspaces and Random Patches
hyper-parameters optimization
- about / Hyper-parameters' optimization
- custom scoring functions, building / Building custom scoring functions
- grid search runtime, reducing / Reducing the grid search runtime

I

ICA / Independent Component Analysis (ICA)
image visualization / Image visualization
incremental learning
- about / Scalability with volume
index
- about / Data selection
indexing, with NumPy arrays / Slicing and indexing with NumPy arrays
IPython
- URL / IPython
- about / Introducing IPython
IPython Notebook
- about / The IPython Notebook
iterators
- StratifiedKFold / Using cross-validation iterators
- LeaveOneOut / Using cross-validation iterators
- LeavePOut / Using cross-validation iterators
- LeaveOneLabelOut / Using cross-validation iterators
- LeavePLabelOut / Using cross-validation iterators

K

k-Nearest Neighbors
- about / The k-Nearest Neighbors
kernel PCA / Kernel PCA

L

Latent Dirichlet Allocation (LDA)
- about / Gensim
Latent Semantic Analysis (LSA)
- about / Gensim
LDA / Linear Discriminant Analysis (LDA)
learning curve / Learning curves
LFA / Latent Factor Analysis (LFA)
LIBSVM
- about / The MLdata.org public repository
LIBSVM data examples
- about / LIBSVM data examples
- URL / LIBSVM data examples
linear regression
- about / Linear and logistic regression
lists
- transforming, to unidimensional arrays / From lists to unidimensional arrays
- transforming, to multidimensional arrays / From lists to multidimensional arrays
logistic regression
- about / Linear and logistic regression
LSA / Latent Semantical Analysis (LSA)

M

mask
- about / Data preprocessing
matplotlib
- about / Matplotlib, Introducing the basics of matplotlib
- URL / Matplotlib
- curve plotting / Curve plotting
- panels, using / Using panels
- scatterplots / Scatterplots
- histograms / Histograms
- bar graphs / Bar graphs
- image visualization / Image visualization
matrix operations, NumPy / Matrix operations
Mean Absolute Error (MAE)
- about / Feature creation
MLdata.org public repository / The MLdata.org public repository
multidimensional arrays
- lists, transforming to / From lists to multidimensional arrays
multilabel classification / Multilabel classification
- about / Multilabel classification
- Confusion matrix / Multilabel classification
- Accuracy / Multilabel classification
- Precision / Multilabel classification
- Recall / Multilabel classification
- F1 Score / Multilabel classification

N

20newsgroup
- URL / A special type of data – text
n-dimensional array
- about / NumPy's n-dimensional array
Naive Bayes
- about / Naive Bayes
Named Entity Recognition (NER) / Named Entity Recognition (NER)
ndarray object class
- attributes / NumPy's n-dimensional array
ndarray objects
- drawbacks / NumPy's n-dimensional array
- basics / The basics of NumPy ndarray objects
NetworkX
- about / NetworkX
- URL / NetworkX
NLP
- about / A peek into Natural Language Processing (NLP)
- word tokenization / Word tokenization
- stemming / Stemming
- Word Tagging / Word Tagging
- Named Entity Recognition (NER) / Named Entity Recognition (NER)
- stopwords / Stopwords
- text classification / A complete data science example – text classification
NLTK
- about / NLTK, A peek into Natural Language Processing (NLP)
- URL / NLTK
NumPy
- about / NumPy
- URL / NumPy
- data, processing with / Data processing with NumPy
- n-dimensional array / NumPy's n-dimensional array
- URL, for user guide / Controlling the memory size
- operations / NumPy fast operation and computations
- computations / NumPy fast operation and computations
- matrix operations / Matrix operations
NumPy arrays
- creating / Creating NumPy arrays
- memory size, controlling / Controlling the memory size
- heterogeneous lists / Heterogeneous lists
- slicing with / Slicing and indexing with NumPy arrays
- indexing with / Slicing and indexing with NumPy arrays
- stacking / Stacking NumPy arrays
NumPy functions
- arrays, deriving from / Arrays derived from NumPy functions

O

OneClassSVM
- about / OneClassSVM
- kernel / OneClassSVM
- degree / OneClassSVM
- gamma / OneClassSVM
- nu / OneClassSVM
outliers
- detecting / The detection and treatment of outliers
- treatment / The detection and treatment of outliers
- univariate outlier detection / Univariate outlier detection
- EllipticEnvelope function / EllipticEnvelope
- OneClassSVM / OneClassSVM

P

pandas
- about / pandas
- URL / pandas
- data, preprocessing with / Data loading and preprocessing with pandas, Data preprocessing
- data, loading with / Fast and easy data loading
- data, extracting from / Extracting data from pandas
panels
- using / Using panels
parallel coordinates / Parallel coordinates
parameters
- n_iter / A quick overview of Stochastic Gradient Descent (SGD)
- penalty / A quick overview of Stochastic Gradient Descent (SGD)
- alpha / A quick overview of Stochastic Gradient Descent (SGD)
- l1_ratio / A quick overview of Stochastic Gradient Descent (SGD)
- learning_rate / A quick overview of Stochastic Gradient Descent (SGD)
- epsilon / A quick overview of Stochastic Gradient Descent (SGD)
- shuffle / A quick overview of Stochastic Gradient Descent (SGD)
PASCAL
- about / The MLdata.org public repository
PCA / Principal Component Analysis (PCA)
pip
- URL / The installation of packages
problematic data
- dealing with / Dealing with problematic data
PyPI
- URL / A glance at the essential Python packages
PyPy
- about / PyPy
Python
- about / Introducing data science and Python
- characteristics / Introducing data science and Python
- installing / Installing Python, Step-by-step installation
- 2 / Python 2 or Python 3?
- 3 / Python 2 or Python 3?
Python 2 / Python 2 or Python 3?
Python 3 / Python 2 or Python 3?
Python packages
- about / A glance at the essential Python packages
- NumPy / NumPy
- SciPy / SciPy
- pandas / pandas
- Scikit-learn / Scikit-learn
- IPython / IPython
- matplotlib / Matplotlib
- statsmodels / Statsmodels
- Beautiful Soup / Beautiful Soup
- NetworkX / NetworkX
- NLTK / NLTK
- Gensim / Gensim
- PyPy / PyPy
- installing / The installation of packages
- URL / The installation of packages
- upgrades / Package upgrades
PythonXY
- about / PythonXY
- URL / PythonXY

R

Radial Basis Function
- about / SVM for classification
random forests
- about / The IPython Notebook
Random Patches / Random Subspaces and Random Patches
Random Subspaces / Random Subspaces and Random Patches
RBF kernel
- about / SVM for classification
RBM / Restricted Boltzmann Machine (RBM)
Receiver Operating Characteristic (ROC) curve / Binary classification
recursive elimination / Recursive elimination
regression
- about / Regression
- MAE / Regression
- MSE / Regression
- R2 score / Regression

S

scatterplots / Scatterplots, Scatterplots
scientific distributions
- about / Scientific distributions
- Anaconda / Anaconda
- Enthought Canopy / Enthought Canopy
- PythonXY / PythonXY
- WinPython / WinPython
Scikit-learn
- about / Scikit-learn
- URL / Scikit-learn
Scikit-learn sample generators / Scikit-learn sample generators
Scikit-learn toy datasets
- about / Scikit-learn toy datasets
- methods / Scikit-learn toy datasets
Scikit-learn website
- URL / Using cross-validation iterators
SciPy
- about / SciPy
- URL / SciPy
SGDClassifier
- about / A quick overview of Stochastic Gradient Descent (SGD)
SGDRegressor
- about / A quick overview of Stochastic Gradient Descent (SGD)
Silhouette Coefficient
- URL / An overview of unsupervised learning
Singular Value Decomposition (SVD)
- about / A variation of PCA for big data – RandomizedPCA
slicing, with NumPy arrays / Slicing and indexing with NumPy arrays
statsmodels
- about / Statsmodels
- URL / Statsmodels
stemming / Stemming
Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
stopwords / Stopwords
Support Vector Machine (SVM)
- about / The IPython Notebook
SVM
- about / Advanced nonlinear algorithms
- tuning / Tuning SVM
- parameters / Tuning SVM
svm.OneClassSVM class
- about / Univariate outlier detection

T

testing
- about / Testing and validating
text
- about / A special type of data – text
text classification / A complete data science example – text classification
textual data
- working with / Working with categorical and textual data

U

UCI repository
- URL / SVM for classification
unidimensional arrays
- lists, transforming to / From lists to unidimensional arrays
univariate selection / Univariate selection
unsupervised learning
- about / An overview of unsupervised learning