Index
A
- AdaBoost / Sequences of models – AdaBoost
- Additive White Gaussian Noise (AWGN)
- about / Dimensionality reduction
- advanced nonlinear algorithms
- about / Advanced nonlinear algorithms
- SVM, used for classification / SVM for classification
- SVM, used for regression / SVM for regression
- SVM, tuning / Tuning SVM
- Anaconda
- about / Anaconda
- URL / Anaconda
- Arbitrary waveform generator (AWG)
- about / Latent Factor Analysis (LFA)
- Area under a curve (AUC) / Binary classification
- arrays
- resizing / Resizing arrays
- deriving, from NumPy functions / Arrays derived from NumPy functions
- obtaining, from file / Getting an array directly from a file
B
- bar graphs / Bar graphs
- Beautiful Soup
- URL / Beautiful Soup
- about / Beautiful Soup
- betweenness centrality / Graph algorithms
- big data
- dealing with / Dealing with big data
- datasets, creating as examples / Creating some big datasets as examples
- scalability, with volume / Scalability with volume
- velocity / Keeping up with velocity
- variety / Dealing with variety
- Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
- big datasets
- dealing with / Dealing with big datasets
- binary classification / Binary classification
- boxplots / Boxplots and histograms
C
- categorical data
- working with / Working with categorical and textual data
- Chi2 object / Univariate selection
- closeness centrality / Graph algorithms
- collection of edges
- about / Introduction to graph theory
- covariance.EllipticEnvelope class
- about / Univariate outlier detection
- covariance matrix / The covariance matrix
- cross-validation
- about / Cross-validation
- working / Cross-validation
- iterators, using / Using cross-validation iterators
- sampling / Sampling and bootstrapping
- bootstrapping / Sampling and bootstrapping
- curve plotting / Curve plotting
D
- data
- loading, from CSV / Loading data directly from CSV or text files
- loading, from text files / Loading data directly from CSV or text files
- Scikit-learn sample generators / Scikit-learn sample generators
- preprocessing, with pandas / Data loading and preprocessing with pandas, Data preprocessing
- loading, with pandas / Fast and easy data loading
- processing, with NumPy / Data processing with NumPy
- extracting, from pandas / Extracting data from pandas
- data-text
- about / A special type of data – text
- data formats
- accessing / Accessing other data formats
- data learning representation
- about / Advanced data learning representation
- learning curve / Learning curves
- validation curves / Validation curves
- variables, selecting / Feature importance
- GBT partial dependence plot / GBT partial dependence plot
- data munging phase
- about / The data science process
- data repository
- URL / The MLdata.org public repository
- data science
- about / Introducing data science and Python
- data science process
- about / The data science process
- data selection
- about / Data selection
- datasets
- about / Datasets and code used in the book
- Scikit-learn toy datasets / Scikit-learn toy datasets
- MLdata.org public repository / The MLdata.org public repository
- LIBSVM data examples / LIBSVM data examples
- DBSCAN
- about / An overview of unsupervised learning
- degree centrality / Graph algorithms
- dimensionality reduction
- about / Dimensionality reduction
- covariance matrix / The covariance matrix
- PCA / Principal Component Analysis (PCA)
- PCA variation, for big data-randomized PCA / A variation of PCA for big data – RandomizedPCA
- LFA / Latent Factor Analysis (LFA)
- LDA / Linear Discriminant Analysis (LDA)
- LSA / Latent Semantical Analysis (LSA)
- ICA / Independent Component Analysis (ICA)
- kernel PCA / Kernel PCA
- RBM / Restricted Boltzmann Machine (RBM)
E
- EDA
- about / Introducing EDA
- eigenvector centrality / Graph algorithms
- EllipticEnvelope function / EllipticEnvelope
- ensemble strategies
- about / Ensemble strategies
- averaging algorithms / Ensemble strategies
- boosting algorithms / Ensemble strategies
- random samples, pasting by / Pasting by random samples
- weak ensembles, bagging with / Bagging with weak ensembles
- Random Subspaces / Random Subspaces and Random Patches
- Random Patches / Random Subspaces and Random Patches
- AdaBoost / Sequences of models – AdaBoost
- GTB / Gradient tree boosting (GTB)
- big data / Dealing with big data
- Enthought Canopy
- URL / Enthought Canopy
- about / Enthought Canopy
- Explorative Data Analysis (EDA)
- about / Selected graphical examples with pandas
F
- feature creation
- about / Feature creation
- feature selection
- about / Feature selection
- univariate selection / Univariate selection
- recursive elimination / Recursive elimination
- stability / Stability and L1-based selection
- L1-based selection / Stability and L1-based selection
- file
- arrays, obtaining from / Getting an array directly from a file
- functions
- scoring / Scoring functions
- multilabel classification / Multilabel classification
- binary classification / Binary classification
- regression / Regression
- f_classif object / Univariate selection
- f_regression object / Univariate selection
G
- GBT partial dependence plot / GBT partial dependence plot
- Gensim
- about / Gensim
- URL / Gensim
- Gephi
- URL / Graph loading, dumping, and sampling
- graph
- loading / Graph loading, dumping, and sampling
- dumping / Graph loading, dumping, and sampling
- sampling / Graph loading, dumping, and sampling
- graph algorithms
- about / Graph algorithms
- betweenness centrality / Graph algorithms
- degree centrality / Graph algorithms
- closeness centrality / Graph algorithms
- eigenvector centrality / Graph algorithms
- graphical examples, with pandas
- about / Selected graphical examples with pandas
- histograms / Boxplots and histograms
- boxplots / Boxplots and histograms
- scatterplots / Scatterplots
- parallel coordinates / Parallel coordinates
- Graph Modeling Language (GML)
- about / Graph loading, dumping, and sampling
- graph theory
- about / Introduction to graph theory
- GTB / Gradient tree boosting (GTB)
H
- hashing trick
- about / Dealing with variety
- heterogeneous lists, NumPy arrays / Heterogeneous lists
- histograms / Histograms, Boxplots and histograms
- hyper-parameters
- max_features / Random Subspaces and Random Patches
- min_samples_leaf / Random Subspaces and Random Patches
- bootstrap / Random Subspaces and Random Patches
- n_estimators / Random Subspaces and Random Patches
- hyper-parameters optimization
- about / Hyper-parameters' optimization
- custom scoring functions, building / Building custom scoring functions
- grid search runtime, reducing / Reducing the grid search runtime
I
- ICA / Independent Component Analysis (ICA)
- image visualization / Image visualization
- incremental learning
- about / Scalability with volume
- index
- about / Data selection
- indexing, with NumPy arrays / Slicing and indexing with NumPy arrays
- IPython
- URL / IPython
- about / Introducing IPython
- IPython Notebook
- about / The IPython Notebook
- iterators
- StratifiedKFold / Using cross-validation iterators
- LeaveOneOut / Using cross-validation iterators
- LeavePOut / Using cross-validation iterators
- LeaveOneLabelOut / Using cross-validation iterators
- LeavePLabelOut / Using cross-validation iterators
K
- k-Nearest Neighbors
- about / The k-Nearest Neighbors
- kernel PCA / Kernel PCA
L
- Latent Dirichlet Allocation (LDA)
- about / Gensim
- Latent Semantic Analysis (LSA)
- about / Gensim
- LDA / Linear Discriminant Analysis (LDA)
- learning curve / Learning curves
- LFA / Latent Factor Analysis (LFA)
- LIBSVM
- about / The MLdata.org public repository
- LIBSVM data examples
- about / LIBSVM data examples
- URL / LIBSVM data examples
- linear regression
- about / Linear and logistic regression
- lists
- transforming, to unidimensional arrays / From lists to unidimensional arrays
- transforming, to multidimensional arrays / From lists to multidimensional arrays
- logistic regression
- about / Linear and logistic regression
- LSA / Latent Semantical Analysis (LSA)
M
- mask
- about / Data preprocessing
- matplotlib
- about / Matplotlib, Introducing the basics of matplotlib
- URL / Matplotlib
- curve plotting / Curve plotting
- panels, using / Using panels
- scatterplots / Scatterplots
- histograms / Histograms
- bar graphs / Bar graphs
- image visualization / Image visualization
- matrix operations, NumPy / Matrix operations
- Mean Absolute Error (MAE)
- about / Feature creation
- MLdata.org public repository / The MLdata.org public repository
- multidimensional arrays
- lists, transforming to / From lists to multidimensional arrays
- multilabel classification / Multilabel classification
- about / Multilabel classification
- Confusion matrix / Multilabel classification
- Accuracy / Multilabel classification
- Precision / Multilabel classification
- Recall / Multilabel classification
- F1 Score / Multilabel classification
N
- 20newsgroup
- URL / A special type of data – text
- n-dimensional array
- about / NumPy's n-dimensional array
- Naive Bayes
- about / Naive Bayes
- Named Entity Recognition (NER) / Named Entity Recognition (NER)
- ndarray object class
- attributes / NumPy's n-dimensional array
- ndarray objects
- drawbacks / NumPy's n-dimensional array
- basics / The basics of NumPy ndarray objects
- NetworkX
- about / NetworkX
- URL / NetworkX
- NLP
- about / A peek into Natural Language Processing (NLP)
- word tokenization / Word tokenization
- stemming / Stemming
- Word Tagging / Word Tagging
- Named Entity Recognition (NER) / Named Entity Recognition (NER)
- stopwords / Stopwords
- text classification / A complete data science example – text classification
- NLTK
- about / NLTK, A peek into Natural Language Processing (NLP)
- URL / NLTK
- NumPy
- about / NumPy
- URL / NumPy
- data, processing with / Data processing with NumPy
- n-dimensional array / NumPy's n-dimensional array
- URL, for user guide / Controlling the memory size
- operations / NumPy fast operation and computations
- computations / NumPy fast operation and computations
- matrix operations / Matrix operations
- NumPy arrays
- creating / Creating NumPy arrays
- memory size, controlling / Controlling the memory size
- heterogeneous lists / Heterogeneous lists
- slicing with / Slicing and indexing with NumPy arrays
- indexing with / Slicing and indexing with NumPy arrays
- stacking / Stacking NumPy arrays
- NumPy functions
- arrays, deriving from / Arrays derived from NumPy functions
O
- OneClassSVM
- about / OneClassSVM
- kernel / OneClassSVM
- degree / OneClassSVM
- gamma / OneClassSVM
- nu / OneClassSVM
- outliers
- detecting / The detection and treatment of outliers
- treatment / The detection and treatment of outliers
- univariate outlier detection / Univariate outlier detection
- EllipticEnvelope function / EllipticEnvelope
- OneClassSVM / OneClassSVM
P
- pandas
- about / pandas
- URL / pandas
- data, preprocessing with / Data loading and preprocessing with pandas, Data preprocessing
- data, loading with / Fast and easy data loading
- data, extracting from / Extracting data from pandas
- panels
- using / Using panels
- parallel coordinates / Parallel coordinates
- parameters
- n_iter / A quick overview of Stochastic Gradient Descent (SGD)
- penalty / A quick overview of Stochastic Gradient Descent (SGD)
- alpha / A quick overview of Stochastic Gradient Descent (SGD)
- l1_ratio / A quick overview of Stochastic Gradient Descent (SGD)
- learning_rate / A quick overview of Stochastic Gradient Descent (SGD)
- epsilon / A quick overview of Stochastic Gradient Descent (SGD)
- shuffle / A quick overview of Stochastic Gradient Descent (SGD)
- PASCAL
- about / The MLdata.org public repository
- PCA / Principal Component Analysis (PCA)
- pip
- URL / The installation of packages
- problematic data
- dealing with / Dealing with problematic data
- PyPI
- URL / A glance at the essential Python packages
- PyPy
- about / PyPy
- Python
- about / Introducing data science and Python
- characteristics / Introducing data science and Python
- installing / Installing Python, Step-by-step installation
- 2 / Python 2 or Python 3?
- 3 / Python 2 or Python 3?
- Python 2 / Python 2 or Python 3?
- Python 3 / Python 2 or Python 3?
- Python packages
- about / A glance at the essential Python packages
- NumPy / NumPy
- SciPy / SciPy
- pandas / pandas
- Scikit-learn / Scikit-learn
- IPython / IPython
- matplotlib / Matplotlib
- statsmodels / Statsmodels
- Beautiful Soup / Beautiful Soup
- NetworkX / NetworkX
- NLTK / NLTK
- Gensim / Gensim
- PyPy / PyPy
- installing / The installation of packages
- URL / The installation of packages
- upgrades / Package upgrades
- PythonXY
- about / PythonXY
- URL / PythonXY
R
- Radial Basis Function
- about / SVM for classification
- random forests
- about / The IPython Notebook
- Random Patches / Random Subspaces and Random Patches
- Random Subspaces / Random Subspaces and Random Patches
- RBF kernel
- about / SVM for classification
- RBM / Restricted Boltzmann Machine (RBM)
- Receiver Operating Characteristics curve (ROC) / Binary classification
- recursive elimination / Recursive elimination
- regression
- about / Regression
- MAE / Regression
- MSE / Regression
- R2 score / Regression
S
- scatterplots / Scatterplots, Scatterplots
- scientific distributions
- about / Scientific distributions
- Anaconda / Anaconda
- Enthought Canopy / Enthought Canopy
- PythonXY / PythonXY
- WinPython / WinPython
- Scikit-learn
- about / Scikit-learn
- URL / Scikit-learn
- Scikit-learn sample generators / Scikit-learn sample generators
- Scikit-learn toy datasets
- about / Scikit-learn toy datasets
- methods / Scikit-learn toy datasets
- Scikit-learn website
- URL / Using cross-validation iterators
- SciPy
- about / SciPy
- URL / SciPy
- SGDClassifier
- about / A quick overview of Stochastic Gradient Descent (SGD)
- SGDRegressor
- about / A quick overview of Stochastic Gradient Descent (SGD)
- Silhouette Coefficient
- URL / An overview of unsupervised learning
- Singular Value Decomposition (SVD)
- about / A variation of PCA for big data – RandomizedPCA
- slicing, with NumPy arrays / Slicing and indexing with NumPy arrays
- statsmodels
- about / Statsmodels
- URL / Statsmodels
- stemming / Stemming
- Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
- stopwords / Stopwords
- Support Vector Machine (SVM)
- about / The IPython Notebook
- SVM
- about / Advanced nonlinear algorithms
- tuning / Tuning SVM
- parameters / Tuning SVM
- svm.OneClassSVM class
- about / Univariate outlier detection
T
- testing
- about / Testing and validating
- text
- about / A special type of data – text
- text classification / A complete data science example – text classification
- textual data
- working with / Working with categorical and textual data
U
- UCI repository
- URL / SVM for classification
- unidimensional arrays
- lists, transforming to / From lists to unidimensional arrays
- univariate selection / Univariate selection
- unsupervised learning
- about / An overview of unsupervised learning
V
- validating
- about / Testing and validating
- validation curves / Validation curves
- variables
- selecting / Feature importance
- variety
- about / Dealing with big data
- velocity
- about / Dealing with big data
- veracity
- about / Dealing with big data
- visualization rules
- URL / Introducing the basics of matplotlib
- volume
- about / Dealing with big data
W
- WinPython
- URL / WinPython
- about / WinPython
- WinPython Package Manager (WPPM)
- about / WinPython
- Word Tagging / Word Tagging
- word tokenization / Word tokenization