Python Data Science Essentials

Product type Book
Published in Apr 2015
Publisher Packt
ISBN-13 9781785280429
Pages 258 pages
Edition 1st Edition

Index

A

  • AdaBoost / Sequences of models – AdaBoost
  • Additive White Gaussian Noise (AWGN)
    • about / Dimensionality reduction
  • advanced nonlinear algorithms
    • about / Advanced nonlinear algorithms
    • SVM, used for classification / SVM for classification
    • SVM, used for regression / SVM for regression
    • SVM, tuning / Tuning SVM
  • Anaconda
    • about / Anaconda
    • URL / Anaconda
  • Arbitrary waveform generator (AWG)
    • about / Latent Factor Analysis (LFA)
  • Area Under the Curve (AUC) / Binary classification
  • arrays
    • resizing / Resizing arrays
    • deriving, from NumPy functions / Arrays derived from NumPy functions
    • obtaining, from file / Getting an array directly from a file

B

  • bar graphs / Bar graphs
  • Beautiful Soup
    • URL / Beautiful Soup
    • about / Beautiful Soup
  • betweenness centrality / Graph algorithms
  • big data
    • dealing with / Dealing with big data
    • datasets, creating as examples / Creating some big datasets as examples
    • scalability, with volume / Scalability with volume
    • velocity / Keeping up with velocity
    • variety / Dealing with variety
    • Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
  • big datasets
    • dealing with / Dealing with big datasets
  • binary classification / Binary classification
  • boxplots / Boxplots and histograms

C

  • categorical data
    • working with / Working with categorical and textual data
  • chi2 function / Univariate selection
  • closeness centrality / Graph algorithms
  • collection of edges
    • about / Introduction to graph theory
  • covariance.EllipticEnvelope class
    • about / Univariate outlier detection
  • covariance matrix / The covariance matrix
  • cross-validation
    • about / Cross-validation
    • working / Cross-validation
    • iterators, using / Using cross-validation iterators
    • sampling / Sampling and bootstrapping
    • bootstrapping / Sampling and bootstrapping
  • curve plotting / Curve plotting

D

  • data
    • loading, from CSV / Loading data directly from CSV or text files
    • loading, from text files / Loading data directly from CSV or text files
    • Scikit-learn sample generators / Scikit-learn sample generators
    • preprocessing, with pandas / Data loading and preprocessing with pandas, Data preprocessing
    • loading, with pandas / Fast and easy data loading
    • processing, with NumPy / Data processing with NumPy
    • extracting, from pandas / Extracting data from pandas
  • data-text
    • about / A special type of data – text
  • data formats
    • accessing / Accessing other data formats
  • data learning representation
    • about / Advanced data learning representation
    • learning curve / Learning curves
    • validation curves / Validation curves
    • variables, selecting / Feature importance
    • GBT partial dependence plot / GBT partial dependence plot
  • data munging phase
    • about / The data science process
  • data repository
    • URL / The MLdata.org public repository
  • data science
    • about / Introducing data science and Python
  • data science process
    • about / The data science process
  • data selection
    • about / Data selection
  • datasets
    • about / Datasets and code used in the book
    • Scikit-learn toy datasets / Scikit-learn toy datasets
    • MLdata.org public repository / The MLdata.org public repository
    • LIBSVM data examples / LIBSVM data examples
  • DBSCAN
    • about / An overview of unsupervised learning
  • degree centrality / Graph algorithms
  • dimensionality reduction
    • about / Dimensionality reduction
    • covariance matrix / The covariance matrix
    • PCA / Principal Component Analysis (PCA)
    • PCA variation for big data (RandomizedPCA) / A variation of PCA for big data – RandomizedPCA
    • LFA / Latent Factor Analysis (LFA)
    • LDA / Linear Discriminant Analysis (LDA)
    • LSA / Latent Semantical Analysis (LSA)
    • ICA / Independent Component Analysis (ICA)
    • kernel PCA / Kernel PCA
    • RBM / Restricted Boltzmann Machine (RBM)

E

  • EDA
    • about / Introducing EDA
  • eigenvector centrality / Graph algorithms
  • EllipticEnvelope class / EllipticEnvelope
  • ensemble strategies
    • about / Ensemble strategies
    • averaging algorithms / Ensemble strategies
    • boosting algorithms / Ensemble strategies
    • random samples, pasting by / Pasting by random samples
    • weak ensembles, bagging with / Bagging with weak ensembles
    • Random Subspaces / Random Subspaces and Random Patches
    • Random Patches / Random Subspaces and Random Patches
    • AdaBoost / Sequences of models – AdaBoost
    • GTB / Gradient tree boosting (GTB)
    • big data / Dealing with big data
  • Enthought Canopy
    • URL / Enthought Canopy
    • about / Enthought Canopy
  • Exploratory Data Analysis (EDA)
    • about / Selected graphical examples with pandas

F

  • feature creation
    • about / Feature creation
  • feature selection
    • about / Feature selection
    • univariate selection / Univariate selection
    • recursive elimination / Recursive elimination
    • stability / Stability and L1-based selection
    • L1-based selection / Stability and L1-based selection
  • file
    • arrays, obtaining from / Getting an array directly from a file
  • functions
    • scoring / Scoring functions
    • multilabel classification / Multilabel classification
    • binary classification / Binary classification
    • regression / Regression
  • f_classif function / Univariate selection
  • f_regression function / Univariate selection

G

  • GBT partial dependence plot / GBT partial dependence plot
  • Gensim
    • about / Gensim
    • URL / Gensim
  • Gephi
    • URL / Graph loading, dumping, and sampling
  • graph
    • loading / Graph loading, dumping, and sampling
    • dumping / Graph loading, dumping, and sampling
    • sampling / Graph loading, dumping, and sampling
  • graph algorithms
    • about / Graph algorithms
    • betweenness centrality / Graph algorithms
    • degree centrality / Graph algorithms
    • closeness centrality / Graph algorithms
    • eigenvector centrality / Graph algorithms
  • graphical examples, with pandas
    • about / Selected graphical examples with pandas
    • histograms / Boxplots and histograms
    • boxplots / Boxplots and histograms
    • scatterplots / Scatterplots
    • parallel coordinates / Parallel coordinates
  • Graph Modeling Language (GML)
    • about / Graph loading, dumping, and sampling
  • graph theory
    • about / Introduction to graph theory
  • GTB / Gradient tree boosting (GTB)

H

  • hashing trick
    • about / Dealing with variety
  • heterogeneous lists, NumPy arrays / Heterogeneous lists
  • histograms / Histograms, Boxplots and histograms
  • hyper-parameters
    • max_features / Random Subspaces and Random Patches
    • min_samples_leaf / Random Subspaces and Random Patches
    • bootstrap / Random Subspaces and Random Patches
    • n_estimators / Random Subspaces and Random Patches
  • hyper-parameter optimization
    • about / Hyper-parameters' optimization
    • custom scoring functions, building / Building custom scoring functions
    • grid search runtime, reducing / Reducing the grid search runtime

I

  • ICA / Independent Component Analysis (ICA)
  • image visualization / Image visualization
  • incremental learning
    • about / Scalability with volume
  • index
    • about / Data selection
  • indexing, with NumPy arrays / Slicing and indexing with NumPy arrays
  • IPython
    • URL / IPython
    • about / Introducing IPython
  • IPython Notebook
    • about / The IPython Notebook
  • iterators
    • StratifiedKFold / Using cross-validation iterators
    • LeaveOneOut / Using cross-validation iterators
    • LeavePOut / Using cross-validation iterators
    • LeaveOneLabelOut / Using cross-validation iterators
    • LeavePLabelOut / Using cross-validation iterators

K

  • k-Nearest Neighbors
    • about / The k-Nearest Neighbors
  • kernel PCA / Kernel PCA

L

  • Latent Dirichlet Allocation (LDA)
    • about / Gensim
  • Latent Semantic Analysis (LSA)
    • about / Gensim
  • LDA / Linear Discriminant Analysis (LDA)
  • learning curve / Learning curves
  • LFA / Latent Factor Analysis (LFA)
  • LIBSVM
    • about / The MLdata.org public repository
  • LIBSVM data examples
    • about / LIBSVM data examples
    • URL / LIBSVM data examples
  • linear regression
    • about / Linear and logistic regression
  • lists
    • transforming, to unidimensional arrays / From lists to unidimensional arrays
    • transforming, to multidimensional arrays / From lists to multidimensional arrays
  • logistic regression
    • about / Linear and logistic regression
  • LSA / Latent Semantical Analysis (LSA)

M

  • mask
    • about / Data preprocessing
  • matplotlib
    • about / Matplotlib, Introducing the basics of matplotlib
    • URL / Matplotlib
    • curve plotting / Curve plotting
    • panels, using / Using panels
    • scatterplots / Scatterplots
    • histograms / Histograms
    • bar graphs / Bar graphs
    • image visualization / Image visualization
  • matrix operations, NumPy / Matrix operations
  • Mean Absolute Error (MAE)
    • about / Feature creation
  • MLdata.org public repository / The MLdata.org public repository
  • multidimensional arrays
    • lists, transforming to / From lists to multidimensional arrays
  • multilabel classification / Multilabel classification
    • about / Multilabel classification
    • Confusion matrix / Multilabel classification
    • Accuracy / Multilabel classification
    • Precision / Multilabel classification
    • Recall / Multilabel classification
    • F1 Score / Multilabel classification

N

  • 20 Newsgroups dataset
    • URL / A special type of data – text
  • n-dimensional array
    • about / NumPy's n-dimensional array
  • Naive Bayes
    • about / Naive Bayes
  • Named Entity Recognition (NER) / Named Entity Recognition (NER)
  • ndarray object class
    • attributes / NumPy's n-dimensional array
  • ndarray objects
    • drawbacks / NumPy's n-dimensional array
    • basics / The basics of NumPy ndarray objects
  • NetworkX
    • about / NetworkX
    • URL / NetworkX
  • NLP
    • about / A peek into Natural Language Processing (NLP)
    • word tokenization / Word tokenization
    • stemming / Stemming
    • Word Tagging / Word Tagging
    • Named Entity Recognition (NER) / Named Entity Recognition (NER)
    • stopwords / Stopwords
    • text classification / A complete data science example – text classification
  • NLTK
    • about / NLTK, A peek into Natural Language Processing (NLP)
    • URL / NLTK
  • NumPy
    • about / NumPy
    • URL / NumPy
    • data, processing with / Data processing with NumPy
    • n-dimensional array / NumPy's n-dimensional array
    • URL, for user guide / Controlling the memory size
    • operations / NumPy fast operation and computations
    • computations / NumPy fast operation and computations
    • matrix operations / Matrix operations
  • NumPy arrays
    • creating / Creating NumPy arrays
    • memory size, controlling / Controlling the memory size
    • heterogeneous lists / Heterogeneous lists
    • slicing with / Slicing and indexing with NumPy arrays
    • indexing with / Slicing and indexing with NumPy arrays
    • stacking / Stacking NumPy arrays
  • NumPy functions
    • arrays, deriving from / Arrays derived from NumPy functions

O

  • OneClassSVM
    • about / OneClassSVM
    • kernel / OneClassSVM
    • degree / OneClassSVM
    • gamma / OneClassSVM
    • nu / OneClassSVM
  • outliers
    • detecting / The detection and treatment of outliers
    • treatment / The detection and treatment of outliers
    • univariate outlier detection / Univariate outlier detection
    • EllipticEnvelope class / EllipticEnvelope
    • OneClassSVM / OneClassSVM

P

  • pandas
    • about / pandas
    • URL / pandas
    • data, preprocessing with / Data loading and preprocessing with pandas, Data preprocessing
    • data, loading with / Fast and easy data loading
    • data, extracting from / Extracting data from pandas
  • panels
    • using / Using panels
  • parallel coordinates / Parallel coordinates
  • parameters
    • n_iter / A quick overview of Stochastic Gradient Descent (SGD)
    • penalty / A quick overview of Stochastic Gradient Descent (SGD)
    • alpha / A quick overview of Stochastic Gradient Descent (SGD)
    • l1_ratio / A quick overview of Stochastic Gradient Descent (SGD)
    • learning_rate / A quick overview of Stochastic Gradient Descent (SGD)
    • epsilon / A quick overview of Stochastic Gradient Descent (SGD)
    • shuffle / A quick overview of Stochastic Gradient Descent (SGD)
  • PASCAL
    • about / The MLdata.org public repository
  • PCA / Principal Component Analysis (PCA)
  • pip
    • URL / The installation of packages
  • problematic data
    • dealing with / Dealing with problematic data
  • PyPI
    • URL / A glance at the essential Python packages
  • PyPy
    • about / PyPy
  • Python
    • about / Introducing data science and Python
    • characteristics / Introducing data science and Python
    • installing / Installing Python, Step-by-step installation
    • 2 / Python 2 or Python 3?
    • 3 / Python 2 or Python 3?
  • Python 2 / Python 2 or Python 3?
  • Python 3 / Python 2 or Python 3?
  • Python packages
    • about / A glance at the essential Python packages
    • NumPy / NumPy
    • SciPy / SciPy
    • pandas / pandas
    • Scikit-learn / Scikit-learn
    • IPython / IPython
    • matplotlib / Matplotlib
    • statsmodels / Statsmodels
    • Beautiful Soup / Beautiful Soup
    • NetworkX / NetworkX
    • NLTK / NLTK
    • Gensim / Gensim
    • PyPy / PyPy
    • installing / The installation of packages
    • URL / The installation of packages
    • upgrades / Package upgrades
  • PythonXY
    • about / PythonXY
    • URL / PythonXY

R

  • Radial Basis Function
    • about / SVM for classification
  • random forests
    • about / The IPython Notebook
  • Random Patches / Random Subspaces and Random Patches
  • Random Subspaces / Random Subspaces and Random Patches
  • RBF kernel
    • about / SVM for classification
  • RBM / Restricted Boltzmann Machine (RBM)
  • Receiver Operating Characteristic (ROC) curve / Binary classification
  • recursive elimination / Recursive elimination
  • regression
    • about / Regression
    • MAE / Regression
    • MSE / Regression
    • R2 score / Regression

S

  • scatterplots / Scatterplots, Scatterplots
  • scientific distributions
    • about / Scientific distributions
    • Anaconda / Anaconda
    • Enthought Canopy / Enthought Canopy
    • PythonXY / PythonXY
    • WinPython / WinPython
  • Scikit-learn
    • about / Scikit-learn
    • URL / Scikit-learn
  • Scikit-learn sample generators / Scikit-learn sample generators
  • Scikit-learn toy datasets
    • about / Scikit-learn toy datasets
    • methods / Scikit-learn toy datasets
  • Scikit-learn website
    • URL / Using cross-validation iterators
  • SciPy
    • about / SciPy
    • URL / SciPy
  • SGDClassifier
    • about / A quick overview of Stochastic Gradient Descent (SGD)
  • SGDRegressor
    • about / A quick overview of Stochastic Gradient Descent (SGD)
  • Silhouette Coefficient
    • URL / An overview of unsupervised learning
  • Singular Value Decomposition (SVD)
    • about / A variation of PCA for big data – RandomizedPCA
  • slicing, with NumPy arrays / Slicing and indexing with NumPy arrays
  • statsmodels
    • about / Statsmodels
    • URL / Statsmodels
  • stemming / Stemming
  • Stochastic Gradient Descent (SGD) / A quick overview of Stochastic Gradient Descent (SGD)
  • stopwords / Stopwords
  • Support Vector Machine (SVM)
    • about / The IPython Notebook
  • SVM
    • about / Advanced nonlinear algorithms
    • tuning / Tuning SVM
    • parameters / Tuning SVM
  • svm.OneClassSVM class
    • about / Univariate outlier detection

T

  • testing
    • about / Testing and validating
  • text
    • about / A special type of data – text
  • text classification / A complete data science example – text classification
  • textual data
    • working with / Working with categorical and textual data

U

  • UCI repository
    • URL / SVM for classification
  • unidimensional arrays
    • lists, transforming to / From lists to unidimensional arrays
  • univariate selection / Univariate selection
  • unsupervised learning
    • about / An overview of unsupervised learning

V

  • validating
    • about / Testing and validating
  • validation curves / Validation curves
  • variables
    • selecting / Feature importance
  • variety
    • about / Dealing with big data
  • velocity
    • about / Dealing with big data
  • veracity
    • about / Dealing with big data
  • visualization rules
    • URL / Introducing the basics of matplotlib
  • volume
    • about / Dealing with big data

W

  • WinPython
    • URL / WinPython
    • about / WinPython
  • WinPython Package Manager (WPPM)
    • about / WinPython
  • Word Tagging / Word Tagging
  • word tokenization / Word tokenization