Preface
Data mining, or parsing data to extract useful insights, is a niche skill that can transform your career as a data scientist. Python is a flexible programming language equipped with a strong suite of libraries and toolkits, giving you the perfect platform to sift through your data and mine the insights you seek. This Learning Path is designed to familiarize you with the Python libraries and the underlying statistics you need to get comfortable with data mining. You will learn how to use Pandas, Python's popular data analysis library, to analyze different kinds of data, and leverage the power of Matplotlib to generate appealing and impressive visualizations of the insights you have derived. You will also explore different machine learning techniques and statistics that enable you to build powerful predictive models. By the end of this Learning Path, you will have the foundation to take your data mining skills to the next level and set yourself on the path to becoming a sought-after data science professional. This Learning Path includes content from the following Packt products:
- Statistics for Machine Learning by Pratap Dangeti
- Matplotlib 2.x By Example by Allen Yu, Claire Chung, and Aldrin Yim
- Pandas Cookbook by Theodore Petrou
Who this book is for
If you want to learn how to use the many libraries of Python to extract impactful information from your data and present it as engaging visuals, then this is the ideal Learning Path for you. Some basic knowledge of Python is enough to get started with this Learning Path.
What this book covers
Chapter 1, Journey from Statistics to Machine Learning, introduces you to all the necessary fundamentals and basic building blocks of both statistics and machine learning. All fundamentals are explained with the support of both Python and R code examples throughout the chapter.
Chapter 2, Tree-Based Machine Learning Models, focuses on the various tree-based machine learning models used by industry practitioners, including decision trees, bagging, random forest, AdaBoost, gradient boosting, and XGBoost, using an HR attrition example in both languages.
Chapter 3, K-Nearest Neighbors and Naive Bayes, illustrates simple methods of machine learning. K-nearest neighbors is explained using breast cancer data. The Naive Bayes model is explained with a message classification example using various NLP preprocessing techniques.
Chapter 4, Unsupervised Learning, presents various techniques such as k-means clustering, principal component analysis (PCA), singular value decomposition, and deep autoencoders based on deep learning. The chapter ends with an explanation of why deep autoencoders are much more powerful than conventional PCA techniques.
Chapter 5, Reinforcement Learning, covers techniques that learn the optimal path to a goal over episodic states, such as Markov decision processes, dynamic programming, Monte Carlo methods, and temporal difference learning. Finally, it presents some use cases that apply machine learning and reinforcement learning.
Chapter 6, Hello Plotting World!, covers the basic constituents of a Matplotlib figure, as well as the latest features of Matplotlib version 2.
Chapter 7, Visualizing Online Data, teaches you how to design intuitive infographics for effective storytelling through the use of real-world datasets.
Chapter 8, Visualizing Multivariate Data, gives you an overview of the plot types that are suitable for visualizing datasets with multiple features or dimensions.
Chapter 9, Adding Interactivity and Animating Plots, shows you that Matplotlib is not limited to creating static plots. You will learn how to create interactive charts and animations.
Chapter 10, Selecting Subsets of Data, covers the many varied and potentially confusing ways of selecting different subsets of data.
Chapter 11, Boolean Indexing, covers the process of querying your data to select subsets of it based on Boolean conditions.
Chapter 12, Index Alignment, targets the very important and often misunderstood Index object. Misuse of the Index is responsible for lots of erroneous results, and these recipes show you how to use it correctly to deliver powerful results.
Chapter 13, Grouping for Aggregation, Filtration, and Transformation, covers the powerful grouping capabilities that are almost always necessary during a data analysis. You will build customized functions to apply to your groups.
Chapter 14, Restructuring Data into a Tidy Form, explains what tidy data is and why it’s so important, and then it shows you how to transform many different forms of messy datasets into tidy ones.
Chapter 15, Combining Pandas Objects, covers the many available methods to combine DataFrames and Series vertically or horizontally. We will also do some web scraping to compare the approval ratings of Presidents Trump and Obama, and connect to an SQL relational database.
To get the most out of this book
This book assumes that you know the basics of Python and R and how to install libraries. It does not assume that you already have knowledge of advanced statistics and mathematics, such as linear algebra.
The following software versions are used throughout this book, but the code should also run with more recent versions:
- Anaconda3 4.3.1 (which includes Python and its relevant packages: Python 3.6.1, NumPy 1.12.1, Pandas 0.19.2, and scikit-learn 0.18.1)
- R 3.4.0 and RStudio 1.0.143
- Theano 0.9.0
- Keras 2.0.2
- A Windows 7+, macOS 10.10+, or Linux-based computer with 4 GB of RAM or more is recommended.
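If you want to compare your own environment with the versions listed above, the following minimal sketch prints the installed version of each package next to the version used in the book. It assumes Python 3.8 or later, because it relies on the standard library module importlib.metadata, which is not available in the Python 3.6.1 used in the book. The helper name installed_versions and the BOOK_VERSIONS dictionary are illustrative only and are not part of the book's code bundle.

```python
# Minimal sketch: compare installed package versions with those listed in the book.
# Assumption: Python 3.8+ (importlib.metadata is in the standard library from 3.8).
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this book, keyed by their PyPI distribution names
# (hypothetical helper data, not from the book's code bundle).
BOOK_VERSIONS = {
    "numpy": "1.12.1",
    "pandas": "0.19.2",
    "scikit-learn": "0.18.1",
    "Theano": "0.9.0",
    "Keras": "2.0.2",
}


def installed_versions(packages):
    """Return {name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(book uses 3.6.1)")
    for name, installed in installed_versions(BOOK_VERSIONS).items():
        status = installed if installed else "not installed"
        print(f"{name}: {status} (book uses {BOOK_VERSIONS[name]})")
```

Packages that are missing are reported as "not installed" instead of raising an error, so the script can be run in any environment.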
Download the example code files
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
- Log in or register at www.packt.com.
- Select the SUPPORT tab.
- Click on Code Downloads & Errata.
- Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR/7-Zip for Windows
- Zipeg/iZip/UnRarX for Mac
- 7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Numerical-Computing-with-Python. If the code is updated, the changes will be made in the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The mode function was not implemented in the numpy package." Any command-line input or output is written as follows:
>>> import numpy as np
>>> from scipy import stats
>>> data = np.array([4, 5, 1, 2, 7, 2, 6, 9, 3])
>>> # Calculate the mean
>>> dt_mean = np.mean(data)
>>> print("Mean :", round(dt_mean, 2))
Mean : 4.33
New terms and important words are shown in bold.
Note
Warnings or important notes appear like this.
Note
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.