You're reading from Hands-On Data Analysis with Pandas: A Python data science handbook for data collection, wrangling, analysis, and visualization

Product type: Paperback
Published in: Apr 2021
Publisher: Packt
ISBN-13: 9781800563452
Length: 788 pages
Edition: 2nd Edition
Languages: Python
Tools: Pandas
Concepts: Data Analysis
Author (1): Stefanie Molin

Exercises

Practice building and evaluating machine learning models in scikit-learn with the following exercises:

Build a clustering model to distinguish between red and white wine by their chemical properties:
a) Combine the red and white wine datasets (data/winequality-red.csv and data/winequality-white.csv, respectively) and add a column for the kind of wine (red or white).
b) Perform some initial EDA.
c) Build and fit a pipeline that scales the data and then uses k-means clustering to make two clusters. Be sure not to use the quality column.
d) Use the Fowlkes-Mallows Index (the fowlkes_mallows_score() function is in sklearn.metrics) to evaluate how well k-means is able to make the distinction between red and white wine.
e) Find the center of each cluster.
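
The steps in the clustering exercise can be sketched roughly as follows. This is only an illustration under stated assumptions, not the book's solution: the data here is synthetic (the `feature_a` and `feature_b` columns are hypothetical stand-ins for the chemical properties), so swap in the combined CSV files when working through the exercise.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import fowlkes_mallows_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in data, not the real wine CSV files.
# 'feature_a' and 'feature_b' are hypothetical column names.
rng = np.random.default_rng(0)
red = pd.DataFrame({
    'feature_a': rng.normal(0, 1, 100),
    'feature_b': rng.normal(0, 1, 100),
    'quality': rng.integers(3, 9, 100),
}).assign(kind='red')
white = pd.DataFrame({
    'feature_a': rng.normal(6, 1, 100),
    'feature_b': rng.normal(6, 1, 100),
    'quality': rng.integers(3, 9, 100),
}).assign(kind='white')

# (a) combine the two datasets; the 'kind' column records red vs. white
wine = pd.concat([red, white], ignore_index=True)

# (c) leave out quality (per the exercise) and kind (the label we evaluate against)
X = wine.drop(columns=['quality', 'kind'])
y = wine['kind'].map({'red': 0, 'white': 1})

pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('kmeans', KMeans(n_clusters=2, n_init=10, random_state=0)),
])
labels = pipeline.fit_predict(X)

# (d) the Fowlkes-Mallows Index ranges from 0 to 1; higher means better agreement
fmi = fowlkes_mallows_score(y, labels)

# (e) cluster centers live in scaled space; undo the scaling to read them in original units
scaled_centers = pipeline.named_steps['kmeans'].cluster_centers_
centers = pipeline.named_steps['scale'].inverse_transform(scaled_centers)

print(f'FMI: {fmi:.3f}')
print(pd.DataFrame(centers, columns=X.columns))
```

Because k-means assigns arbitrary cluster numbers, a pairwise measure such as the Fowlkes-Mallows Index is convenient here: it does not require cluster 0 to correspond to any particular kind of wine.
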
Predict star temperature:
a) Using the data/stars.csv file, perform some initial EDA and then build a linear regression model that uses all the numeric columns to predict the temperature of the star.
b) Train the model on 75% of...
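
The general shape of the regression workflow this exercise begins to describe might look like the sketch below. It runs on synthetic data, and the `name`, `radius`, and `mass` columns are hypothetical (the actual columns in data/stars.csv may differ), so treat it as a starting point rather than a solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# ASSUMPTION: synthetic stand-in data, not the real data/stars.csv.
# 'name', 'radius', and 'mass' are hypothetical column names.
rng = np.random.default_rng(0)
n = 200
stars = pd.DataFrame({
    'name': [f'star_{i}' for i in range(n)],
    'radius': rng.uniform(0.5, 2.0, n),
    'mass': rng.uniform(0.5, 2.0, n),
})
stars['temperature'] = (
    3000 + 1500 * stars['radius'] + 1000 * stars['mass']
    + rng.normal(0, 50, n)
)

# keep only the numeric columns and drop rows with missing values
numeric = stars.select_dtypes(include='number').dropna()
X = numeric.drop(columns='temperature')
y = numeric['temperature']

# hold out 25% of the rows so the model trains on 75%
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

lm = LinearRegression().fit(X_train, y_train)
r2 = lm.score(X_test, y_test)  # R^2 on the held-out rows

print(f'R^2 on the test set: {r2:.3f}')
print(dict(zip(X.columns, lm.coef_)))
```
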