Other Books You May Enjoy
If you enjoyed this book, you may be interested in these other books by Packt:
Cleaning Data for Effective Data Science
David Mertz
ISBN: 9781801071291
- Ingest and work with common data formats like JSON, CSV, SQL and NoSQL databases, PDF, and binary serialized data structures
- Understand how and why we use tools such as pandas, SciPy, scikit-learn, Tidyverse, and Bash
- Apply useful rules and heuristics for assessing data quality and detecting bias, like Benford's law and the 68-95-99.7 rule
- Identify and handle unreliable data and outliers by examining z-scores and other statistical properties
- Impute sensible values into missing data and use sampling to fix imbalances
- Use dimensionality reduction, quantization, one-hot encoding, and other feature engineering techniques to draw out patterns in your data
- Work carefully with time series data, performing de-trending and interpolation