Chapter 1. Laying the Foundation for Reproducible Data Analysis
In this chapter, we will cover the following recipes:
- Setting up Anaconda
- Installing the Data Science Toolbox
- Creating a virtual environment with virtualenv and virtualenvwrapper
- Sandboxing Python applications with Docker images
- Keeping track of package versions and history in IPython Notebooks
- Configuring IPython
- Learning to log for robust error checking
- Unit testing your code
- Configuring pandas
- Configuring matplotlib
- Seeding random number generators and NumPy print options
- Standardizing reports, code style, and data access
Introduction
Reproducible data analysis is a cornerstone of good science, and reproducibility is a hot topic in today's rapidly evolving world of science and technology. Reproducibility is about lowering barriers for other people who want to repeat your analysis. The extra effort may seem strange or unnecessary, but reproducible analysis is essential to getting your work acknowledged by others. If a lot of people confirm your results, it will have a positive...