Overview of the five steps
The five essential steps to perform data science are as follows:
- Asking an interesting question
- Obtaining the data
- Exploring the data
- Modeling the data
- Communicating and visualizing the results
First, let's look at the five steps with reference to the big picture.
Ask an interesting question
This is probably my favorite step. As an entrepreneur, I ask myself (and others) interesting questions every day. I would treat this step as you would a brainstorming session: start writing down questions regardless of whether or not you think the data to answer them even exists. The reason for this is twofold. First, you don't want to bias yourself before you have even started searching for data. Second, obtaining data might involve searching both public and private locations and, therefore, might not be very straightforward. You might ask a question and immediately tell yourself "Oh, but I bet there's no data out there that can help me,"...