Exploring the data science process
Data science work is often iterative: the data scientist may need to return to earlier steps when they run into challenges. There are many ways to break down the data science process, but it often includes the following steps:
- Data collection
- Data exploration
- Data modeling
- Model evaluation
- Model deployment and monitoring
Let’s briefly touch on each step and discuss what’s expected of the data scientist during each one.
Data collection
Data collection and preprocessing involves gathering data from various sources (such as databases, APIs, and web scraping), then cleaning and transforming the data to prepare it for analysis. This step involves handling missing, inconsistent, or noisy data and converting it into a structured format. Depending on the organization, a team of data engineers supports this step of the data science process; however, it is common for the data scientist to manage this process...
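To make the cleaning step more concrete, here is a minimal sketch that uses only the Python standard library. Everything in it is illustrative and assumed for the example: the RAW_CSV sample, the Person record, and the parse_age and clean_rows helpers are hypothetical names, and filling missing ages with the median is just one of several possible ways to handle missing values.

```python
import csv
import io
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Hypothetical raw export with typical problems: stray whitespace,
# inconsistent casing, a missing age, and a non-numeric age.
RAW_CSV = """name,city,age
 Alice ,new york,34
BOB,New York,
carol,NEW YORK,n/a
Dave,Boston,41
"""


@dataclass
class Person:
    """Structured record produced by the cleaning step."""
    name: str
    city: str
    age: Optional[int]


def parse_age(value: str) -> Optional[int]:
    """Return the age as an int, or None if it is missing or not numeric."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def clean_rows(raw_text: str) -> list[Person]:
    """Parse raw CSV text into cleaned, structured Person records."""
    reader = csv.DictReader(io.StringIO(raw_text))
    people = [
        Person(
            name=row["name"].strip().title(),   # normalize whitespace and casing
            city=row["city"].strip().title(),
            age=parse_age(row["age"]),
        )
        for row in reader
    ]

    # Assumption for this sketch: fill missing ages with the median of the
    # known ages, truncated to a whole number.
    known_ages = [p.age for p in people if p.age is not None]
    fill_value = int(median(known_ages)) if known_ages else None
    for person in people:
        if person.age is None:
            person.age = fill_value
    return people


people = clean_rows(RAW_CSV)
for person in people:
    print(person)
```

In practice, the right way to treat a missing or malformed value depends on the dataset and the analysis that follows, so the imputation choice here should be read as a placeholder rather than a recommendation.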