Before we dive into data munging, let's take a moment to explain the difference between an algorithm and a model, two terms we've been using up until now without a formal definition.
Consider the simple linear regression example we saw in Chapter 1, Introduction to Machine Learning and Predictive Analytics, namely the linear regression equation with one predictor:

$$\hat{y} = ax + b$$

Here, x is the predictor variable, ŷ is the predicted value (as opposed to the real, observed value), and (a, b) are the parameters of the linear regression model:
- The conceptual or theoretical model is the representation of the data that is best suited to the actual dataset. It is chosen at the outset by the data scientist. In this case, the conceptual model is the linear regression model, where the prediction is a linear function of the predictor variable. Other conceptual models include decision trees, Naive Bayes, and neural networks.
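
To make the idea of a conceptual model concrete, here is a minimal Python sketch. It assumes the one-predictor equation takes the usual form ŷ = ax + b, with a as the slope and b as the intercept. The function name `linear_model` and the parameter values are hypothetical, chosen only for illustration. Choosing the conceptual model fixes the form of the function, while the values of (a, b) remain to be determined.

```python
def linear_model(x, a, b):
    """Conceptual model: the prediction is a linear function of x."""
    return a * x + b


# Hypothetical parameter values, for illustration only (not estimated from data)
a, b = 2.0, 1.0

xs = [0.0, 1.0, 2.5]
predictions = [linear_model(x, a, b) for x in xs]
print(predictions)
```

Changing (a, b) yields a different line, but the conceptual model, a linear function of the predictor, stays the same.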