Introducing logistic regression
Linear regression is well suited to predicting the value of a continuous numerical variable. Assuming a linear relationship between the dependent and independent variables, the method finds the line of best fit and uses it for prediction. In this chapter, however, we are dealing with a classification problem: we need to assign a sentiment label (positive or negative) to a piece of text. The dependent variable is therefore categorical rather than numerical, which makes this a different kind of problem.
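To make the mismatch concrete, the following minimal sketch fits a least-squares line to binary labels. The toy data and the helper name `fit_line` are hypothetical and serve only as an illustration; the point is that a straight line fitted to 0/1 labels can produce values below 0 and above 1, which cannot be read as class membership.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: one numeric feature per text and a binary
# sentiment label (0 = negative, 1 = positive).
scores = [0, 1, 2, 3, 4, 5]
labels = [0, 0, 0, 1, 1, 1]

slope, intercept = fit_line(scores, labels)
predictions = [slope * x + intercept for x in scores]

for x, p in zip(scores, predictions):
    print(f"feature={x}  line prediction={p:.3f}")
```

With this data, the fitted line predicts a value below 0 for the smallest feature value and a value above 1 for the largest, even though the only valid labels are 0 and 1.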
This section applies a supervised learning algorithm called logistic regression, which is suitable for binary classification problems. Note that a multinomial variant of logistic regression also exists for multiclass problems. Logistic regression is a parametric learning algorithm that outputs the probability that an input belongs to a particular class. Instead of fitting a straight line to the data, the effort...