Chapter 10: Regression Analysis with NumPy and Scikit-Learn
The objective of this chapter is to predict an unknown variable from samples of one or more other variables. In the simplest case, we have a sample of paired data (x1, y1), …, (xn, yn) and need to find the line that best fits the data, that is, a line that passes through or close to most of the data points. We will do this with SciPy implementations of the least-squares regression model. We will then extend the method to fit nonlinear curves and to handle whole datasets (x11, x12, …, x1k, y1), …, (xn1, xn2, …, xnk, yn), where we try to predict y from k input variables.
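As a rough preview of the simplest case described above, the following sketch fits a least-squares line to a handful of paired points with scipy.stats.linregress. The sample values and the predict helper are made up for illustration and are not taken from the chapter; treat this as one possible way to do the fit, not as the chapter's own code.

```python
import numpy as np
from scipy import stats

# Made-up paired sample (x_i, y_i) lying roughly along y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Ordinary least-squares fit of a straight line y = slope * x + intercept.
result = stats.linregress(x, y)
slope, intercept = result.slope, result.intercept


def predict(x_new):
    """Hypothetical helper: evaluate the fitted line at x_new."""
    return slope * x_new + intercept


print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction at x=5: {predict(5.0):.2f}")
```

The same line could also be obtained with np.polyfit(x, y, 1); linregress is used here because it additionally reports the correlation coefficient and standard error of the fit.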
We will use several Python libraries in this chapter: SciPy, NumPy, and scikit-learn. SciPy is an open-source Python library for scientific computing, and NumPy lets us work with multidimensional arrays and matrices and apply high-level mathematical functions to them. Scikit-learn is a machine learning library, and we will be...