Linear regression with Apache Spark
Linear regression is the most commonly used method for describing the relationship between predictors (or covariates) and outcomes.
Linear regression can solve problems that require a continuous target variable to predict real, valued outcomes. There can be multiple variables or features in the problem domain.
How does linear regression work?
Linear regression focuses on finding optimal values for parameters (coefficients) to fit the best possible hyper plane to the training data.
Linear regression is approached as a minimization problem. The mean squared error, which is the difference between the expected outcome and the actual outcome for the training set, is minimized.
The squared error function is also known as the cost function. So, the goal is to minimize the cost function.
Each value of the parameter given in the cost function refers to one hypothesis function. The idea is to find the optimal hypothesis that has values for parameters which best minimize...