Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon

Practical AI in Excel: Create a Linear Regression Model

Save for later
  • 12 min read
  • 28 Jun 2023

article-image

AI is often associated with complex algorithms and advanced programming, but for basic linear regression models, Excel is a suitable tool. While Excel may not be commonly linked with AI, it can be an excellent option for building statistical machine-learning models. Excel offers similar modeling capabilities as other libraries, without requiring extensive setup or coding skills. It enables leveraging machine learning for predictive analytics without writing code. This article focuses on using Excel to build a linear regression model for predicting story points completed by a software development team based on hours worked.

What is Linear Regression?

Before a linear regression model can be built it is important to understand what linear regression is and what it's used for.  For many, their first true shake with linear regression will come in the form of a machine learning library or machine learning cloud service. In terms of modern machine learning, linear regression is a supervised machine learning algorithm that is used for predictive analytics.  In short, linear regression is a very common and easy-to-use machine learning model that is borrowed from the field of statistics.  This means, at its core, linear regression is a statistical analysis technique that models a relationship between two or more variables.  In the most rudimentary sense, linear regression boils down to the following equation,

y = mx + b

As can be seen, the equation (that is the linear regression model) is little more than the equation for a line.  No matter the library or machine learning service that is used, in its purest form linear regression will boil down to the above equation.  In short, linear regression is used for predictive, numerical models.  In other words, linear regression produces models that attempt to predict a numerical value.  This could be the weight of a person in relation to their height, the value of a stock in relation to the Dow, or anything similar to those two applications.  As stated before, the model that will be produced for this article will be used to predict the number of story points for a given number of hours worked.

Why should Excel be used?

Due to the statistical nature of linear regression, Excel is a prime choice for creating linear regression models.  This is especially true if (among other things) one or more of the following conditions are met,

  • The person creating the model does not have a strong computer science or machine learning background. 
  • The person needs to quickly produce a model.
  • The data set is very small.

If a person simply needs to create a forecasting model for their team, forecast stocks, customer traffic, or whatever it may be, Excel will oftentimes be a better choice than creating a traditional program or using complex machine learning software. With that being established, how would one go about creating a linear regression model?

Installing the Necessary Add-ins

To build a linear regression model the following will be needed,

  • A working copy of Excel.
  • Analysis ToolPak add-in for Excel.

The Analysis ToolPak is the workhorse for this tutorial.  As such, if it is not installed follow the steps in the next section; however, if the add-in is already installed the following section can be skipped.

Installing Data Analysis ToolPak

1. Click,  File -> Option -> Add-ins

Once done the following wizard should appear:

practical-ai-in-excel-create-a-linear-regression-model-img-0

Figure 1 – Options Wizard

2. Locate Analysis ToolPak and select it.  Once that is done the following popup will appear.

practical-ai-in-excel-create-a-linear-regression-model-img-1

Figure 2 – Add-ins Wizard

For this tutorial, all that is technically needed is the Analysis ToolPak but it is a good idea to install the VBA add-in as well. 

3. Verify the installation by navigating to the Data tab and verifying that the Data Analysis tools are installed.  If everything is installed properly, the following should be visible.  

practical-ai-in-excel-create-a-linear-regression-model-img-2

Figure 3 – Data Analysis Tool

Once the Analysis ToolPak is installed a linear regression model can be generated with a few clicks of the mouse. 

Building a Linear Regression Model to Predict Story Points. 

Once all the add-ins are installed, create a workbook and copy in the following data:

Hours

Story Points

16

13

15

12

15

11

13

4

22

8

28

18

30

19

10

3

21

14

11

7

12

9

25

19

24

17

23

15

 Before the model can be built the independent and dependent variables must be chosen.  This is a fancy way of determining which column is going to be the input and which is going to be the output for the model.  In this case, the goal is to predict the number of story points for a given number of hours worked. As such, when the model is created the number of hours will be inputted to return the number of predicted story points. This means that the number of hours worked will be the independent variable which will be on the X-Axis of the graph and the number of story points will be the dependent variable which will be on the Y-Axis. 

Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime

As such, to generate the model perform the following steps,

1. Navigate to the Data tab and click Data Analysis.  When complete the following popup should appear.

practical-ai-in-excel-create-a-linear-regression-model-img-3

Figure 4 – Regression Analysis

  Scroll down and select Regression then press the OK button.

2. Once step 1 is completed the following wizard should appear.

practical-ai-in-excel-create-a-linear-regression-model-img-4

Figure 5 – Regression Setup 

Input the data the same way it is presented in Figure 5.  Once done The data should be rendered as in Figure 6.

practical-ai-in-excel-create-a-linear-regression-model-img-5

Figure 6 – Linear Regression Output.

At this point, the linear regression model has been produced.  To make a prediction all one has to do is multiply the number of hours worked by the Hours value in the Coefficient column and add the Intercept value in the Coefficient column to that product. However, it is advisable to generate a trendline and add the line’s equation and the R-Squared value to the chart to make things easier to see.  This can be remedied by simply deleting the predicted dots and adding a trendline like in Figure 7.

practical-ai-in-excel-create-a-linear-regression-model-img-6

Figure 7 – Trendline

The trendline will show the best fit for the model.  In other words, the model will use the equation that governs the trendline to predict a value.  To generate the line’s equation click the arrow button by Trendline and click More Options.  When this is done a sidebar should appear similar to the one in Figure 8.

practical-ai-in-excel-create-a-linear-regression-model-img-7

Figure 8 – Format Trendline Menu

From here select the R-square value checkbox and the Display Equation on chart checkbox. When this is done those values should be displayed on the graph like in Figure 9. 

practical-ai-in-excel-create-a-linear-regression-model-img-8

Figure 9 – Regression Model with line equation and R-squared value

To create a prediction, all one has to do is plug in the number of hours for x in the equation and the computed value will be an approximation for the number of story points for the hours worked. 

Interperting the Model

Regression Statistics

Multiple R

0.862529

R Square

0.743956

Adjusted R Square

0.722619

Standard Error

2.805677

Observations

14

Now that the model is generated, how good is it?  This question can be answered with the data that was produced in Figure 6.  However, a whole book could be dedicated to interpreting those outputs, so for this article, the data in the observation group which can be thought of as the high-level summary of the model will be explored.   Consider, the following data:

Regression Statistics

Multiple R

0.862529

R Square

0.743956

Adjusted R Square

0.722619

Standard Error

2.805677

Observations

14

 The first value is Multiple R or as it is sometimes called the Correlation Coefficient.  This value can range from -1 to 0 or 0 to 1 depending on whether the correlation is negative or positive respectively.  The closer the coefficient is to either -1 or 1 the better. 

With that, what is the difference between a negative and positive correlation?  Whether a correlation is negative or positive depends on the graph’s orientation which in turn means whether the correlation coefficient is positive or negative.  If the graph is downward oriented the correlation is negative. For these models, the correlation coefficient will be less than 0.  On the other hand, if the graph is upward oriented like the graph produced by the model it is said to have a positive correlation which in turn means the coefficient will be greater than 0.  Consider Figure 10,

practical-ai-in-excel-create-a-linear-regression-model-img-9

Figure 10 – Negative and Positive Correlation

 Ultimately it doesn’t matter if the model has a positive or negative correlation.  All the correlation means is that as one value rises the other will either rise with it or fall.  In terms of the model produced, the Multiple R-value is .86.  All things considered that is a really good correlation coefficient. 

The next important value to look at is the R-Squared value or the Coefficient of Determination.  This value describes how well the model fits the data.  In other words, it determines how many data points fall on the line.  The R-Squred value will range from 0 to 1.  As such, the closer the value is to 1 the better the model will be.  Though a value as close to 1 is desirable it is naïve to assume that an R-Squared of 1 will ever be achievable.  However, a lower R-Squared value is not necessarily a bad thing.  Depending on what is being measured, what constitutes a “good” R-Squared value will vary.  In the case of this model, the R-Squared is about .74 which means about 74% of the data can be explained by the model.  Depending on the context of the application that can be considered good, but it should be remembered that at most the model is only predicting 74% of what makes up the number of completed story points. 

Adjusted R-Squred is simply a more precise view of the R-Squared value. In simple terms, the adjusted R-Squared value determines how much of a variation in the dependent variables can be explained by the independent variables. The Adjusted R for this model is .72 which is in line with the R-Squard value.

Finally, the Standard Error is the last fitting metric.  In a very simplistic sense, this metric is a measure of precision for the model.  As such, the standard error for this model is about 2.8.  Much like other metrics what constitutes good is subjective.  However, the closer the value is to 0 the more concise the model is. 

Using the model

Now that the model has been created, what would someone do with it, that is how would they use it?  The answer is surprisingly simple.  The whole model is a line equation.  That line will give an approximation of a value based on the given input.  In the case of this model, a person would input the number of hours worked to try to predict the number of story points. As such, someone could simply input the number of hours in a calculator, add the equation to a spreadsheet, or do anything they want with it.  Put simply, this or any other linear regression model is used by inputting a value or values and crunching the numbers.  For example, the equation rendered was as follows:

y = 0.6983x - 1.1457

The spreadsheet could be modified to include the following

practical-ai-in-excel-create-a-linear-regression-model-img-10

In this case, the user would simply have to input the number of hours worked to get a predicted number of story points. 

The important thing to remember is that this model along with any other regression model is not gospel.  Much like in any other machine learning system, these values are simply estimates based on the data that was fed into it.  This means if a different data set or subset is used, the model can and probably will be different. 

Conclusion

In summary, a simple Excel spreadsheet was used to create a linear regression model.  The linear regression model that was utilized will probably be very similar to a model generated with dedicated machine learning software.  Does this mean that everyone should abandon their machine-learning software packages and libraries and solely use Excel?  The long and the short of it is no! Excel, much like a library like Scikit-learn or any other, is a tool.  However, for laypersons that don’t have a strong computer science background and need to produce a quick regression model, Excel is an excellent tool to do so. 

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics, and a master's degree in software engineering, and is currently working on an MBA in IT project management. M.T. is currently working as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming