You're reading from Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code

Product type: Paperback

Published in: May 2022

Publisher: Packt

ISBN-13: 9781803247731

Length: 324 pages

Edition: 1st Edition

Languages: C++

Tools: Excel

Concepts: Data Analysis

Author (1): Roque

Preface

1. Part 1 – An Introduction to Machine Learning Functions

2. Chapter 1: Understanding Data Segmentation

3. Chapter 2: Applying Linear Regression

4. Chapter 3: What is Time Series?

5. Part 2 – Grouping Data to Find Segments and Outliers

6. Chapter 4: Introduction to Data Grouping

7. Chapter 5: Finding the Optimal Number of Single Variable Groups

8. Chapter 6: Finding the Optimal Number of Multi-Variable Groups

9. Chapter 7: Analyzing Outliers for Data Anomalies

10. Part 3 – Simple and Multiple Linear Regression Analysis

11. Chapter 8: Finding the Relationship between Variables

12. Chapter 9: Building, Training, and Validating a Linear Model

13. Chapter 10: Building, Training, and Validating a Multiple Regression Model

14. Part 4 – Predicting Values with Time Series

15. Chapter 11: Testing Data for Time Series Compliance

16. Chapter 12: Working with Time Series Using the Centered Moving Average and a Trending Component

17. Chapter 13: Training, Validating, and Running the Model

18. Other Books You May Enjoy

Answers

Here are the answers to the preceding questions:

1. No. First, inspect the data visually to estimate a plausible number of groups. Then, run the K-means elbow function to find the optimal number of groups for the data.
2. We choose the number of groups at which the elbow curve starts to flatten.
3. The K-means function takes the following parameters:
   - The number of groups to process
   - The range of the input data
   - The range in which to place the results returned by the K-means function
4. Use a pivot table and chart analysis showing the minimum, maximum, and centroid values for each group. If the minimum and maximum values are far from the average (centroid), the group has scattered values.
5. Scattered values that lie far from the centroid are possible outliers that need further research. One approach is to run K-means again on that group alone, creating subgroups that improve the data classification.
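
The book performs these steps in Excel without code. For readers who want to see the same ideas outside a spreadsheet, the following is a minimal Python sketch, not the book's K-means function. It assumes a single variable, a plain Lloyd-style K-means with a deterministic starting point, and made-up sample values; the names `kmeans_1d`, `wcss`, and `group_summary` are hypothetical. It prints the within-group sum of squares for several group counts (the values an elbow chart would plot) and then the minimum, maximum, centroid, and largest distance from the centroid for each group, which is the information the pivot table in answer 4 summarizes.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd-style K-means on one variable; returns (centroids, labels)."""
    data = sorted(values)
    # Assumption: deterministic start, centroids spread evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda g: abs(v - centroids[g])) for v in values]
        # Move each centroid to the mean of its group (keep it if the group is empty).
        updated = []
        for g in range(k):
            members = [v for v, lab in zip(values, labels) if lab == g]
            updated.append(mean(members) if members else centroids[g])
        if updated == centroids:
            break  # No centroid moved, so the grouping is stable.
        centroids = updated
    return centroids, labels


def wcss(values, centroids, labels):
    """Within-group sum of squared distances to the centroid (elbow curve value)."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


def group_summary(values, centroids, labels):
    """Minimum, maximum, centroid, and largest distance from the centroid per group."""
    rows = []
    for g, c in enumerate(centroids):
        members = [v for v, lab in zip(values, labels) if lab == g]
        if members:
            spread = max(c - min(members), max(members) - c)
            rows.append({"group": g, "min": min(members), "max": max(members),
                         "centroid": c, "spread": spread})
    return rows


# Hypothetical sample: three clumps of values plus one stray value (120).
values = [10, 11, 12, 13, 50, 51, 52, 53, 90, 91, 92, 120]

# Elbow curve: look for the group count where the drop in WCSS starts to flatten.
for k in range(1, 6):
    centroids, labels = kmeans_1d(values, k)
    print(k, round(wcss(values, centroids, labels), 2))

# Per-group summary for the chosen number of groups.
centroids, labels = kmeans_1d(values, 3)
for row in group_summary(values, centroids, labels):
    print(row)
```

In this sample, the group that contains 120 shows a much larger spread than the other two, which marks it as the candidate for a closer look or for a second K-means pass on that group alone, as answer 5 suggests. The elbow itself is still read by eye from the printed values, just as it is from the chart in Excel.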