Introducing basic statistics and data science
Let's say you want to know the temperature of your room, so you measure it every hour for a day using a measuring tool such as a thermometer. You need this data to decide whether or not to buy an air conditioning (AC) unit. Once the measurements are done, you have a list of temperature readings. The results can be seen in the following table:
Time | Temperature (Celsius) | Time | Temperature (Celsius)
---|---|---|---
01:00 | 18 | 13:00 | 28
02:00 | 17 | 14:00 | 29
03:00 | 18 | 15:00 | 28
04:00 | 19 | 16:00 | 27
05:00 | 20 | 17:00 | 25
06:00 | 20 | 18:00 | 24
07:00 | 21 | 19:00 | 24
08:00 | 22 | 20:00 | 23
09:00 | 22 | 21:00 | 22
10:00 | 24 | 22:00 | 20
11:00 | 25 | 23:00 | 19
12:00 | 26 | 24:00 | 19
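The preceding table shows the temperature data in tabular form. To understand what the data means, you need some knowledge of statistics, including terms such as mean, median, variance, and standard deviation.
Suppose we have a sample of n data points, denoted $x_1, x_2, x_3, \ldots, x_n$. We can calculate the mean, median, variance, and standard deviation using the following formulas, where $x_{(k)}$ is the k-th smallest value and the variance is the sample variance (divided by n - 1):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\text{median} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right) & \text{if } n \text{ is even} \end{cases}$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

$$s = \sqrt{s^2}$$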
Tip
To compute the median, you should first arrange the data in ascending order.
From the preceding table, you can calculate the mean, median, variance, and standard deviation using the preceding formulas. You should obtain values of 22.5, 22, 12.348, and 3.514, respectively.
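As a quick cross-check, the same four statistics can be computed with Python's standard `statistics` module. This is a minimal sketch rather than a prescribed workflow: the variable names are chosen here for illustration, and `variance` and `stdev` use the sample form (dividing by n - 1).

```python
from statistics import mean, median, stdev, variance

# Hourly readings from the table, 01:00 through 24:00 (degrees Celsius).
temperatures = [
    18, 17, 18, 19, 20, 20, 21, 22, 22, 24, 25, 26,
    28, 29, 28, 27, 25, 24, 24, 23, 22, 20, 19, 19,
]

avg = mean(temperatures)
mid = median(temperatures)    # median() sorts the data internally
var = variance(temperatures)  # sample variance (divides by n - 1)
std = stdev(temperatures)     # square root of the sample variance

print(f"mean={avg}, median={mid}, variance={var:.3f}, stdev={std:.3f}")
```

Rounded to three decimal places, the results agree with the values quoted above. Using `pvariance` and `pstdev` instead would give the population form (dividing by n), which yields slightly smaller numbers.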
To understand the pattern in the data, you can plot it as a graph, for instance, using Microsoft Excel. The result can be seen in the following figure:
You can see that the average temperature of your room is 22.5 degrees Celsius. The maximum and minimum temperatures are 29 and 17 degrees Celsius, respectively. With this information, you can think about what type of AC unit you want to buy.
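The extremes can also be read directly from the data rather than from the graph. The following is a small sketch using the readings from the table; the names `hours`, `warmest`, `coolest`, and `spread` are illustrative choices, not part of the original text.

```python
# Hourly readings from the table, 01:00 through 24:00 (degrees Celsius).
temperatures = [
    18, 17, 18, 19, 20, 20, 21, 22, 22, 24, 25, 26,
    28, 29, 28, 27, 25, 24, 24, 23, 22, 20, 19, 19,
]

# Hour labels matching the table's Time column ("01:00" ... "24:00").
hours = [f"{h:02d}:00" for h in range(1, 25)]

warmest = max(temperatures)
coolest = min(temperatures)
spread = warmest - coolest  # range of the day's readings

# index() returns the first position at which each extreme occurs.
warmest_hour = hours[temperatures.index(warmest)]
coolest_hour = hours[temperatures.index(coolest)]

print(f"max {warmest} at {warmest_hour}, min {coolest} at {coolest_hour}, range {spread}")
```

The spread between the warmest and coolest hours is a rough indication of how much cooling capacity the afternoon peak would demand compared with the night.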
Furthermore, you can extend your investigation by measuring your room's temperature for a week. Once you have the measurements, you can plot them as a graph, for instance, using Microsoft Excel. A sample of temperature measurements is shown in the following figure:
The graph shows how the room temperature changes from day to day. If you measure it every day for a year, you should see temperature trends in your room. Knowledge of data science can improve your ability to learn from data. Of course, some statistical and machine learning computation is involved in gaining insight into how the data behaves.
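One simple way to look for a trend across several days is to reduce each day to its mean and then smooth those daily means with a moving average. The sketch below illustrates the idea only: the week of data is synthetic, built by shifting the single day from the table by made-up offsets, and the helper `moving_average` and the window size of 3 are assumptions chosen for illustration.

```python
from statistics import mean

# One day of hourly readings from the table (degrees Celsius).
base_day = [
    18, 17, 18, 19, 20, 20, 21, 22, 22, 24, 25, 26,
    28, 29, 28, 27, 25, 24, 24, 23, 22, 20, 19, 19,
]

# Hypothetical per-day offsets, invented purely to simulate a week of data.
offsets = [0, 1, -1, 2, 0, -2, 1]
week = [[t + off for t in base_day] for off in offsets]

# Reduce each day to a single number: its mean temperature.
daily_means = [mean(day) for day in week]


def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    return [
        mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


# Smooth the daily means so day-to-day noise is less prominent.
smoothed = moving_average(daily_means, window=3)

print("daily means:", daily_means)
print("3-day moving average:", [round(v, 2) for v in smoothed])
```

With real measurements collected over months, the same two steps (aggregate per day, then smooth) would make seasonal changes easier to see than the raw hourly readings.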
This book will help you get started with applying data science and machine learning to real cases, with a focus on IoT.