One of the challenges of IoT is determining where to place the sensors and how many sensors are needed. Take pumps, for example: one way of determining whether a pump's bearings are wearing out is to use a microphone to listen for a high-pitched squeal. Another way is to use a vibration sensor, such as an accelerometer, to determine whether the pump is vibrating more than usual. Yet another way is to measure the current and see whether it is fluctuating. There is no one right way to determine whether a pump's ball bearings are wearing out; however, implementing all three techniques may be cost-prohibitive and redundant. A common way of looking at the correlation between different sensors is using a heat map. In the following code, we use a heat map to find the correlation between sensors. In other words, we are looking for sensors that are transmitting redundant information:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# load the sample training data
train = pd.read_csv('/dbfs/FileStore/tables/Bike_train.csv')
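# build 50 synthetic series whose standard deviation grows with i
b = []
for i in range(50):
    a = np.random.normal(5, i + 1, 10)
    b.append(a)
c = np.array(b)
# correlation matrix between the 50 series (one row per series)
cm = np.corrcoef(c)
plt.imshow(cm, interpolation='nearest')
plt.colorbar()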
# heat map
plt.figure(figsize=(17, 11))
sns.heatmap(train.iloc[:, 1:30].corr(), cmap='viridis', annot=True)
display(plt.show())
The following screenshot shows the heat map:
In the preceding example, we can see that count and registered have a very high correlation because their correlation coefficient is close to 1. Similarly, we can see that temp and atemp have a high degree of correlation. Training on this data without pruning the correlated columns effectively counts the same information more than once, which can give it undue weight in machine learning models.
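One possible way to prune such redundant columns is sketched below. This is a minimal illustration rather than the method used in this recipe: the helper name find_redundant_columns, the 0.9 threshold, and the synthetic sensors DataFrame are all assumptions made for the example. It checks each pair of columns once and flags the later column of any pair whose absolute correlation exceeds the threshold:

```python
import numpy as np
import pandas as pd

def find_redundant_columns(df, threshold=0.9):
    """Return columns whose absolute correlation with an earlier column exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    corr = df.corr().abs()
    # keep only the upper triangle (k=1 excludes the diagonal) so each pair is checked once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# synthetic stand-in for sensor readings (not the Bike_train.csv data)
rng = np.random.default_rng(0)
temp = rng.normal(20, 5, 200)
sensors = pd.DataFrame({
    "temp": temp,
    "atemp": temp * 1.1 + rng.normal(0, 0.1, 200),  # nearly a scaled copy of temp
    "humidity": rng.normal(60, 10, 200),            # generated independently of temp
})

redundant = find_redundant_columns(sensors, threshold=0.9)
pruned = sensors.drop(columns=redundant)
print(redundant)
print(list(pruned.columns))
```

Which column of a correlated pair to keep is a judgment call; here the earlier column wins simply because of column order, and a real deployment might instead keep the cheaper or more reliable sensor.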
Cross-tabulation provides a table of frequency distributions, showing how often each combination of values from two columns occurs. This can be used to determine whether two different sensors are counting the same thing. The following is the code to display the cross-tabulation table:
display(DF.stat.crosstab("titleType", "genres"))
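The same idea can be tried outside Spark with pandas. The following is a small sketch under stated assumptions: the readings DataFrame and its sensor_a and sensor_b columns are hypothetical, standing in for two sensors that each report a count per interval. If the two sensors are counting the same thing, most of the frequency mass falls on the diagonal of the table:

```python
import pandas as pd

# hypothetical readings: two sensors reporting a count per interval
readings = pd.DataFrame({
    "sensor_a": [0, 1, 1, 2, 2, 2, 0, 1],
    "sensor_b": [0, 1, 1, 2, 2, 1, 0, 1],
})

# rows are sensor_a values, columns are sensor_b values, cells are frequencies
table = pd.crosstab(readings["sensor_a"], readings["sensor_b"])
print(table)

# share of intervals in which both sensors reported the same value
agreement = (readings["sensor_a"] == readings["sensor_b"]).mean()
print(agreement)
```

Off-diagonal cells show where the sensors disagree, which can help decide whether one of them is redundant or whether they are measuring different things.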