Evaluating the model’s effectiveness
Accuracy and loss alone are not enough to judge the model’s effectiveness. In general, accuracy is a good performance indicator when the dataset is balanced, but it does not reveal the strengths and weaknesses of our model. For instance, which classes does the model recognize with high confidence? What mistakes does it make most frequently?
In this recipe, we will judge the model’s effectiveness by visualizing the confusion matrix and evaluating the recall, precision, and F-score performance metrics.
Getting ready
To complete this recipe, we must familiarize ourselves with the confusion matrix and the alternative performance metrics that are crucial for evaluating the model’s effectiveness. Let’s start with the confusion matrix in the following subsection.
Evaluating the performance with the confusion matrix
A confusion matrix is an NxN matrix reporting the number of correct and incorrect predictions on the test dataset, where N is the number of output classes.
For our binary classification model, where there are two output categories, we have a 2x2 matrix like the one in Figure 3.8:

Figure 3.8: A confusion matrix
The four values reported in the previous confusion matrix are as follows:
- True positive (TP): The number of predicted positive results that are actually positive
- True negative (TN): The number of predicted negative results that are actually negative
- False positive (FP): The number of predicted positive results that are actually negative
- False negative (FN): The number of predicted negative results that are actually positive
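To make these four definitions concrete, the following minimal sketch counts them for a small set of made-up labels (purely illustrative, not the recipe’s data):

import numpy as np

# Made-up labels for illustration only (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted positive, actually positive
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted negative, actually negative
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted positive, actually negative
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted negative, actually positive

print(TP, TN, FP, FN)  # 3 3 1 1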
Ideally, we would like to have 100% accuracy, defined as the ratio of correctly predicted instances (both positive and negative) to the total number of instances in the dataset:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
The preceding formula implies that the confusion matrix’s gray cells (FN and FP) should be 0 to obtain 100% accuracy.
However, although accuracy is a valuable metric, it does not provide a complete picture of model performance. Therefore, the following subsections will present alternative performance metrics for assessing the model’s effectiveness.
Evaluating recall, precision, and F-score
The first performance metric we want to present is recall, which quantifies how many of all positive (“Yes”) samples we predicted correctly:

Recall = TP / (TP + FN)

Since every missed positive sample (an FN) lowers this ratio, recall should be as high as possible.
However, this metric does not account for misclassified negative samples (FPs). Hence, the model could be excellent at classifying positive samples but incapable of classifying negative ones.
For this reason, there is an alternative performance indicator that considers FPs. It is precision, which quantifies how many of the predicted positive classes (“Yes”) were actually positive:

Precision = TP / (TP + FP)

Therefore, as with recall, precision should be as high as possible.
If we are interested in evaluating both recall and precision simultaneously, the F-score metric is what we need. This metric combines recall and precision into a single value, their harmonic mean:

F-score = 2 × (Precision × Recall) / (Precision + Recall)

The higher the F-score, the better the model’s effectiveness.
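As a quick sanity check of these formulas, here is a minimal sketch that computes the three metrics for the toy counts used in the earlier illustration (TP = 3, TN = 3, FP = 1, FN = 1); the numbers are purely illustrative:

# Toy counts from the earlier illustration (not the recipe's results)
TP, TN, FP, FN = 3, 3, 1, 1

recall = TP / (TP + FN)                                  # 3 / 4 = 0.75
precision = TP / (TP + FP)                               # 3 / 4 = 0.75
f_score = 2 * precision * recall / (precision + recall)  # 0.75

print(recall, precision, f_score)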
How to do it…
Continue working in Colab and follow these steps to visualize the confusion matrix and calculate the recall, precision, and F-score metrics:
Step 1:
Use the trained model to predict the output classes of the test dataset:
y_test_pred = model.predict(x_test)
y_test_pred = (y_test_pred > 0.5).astype("int32")
The line y_test_pred = (y_test_pred > 0.5).astype("int32") binarizes the predicted values using a threshold of 0.5: if a predicted value is greater than 0.5, it is converted to 1; otherwise, it is converted to 0.
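As a small illustration of this thresholding step, here is a sketch with made-up probabilities shaped like the (N, 1) output of model.predict():

import numpy as np

# Made-up sigmoid outputs, for illustration only
probs = np.array([[0.12], [0.73], [0.50], [0.91]])
labels = (probs > 0.5).astype("int32")
print(labels.ravel())  # [0 1 0 1] - 0.50 is not strictly greater than 0.5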
Step 2:
Compute the confusion matrix with scikit-learn:
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_test_pred)
The confusion matrix is obtained with the confusion_matrix() function from scikit-learn’s metrics module, which takes two arguments: the true labels of the test dataset (y_test) and the predicted labels (y_test_pred). The result is stored in the cm variable. Note that, in scikit-learn’s convention, the rows of cm correspond to the actual classes and the columns to the predicted classes.
Step 3:
Display the confusion matrix in a heatmap:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

index_names = ["Actual No Snow", "Actual Snow"]
column_names = ["Predicted No Snow", "Predicted Snow"]
df_cm = pd.DataFrame(cm, index=index_names, columns=column_names)
plt.figure(dpi=150)
sns.heatmap(df_cm, annot=True, fmt='d', cmap="Blues")
The previous code should produce a heatmap similar to the following one:

Figure 3.9: Confusion matrix obtained with the test dataset
The confusion matrix shows that the samples are mainly distributed along the leading diagonal and that there are more FPs than FNs. Therefore, although the network is suitable for detecting snow, we should expect some false alarms.
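As a side note, if seaborn is not available, scikit-learn’s ConfusionMatrixDisplay can plot a similar chart directly from the cm variable computed in Step 2. This is an optional alternative, not part of the recipe’s code:

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Optional alternative to the seaborn heatmap (reuses cm from Step 2);
# label order follows the matrix rows/columns: 0 = "No Snow", 1 = "Snow"
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=["No Snow", "Snow"])
disp.plot(cmap="Blues", values_format="d")
plt.show()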
Step 4:
Calculate the recall, precision, and F-score performance metrics:
# Extract the four entries of the confusion matrix
# (scikit-learn convention: rows = actual class, columns = predicted class)
TN = cm[0][0]  # actual "No Snow" predicted as "No Snow"
TP = cm[1][1]  # actual "Snow" predicted as "Snow"
FN = cm[1][0]  # actual "Snow" predicted as "No Snow"
FP = cm[0][1]  # actual "No Snow" predicted as "Snow"

# Compute the performance metrics from the four counts
accur = (TP + TN) / (TP + TN + FN + FP)
precis = TP / (TP + FP)
recall = TP / (TP + FN)
f_score = (2 * recall * precis) / (recall + precis)
print("Accuracy: ", round(accur, 3))
print("Recall: ", round(recall, 3))
print("Precision: ", round(precis, 3))
print("F-score: ", round(f_score, 3))
The preceding code prints the performance metrics on the output console, resulting in an output similar to what is shown in the following screenshot:

Figure 3.10: Precision, recall, and F-score results
Based on the results reported in the preceding screenshot, which might differ slightly from yours, we can observe that the model has a high recall of 0.923, indicating that it correctly identifies most actual snowfall instances. However, the precision of 0.818 is comparatively lower, meaning the model may produce some false alarms.
The F-score of 0.867 confirms a good balance between recall and precision, indicating that the model can reliably predict snow instances from the given input features.
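As an optional cross-check of these manually computed values, scikit-learn’s classification_report() summarizes precision, recall, and F1-score per class. Assuming y_test and y_test_pred are still available from Step 1, the row for the “Snow” class should match the numbers above:

from sklearn.metrics import classification_report

# Per-class summary of precision, recall, and F1-score (class 1 = "Snow")
print(classification_report(y_test, y_test_pred,
                            target_names=["No Snow", "Snow"]))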
There’s more…
In this recipe, we learned how to assess the model’s effectiveness by visualizing the confusion matrix and evaluating the recall, precision, and F-score metrics.
However, scikit-learn is not the only way to compute the confusion matrix; TensorFlow provides an equivalent function, tf.math.confusion_matrix(). To delve deeper into this topic, we recommend referring to the TensorFlow documentation at the following link: https://www.tensorflow.org/versions/r2.13/api_docs/python/tf/math/confusion_matrix.
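For reference, here is a minimal sketch of the TensorFlow alternative, assuming y_test and y_test_pred are still available from the previous steps:

import tensorflow as tf

# Equivalent computation with TensorFlow; the result uses the same
# rows = actual / columns = predicted layout as scikit-learn
cm_tf = tf.math.confusion_matrix(labels=y_test,
                                 predictions=y_test_pred.flatten(),
                                 num_classes=2)
print(cm_tf.numpy())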
After evaluating the model’s effectiveness, model quantization is the only step separating us from starting the model deployment on the microcontroller.
In the upcoming recipe, we will compress the trained model by quantizing it to 8-bit using the TensorFlow Lite converter.