Building Voice Technology on IoT Projects

  • 10 min read
  • 08 Nov 2016


In this article by Agus Kurniawan, the author of Smart Internet of Things Projects, we will explore how to make your IoT board speak. Along the way, we will look at various sound and speech modules.


We will explore the following topics:

  • Introducing speech technology
  • Introducing sound sensors and actuators
  • Introducing pattern recognition for speech technology
  • Reviewing speech and sound modules
  • Building your own voice commands for IoT projects
  • Making your IoT board speak
  • Making Raspberry Pi speak

Introducing speech technology

Speech is the primary means of communication among people. Speech technology is built on speech recognition research: a machine such as a computer can understand what a human says, and can even recognize individual speech models, so it can differentiate between speakers.

Speech technology covers both speech-to-text and text-to-speech. Researchers have already defined speech models for several languages, for instance, English, German, Chinese, and French.

A general overview of speech research topics can be seen in the following figure:

[Figure: Overview of speech research topics]

To convert speech to text, we need to understand speech recognition. Conversely, if we want to generate speech sounds from text, we need to learn about speech synthesis. This article does not cover speech recognition and speech synthesis with a heavy mathematical or statistical approach; I recommend you read a textbook on those topics.

In this article, we will learn how to work with sound and speech processing in an IoT platform environment.

Introducing sound sensors and actuators

Sound sources can come from humans, animals, cars, and so on. To process sound data, we must capture the sound source and convert it from a physical to a digital form. We do this with sensor devices that capture the physical sound source. A simple sound sensor is a microphone, which can record any sound source.

We use a microphone module connected to our IoT board, for instance, an Arduino or Raspberry Pi. One such module is the Electret Microphone Breakout, https://www.sparkfun.com/products/12758. This breakout module exposes three pins: AUD, GND, and VCC. You can see it in the following figure.

[Figure: Electret Microphone Breakout]

Furthermore, we can generate sound using an actuator. A simple sound actuator is a passive buzzer, which can generate simple tones over a limited frequency range. You can generate sound by driving its signal pin from an analog output or PWM pin. Some manufacturers also provide breakout modules for buzzers. A buzzer actuator is shown in the following figure.

[Figure: Passive buzzer module]
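
To give a feel for driving a passive buzzer from a sketch, the following is a minimal example that uses Arduino's tone() function to beep once per second. The choice of digital pin 8 for the buzzer's signal pin is only an assumption for this illustration.

// Minimal passive buzzer sketch (assumes the buzzer signal pin is wired to pin 8)
const int buzzerPin = 8;

void setup()
{
  pinMode(buzzerPin, OUTPUT);
}

void loop()
{
  tone(buzzerPin, 1000);   // play a 1 kHz tone
  delay(500);              // for half a second
  noTone(buzzerPin);       // then stop
  delay(500);
}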

A buzzer is usually a passive actuator. If you want to work with an active sound actuator, you can use a speaker. This component is easy to find in your local or online store. I also found one on SparkFun, https://www.sparkfun.com/products/11089, which you can see in the following figure.

[Figure: Speaker from SparkFun]

To get experience working with a sound sensor and actuator, we will build a demo that captures a sound source by measuring its sound intensity.

In this demo, I show how to detect the sound intensity level using a sound sensor, an electret microphone. The sound source can be a voice, claps, door knocks, or any sound loud enough to be picked up by the sensor device. The output of the sensor is an analog value, so the MCU must convert it using its analog-to-digital converter.

The following peripherals are used in our demo:

  • An Arduino board
  • SparkFun Electret Microphone Breakout
  • 10-segment LED bar graph
  • 330 Ohm resistors

You can also use the Adafruit Electret Microphone Breakout attached to the Arduino board. You can review it at https://www.adafruit.com/product/1063.

To build our demo, wire the components as follows:

  • Connect Electret Microphone AUD pin to Arduino A0 pin
  • Connect Electret Microphone GND pin to Arduino GND pin
  • Connect Electret Microphone VCC pin to Arduino 3.3V pin
  • Connect the 10-segment LED bar graph pins to Arduino digital pins 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12, each through a 330 Ohm resistor

You can see the final wiring of our demo in the following figure:

[Figure: Final wiring of the sound intensity demo]

The 10-segment LED bar graph module is used to represent the sound intensity level. In Arduino, we can use analogRead() to read the analog input from an external sensor. analogRead() returns a value from 0 to 1023.

The maximum output voltage is 3.3V because we connect the Electret Microphone Breakout's VCC to 3.3V. We can therefore allocate 3.3/10 = 0.33V to each LED bar segment; for example, a reading of about 1.0V lights roughly three segments. The first segment is connected to Arduino digital pin 3.

Now we can build our sketch program to read the sound intensity and then display the measured value on the 10-segment LED bar graph.

To obtain the sound intensity, we read the sound input from the analog input pin over a certain period, called the sample window, for instance, 250 ms. During that time, we record the peak-to-peak value of the analog input and use it as the sound intensity value.

Let's start implementing our program. Open the Arduino IDE and write the following sketch.

// Sample window width in mS (250 mS = 4Hz)
const int sampleWindow = 250; 
unsigned int sound;
int led = 13;

void setup() 
{
   Serial.begin(9600);
   pinMode(led, OUTPUT);

   pinMode(3, OUTPUT);
   pinMode(4, OUTPUT);
   pinMode(5, OUTPUT);
   pinMode(6, OUTPUT);
   pinMode(7, OUTPUT);
   pinMode(8, OUTPUT);
   pinMode(9, OUTPUT);
   pinMode(10, OUTPUT);
   pinMode(11, OUTPUT);
   pinMode(12, OUTPUT);
}

void loop() 
{
  unsigned long start= millis(); 
  unsigned int peakToPeak = 0;  
  
  unsigned int signalMax = 0;
  unsigned int signalMin = 1024;
  
  // collect data for 250 milliseconds
  while (millis() - start < sampleWindow)
  {    
    sound = analogRead(0);
    if (sound < 1024)  
    {
      if (sound > signalMax)
      {        
        signalMax = sound;  
      }
      else if (sound < signalMin)
      {
        signalMin = sound;  
      }
   }
  }
  peakToPeak = signalMax - signalMin; 
  double volts = (peakToPeak * 3.3) / 1024; 
   
  Serial.println(volts);
  display_bar_led(volts); 
}

void display_bar_led(double volts) 
{
  display_bar_led_off();

  int index = round(volts/0.33);
  switch(index){
    case 1:    
      digitalWrite(3, HIGH);
      break;
    case 2:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      break;
    case 3:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);      
      break;
    case 4:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);        
      break;
    case 5:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);      
      break;
    case 6:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);
      digitalWrite(8, HIGH);      
      break;
    case 7:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);
      digitalWrite(8, HIGH);
      digitalWrite(9, HIGH);      
      break;
    case 8:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);
      digitalWrite(8, HIGH);
      digitalWrite(9, HIGH);
      digitalWrite(10, HIGH);      
      break;
    case 9:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);
      digitalWrite(8, HIGH);
      digitalWrite(9, HIGH);
      digitalWrite(10, HIGH);
      digitalWrite(11, HIGH);      
      break;
    case 10:
      digitalWrite(3, HIGH);
      digitalWrite(4, HIGH);
      digitalWrite(5, HIGH);
      digitalWrite(6, HIGH);  
      digitalWrite(7, HIGH);
      digitalWrite(8, HIGH);
      digitalWrite(9, HIGH);
      digitalWrite(10, HIGH);
      digitalWrite(11, HIGH);
      digitalWrite(12, HIGH);
      break;   
  }
  
}

void display_bar_led_off()
{
  digitalWrite(3, LOW);
  digitalWrite(4, LOW);
  digitalWrite(5, LOW);
  digitalWrite(6, LOW);  
  digitalWrite(7, LOW);
  digitalWrite(8, LOW);
  digitalWrite(9, LOW);
  digitalWrite(10, LOW);
  digitalWrite(11, LOW);
  digitalWrite(12, LOW);
}

Save this sketch program as ch05_01.

Compile and upload this program to your Arduino board.

After deploying the program, open the Serial Plotter tool. You can find it in the Arduino IDE under Tools | Serial Plotter. Set the baud rate to 9600 on the Serial Plotter tool.

Try making noise near the sound sensor. You will see the values change on the graph in the Serial Plotter tool. A sample of the Serial Plotter output can be seen in the following figure:

[Figure: Sample Serial Plotter output]

How it works

The idea behind obtaining the sound intensity is simple: we measure the difference between the sound signal's peaks. First, we define a sample window width, for instance, 250 ms, which corresponds to 4Hz.

// Sample window width in mS (250 mS = 4Hz)
const int sampleWindow = 250;
unsigned int sound;
int led = 13;

In the setup() function, we initialize the serial port and the pins for our 10-segment LED bar graph.

void setup() 
{
   Serial.begin(9600);
   pinMode(led, OUTPUT);
   pinMode(3, OUTPUT);
   pinMode(4, OUTPUT);
   pinMode(5, OUTPUT);
   pinMode(6, OUTPUT);
   pinMode(7, OUTPUT);
   pinMode(8, OUTPUT);
   pinMode(9, OUTPUT);
   pinMode(10, OUTPUT);
   pinMode(11, OUTPUT);
   pinMode(12, OUTPUT);
}

In the loop() function, we calculate the sound intensity over the sample window. After obtaining the peak-to-peak value, we convert it into a voltage.

unsigned long start= millis(); 
  unsigned int peakToPeak = 0;  
  
  unsigned int signalMax = 0;
  unsigned int signalMin = 1024;
  
  // collect data for 250 milliseconds
  while (millis() - start < sampleWindow)
  {    
    sound = analogRead(0);
    if (sound < 1024)  
    {
      if (sound > signalMax)
      {
        signalMax = sound;
      }
      else if (sound < signalMin)
      {
        signalMin = sound;
      }
    }
  }
  peakToPeak = signalMax - signalMin;
  double volts = (peakToPeak * 3.3) / 1024;

Then, we print the sound intensity in volts to the serial port and show it on the 10-segment LED bar graph by calling display_bar_led().

Serial.println(volts); 
display_bar_led(volts);

Inside the display_bar_led() function, we first turn off all LEDs on the 10-segment LED bar graph by calling display_bar_led_off(), which writes LOW to all LED pins using digitalWrite(). After that, we calculate an index value from volts; this value determines how many LEDs are lit.

display_bar_led_off();
int index = round(volts/0.33);
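
As a side note, because the LED segments sit on consecutive digital pins 3 through 12, the long switch statement can also be written as a short loop. The following is a minimal alternative sketch of display_bar_led(); the name display_bar_led_loop() is just illustrative, and it assumes the same wiring as the demo.

// Alternative to the switch-based display_bar_led(): light the first
// 'index' segments, where the segments occupy consecutive pins 3..12.
void display_bar_led_loop(double volts)
{
  int index = round(volts / 0.33);   // 0..10 segments
  index = constrain(index, 0, 10);   // guard against out-of-range readings

  for (int i = 0; i < 10; i++)
  {
    // pins 3..(3 + index - 1) go HIGH, the rest go LOW
    digitalWrite(3 + i, (i < index) ? HIGH : LOW);
  }
}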

Introducing pattern recognition for speech technology

Pattern recognition is a topic in machine learning and serves as the baseline for speech recognition. In general, we can construct a speech recognition system as shown in the following figure:

[Figure: General flow of a speech recognition system]

Human speech must first be converted into digital form, called discrete data. Signal processing methods are then applied for pre-processing, such as removing noise from the data.

The pattern recognition stage then performs the actual speech recognition. Researchers have used approaches such as the Hidden Markov Model (HMM) to identify the sounds associated with words. Feature extraction on the digital speech data is part of this work; its output serves as the input to the pattern recognition step.

The output of pattern recognition can be applied to speech-to-text and voice commands in our IoT projects.
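
As a quick illustration of using recognized speech as a command, the snippet below maps a command label (however your recognizer of choice produces it) to a GPIO action. The command strings, the LAMP_PIN assignment, and the handleCommand() helper are purely hypothetical.

// Hypothetical dispatch of a recognized voice command to a GPIO action.
// The recognizer itself is not shown; we assume it yields a command string.
const int LAMP_PIN = 7;   // illustrative pin choice

void handleCommand(const String &command)
{
  if (command == "LIGHT ON") {
    digitalWrite(LAMP_PIN, HIGH);
  } else if (command == "LIGHT OFF") {
    digitalWrite(LAMP_PIN, LOW);
  } else {
    Serial.print("Unknown command: ");
    Serial.println(command);
  }
}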

Reviewing speech and sound modules for IoT devices

In this section, we review various speech and sound modules that can be integrated with our MCU board. There are many modules related to speech and sound processing, and each has unique features; choose the one that fits your project.

One such module is the EasyVR 3 & EasyVR Shield 3 from VeeaR. You can review this module at http://www.veear.eu/introducing-easyvr-3-easyvr-shield-3/. Several languages are already supported, such as English (US), Italian, German, French, Spanish, and Japanese.

You can see the EasyVR 3 module in the following figure:

[Figure: EasyVR 3 module]

The EasyVR 3 board is also available as a shield for Arduino. If you buy an EasyVR Shield 3, you will receive the EasyVR board together with its Arduino shield. You can see the EasyVR Shield 3 in the following figure:

[Figure: EasyVR Shield 3]
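
To give a rough idea of how the EasyVR is driven from an Arduino sketch, here is a sketch modeled on the examples shipped with the EasyVR Arduino library. Treat the wiring (software serial on pins 12 and 13), the method names (detect(), setTimeout(), recognizeCommand(), hasFinished(), getCommand()), and the command group number as assumptions to verify against the library version and documentation you install.

#include <SoftwareSerial.h>
#include "EasyVR.h"

// Assumed wiring for the EasyVR Shield 3 in software serial mode (pins 12/13)
SoftwareSerial port(12, 13);
EasyVR easyvr(port);

void setup()
{
  Serial.begin(9600);
  port.begin(9600);

  // Wait until the EasyVR module responds
  while (!easyvr.detect()) {
    Serial.println("EasyVR not detected!");
    delay(1000);
  }
  Serial.println("EasyVR detected");
  easyvr.setTimeout(5);          // listen for up to 5 seconds
}

void loop()
{
  easyvr.recognizeCommand(1);    // start recognition on command group 1 (assumed)
  while (!easyvr.hasFinished()) {
    ; // wait for recognition to complete
  }

  int command = easyvr.getCommand();
  if (command >= 0) {
    Serial.print("Recognized command index: ");
    Serial.println(command);
  }
}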

The second module is the Emic 2. It was designed by Parallax in conjunction with Grand Idea Studio, http://www.grandideastudio.com/, to make voice synthesis a total no-brainer. You can send text to the module over a serial protocol to generate human speech, which makes it useful if you want to make your board speak. For further information, or to buy this module, visit https://www.parallax.com/product/30016. The Emic 2 module is shown in the following figure:

[Figure: Emic 2 module]
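
To show how simply the Emic 2 can be driven, the minimal sketch below speaks one fixed sentence over a software serial connection. The pin choices (Emic 2 SOUT to Arduino pin 10, SIN to pin 11) are assumptions for this example; the 'S' text-to-speech command and the ':' ready prompt follow Parallax's documented command set, but check the Emic 2 documentation for the full protocol.

#include <SoftwareSerial.h>

// Assumed wiring: Emic 2 SOUT -> Arduino pin 10, Emic 2 SIN -> Arduino pin 11
SoftwareSerial emic(10, 11);   // RX, TX

void setup()
{
  emic.begin(9600);            // the Emic 2 defaults to 9600 baud

  emic.print('\n');            // request a prompt in case the module is already idle
  while (emic.read() != ':') {
    ; // wait for the ':' ready prompt
  }
  delay(10);

  // 'S' is the text-to-speech command; the text is terminated with a newline
  emic.print("S");
  emic.print("Hello from my IoT board.");
  emic.print('\n');
}

void loop()
{
  // Nothing to do; the sentence is spoken once at startup
}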

Summary

We have learned some basics of sound and voice processing. We also explored several sound and speech modules that can be integrated into your IoT project, and we built a program to read the sound intensity level.
