Making your BeagleBone Black speak
Now that you can get sound both into and out of your BeagleBone Black, let's do something useful with this capability. Start by enabling Espeak, an open source application that provides a computer voice with a bit of personality. To install it, type `sudo apt-get install espeak`. You'll probably have to accept the additional disk space that the application requires, which should be fine given the size of your SD card. The download might take a bit of time, but the prompt will reappear when it is done.
Now let's see if your BeagleBone Black has a voice. Type the `sudo espeak "hello"` command. The speaker should emit a computer-voiced "hello." If it does not, make sure that the speaker is on and its volume is high enough to be heard. Now that you have a computer voice, you can customize it. Espeak offers a fairly complete set of customization features, including a large number of languages, voices, and other options.
Now your project can speak. Simply type `espeak`, followed by the text you want it to speak in quotes, and out comes your speech! If you want to read an entire text file, you can do that as well, using the `-f` option followed by the name of the file. Try this by using your editor to create a text file called `speak.txt`, then type this command: `sudo espeak -f speak.txt`.
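The file-reading step can be tried end to end with a short script. This is a minimal sketch: the sample sentence and the speed value passed to `-s` (words per minute) are assumptions chosen for illustration, and the script only calls espeak if it is installed.

```shell
#!/bin/sh
# Create the text file that espeak will read aloud.
# The sentence is a placeholder; put any text you like here.
printf '%s\n' "Hello from your BeagleBone Black." > speak.txt

# Read the file aloud if espeak is available.
# -f names the input file; -s sets the speaking speed in words per minute.
# The text runs espeak under sudo; add it here if your setup needs it.
if command -v espeak >/dev/null 2>&1; then
    espeak -s 140 -f speak.txt || echo "espeak could not play the file" >&2
else
    echo "espeak is not installed; run: sudo apt-get install espeak" >&2
fi
```

Lowering the `-s` value slows the voice down, which can make longer files easier to follow.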
Installing speech recognition
Now that your projects can speak, you will want them to listen as well. This isn't nearly as simple as the speaking part, but thankfully you have some significant help. You will download a set of capabilities called pocketsphinx, and using these capabilities, you will provide your project with the ability to listen to your commands.
The first step is to download the pocketsphinx capability. Unfortunately, this is not as user friendly as the Espeak process, so follow the steps carefully. First, go to the Sphinx website, hosted by Carnegie Mellon University at http://cmusphinx.sourceforge.net/. This is an open source project that provides the speech recognition software you will need. With your smaller embedded system, you will be using the pocketsphinx version of this code. You will need to download two pieces of software, sphinxbase and pocketsphinx. Select the Download section at the top of the page and find the latest version of both packages. Download the `.tar.gz` versions of these and move them to the `/home/ubuntu` directory of your BeagleBone Black. However, before you build these, you'll need another library.
This library is called bison. It's a general-purpose, open source parser generator that will be used by pocketsphinx. To get this package, type `sudo apt-get install bison`.
If everything explained so far is downloaded and installed, you can build pocketsphinx as follows:
- Start by unpacking sphinxbase. Type `tar -xzvf sphinxbase-0.x.tar.gz`, where x is the version number. This should unpack all the files from your archive into a directory called `sphinxbase-0.x`. Now change to that directory.
- Now you will build the application. Start by issuing the `./configure --enable-fixed` command. This will first check that everything is OK with the system, then configure a build. When I first attempted this command, it failed with an error.
- The error highlighted an interesting problem: the time and date on my BeagleBone Black were not set to the current time and date. If you need to set them, issue the `sudo date nnddhhmmyyyy.ss` command, where nn is the month, dd is the day, hh is the hour, mm is the minutes, yyyy is the year, and ss is the seconds. Now you can reissue the `./configure --enable-fixed` command.
- You can also install python-dev using `sudo apt-get install python-dev` and Cython using `sudo apt-get install cython`. Both of these will be useful later if you are going to use pocketsphinx with Python as a coding language. You can also choose to install pkg-config, a utility that can sometimes help with complex compilations. Install it using `sudo apt-get install pkg-config`.
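The argument for the date command in the steps above is easy to get wrong by hand, so it can help to assemble it from its parts. The sketch below only builds and prints the string; the function name `date_arg` and the sample values are hypothetical.

```shell
#!/bin/sh
# Build the nnddhhmmyyyy.ss argument: month, day, hour, minute, year, second.
# Pass plain decimal numbers without leading zeros (3, not 03); printf pads them.
date_arg() {
    printf '%02d%02d%02d%02d%04d.%02d\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example: 7 March 2014, 14:05:09
date_arg 3 7 14 5 2014 9      # prints 030714052014.09

# To actually set the clock:
#   sudo date "$(date_arg 3 7 14 5 2014 9)"
```

Printing the string first makes it possible to check the field order before changing the system clock.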
Now you are ready to actually build the sphinxbase code base. This is a two-step process. First type `make`, and the system will build all the executable files. Then type `sudo make install`, and it will install all the executables on the system.
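The unpack, configure, build, and install sequence can be collected into one helper, since the same pattern is used for sphinxbase here and for pocketsphinx in the next step. This is a sketch, not a tested build script: the function name `build_package` and the `RUN` switch are hypothetical, and the 0.8 version in the example is an assumption. By default it only prints the commands it would run.

```shell
#!/bin/sh
# RUN is prefixed to every command. The default, "echo", prints the commands
# instead of executing them; set RUN="" to run them for real.
RUN=${RUN-echo}

# build_package ARCHIVE [CONFIGURE_OPTIONS...]
# The body runs in a subshell, so the cd does not affect the caller.
build_package() (
    archive=$1
    shift
    dir=${archive%.tar.gz}          # sphinxbase-0.8.tar.gz -> sphinxbase-0.8
    $RUN tar -xzvf "$archive" &&
    $RUN cd "$dir" &&
    $RUN ./configure "$@" &&
    $RUN make &&
    $RUN sudo make install
)

# Dry run for sphinxbase with the fixed-point option used in the text.
build_package sphinxbase-0.8.tar.gz --enable-fixed
```

Chaining the steps with `&&` stops the sequence at the first failure, so a failed configure never leads to a half-finished install.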
Now build the second part of the system, the pocketsphinx code itself, as follows:
- Go to the home directory and unarchive the code by typing `tar -xzvf pocketsphinx-0.x.tar.gz`, where x is the version number of pocketsphinx. The files should now be unarchived, and you can build the code. Follow similar steps for these files: first `cd` to the `pocketsphinx-0.x` directory, then type `./configure` to see if you're ready to build the files. Then type `make`, wait for everything to build, and type `sudo make install`.
- Once you have completed the installation, you need to let the system know where your files are. To do this, edit the `/etc/ld.so.conf` file as root and add the directory that holds the newly installed libraries as the last line of the file.
- Type `sudo /sbin/ldconfig` and the system will now be aware of your pocketsphinx libraries.
- Once everything is installed, you can try your speech recognition. Change to the `/home/ubuntu/pocketsphinx-0.8/src/programs` directory and try a demo program by typing `sudo ./pocketsphinx_continuous`. This program takes input from the mic and turns it into text. After running the command, you'll get all kinds of information that won't have much meaning for you, ending with a warning message stating that it can't find a mic or a capture element.
- Despite the warning, the program can find your mic or capture element. If you have set things up as previously described, you should be ready to give it a command. Say "hello" into the mic. When it senses that you have stopped speaking, it will process your speech, again printing all kinds of information that has little meaning for us, but it should eventually show a line like `000000001: hello`.
It recognized your speech! You can try other words and phrases. The system is very sensitive, so it might also pick up background noise. You are also going to find out that it is not very accurate. There are two ways to make it more accurate. One is to train the system to understand your voice more accurately. I'm not going to detail that process here. It's a bit complex, and if you want to know more, feel free to go to the CMU pocketsphinx website at http://cmusphinx.sourceforge.net/.
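Returning to the library path step in the list above, the edit to `/etc/ld.so.conf` can be made safely repeatable. The sketch below assumes the libraries went to `/usr/local/lib`, which is the usual default for a `./configure` build with no `--prefix` option; check where `sudo make install` placed them on your system. The function name `add_lib_path` is hypothetical, and the demonstration works on a scratch file rather than the real configuration.

```shell
#!/bin/sh
# add_lib_path CONF_FILE LIB_DIR
# Append LIB_DIR as the last line of CONF_FILE unless it is already listed.
add_lib_path() {
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if ! grep -qxF "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Demonstrate on a scratch file that mimics a typical Ubuntu ld.so.conf.
demo_conf=$(mktemp)
printf '%s\n' 'include /etc/ld.so.conf.d/*.conf' > "$demo_conf"
add_lib_path "$demo_conf" /usr/local/lib
add_lib_path "$demo_conf" /usr/local/lib   # second call changes nothing
cat "$demo_conf"

# On the board, as root (path assumed, see above):
#   add_lib_path /etc/ld.so.conf /usr/local/lib && /sbin/ldconfig
```

Checking for the line before appending means the script can be rerun after a rebuild without piling up duplicate entries.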
Improving speech recognition accuracy
The second way to improve accuracy is to limit the number of words that your system can use to determine what you are saying. The default has literally thousands of words that are possible, so if two words are close, it might choose the wrong word as opposed to the word you spoke. In order to make the system more accurate, you are going to restrict the words it has to choose from. You can do this by making your own grammar.
The first step is to create a file with the words or phrases you want the system to recognize. Then you use a web tool to create two files that the system will use to define your grammar:
- Create a file called `grammar.txt` and enter the words or phrases you want the system to recognize.
- Now use the CMU web tool to turn this file into two files that the system can use to define its dictionary. Open a web browser window and go to www.speech.cs.cmu.edu/tools/lmtool-new.html. Click on the Choose File button, then find and select your `grammar.txt` file.
- On the web page, select COMPILE KNOWLEDGE BASE. A new window should pop up.
- Now download the `.tgz` file that the tool created. In this case, it's the `TAR1565.tgz` file.
- Move it to the `/home/ubuntu/pocketsphinx-0.8/src/programs` directory and unarchive it using `tar -xzvf` followed by the filename.
- Now you can invoke the `pocketsphinx_continuous` program to use this dictionary by typing `sudo ./pocketsphinx_continuous -lm 1565.lm -dict 1565.dic`.
It will now use that dictionary as it tries to find matches to your commands.
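The file handling in the steps above can be sketched from the command line. The two phrases written to `grammar.txt` are placeholders (hello and goodbye are the words used in the next section), with one phrase per line. The helper `kb_id` is a hypothetical name; it assumes, as in the `TAR1565.tgz` example, that the `.lm` and `.dic` files share the number in the archive name.

```shell
#!/bin/sh
# Placeholder corpus: one word or phrase per line.
printf '%s\n' hello goodbye > grammar.txt

# kb_id ARCHIVE: strip the TAR prefix and .tgz suffix (TAR1565.tgz -> 1565).
kb_id() {
    id=${1#TAR}
    printf '%s\n' "${id%.tgz}"
}

# Print the command that selects the matching language model and dictionary.
id=$(kb_id TAR1565.tgz)
echo "sudo ./pocketsphinx_continuous -lm $id.lm -dict $id.dic"
```

Because the web tool assigns a new number on every compile, deriving the file names from the archive avoids editing the command by hand each time the grammar changes.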
Responding to voice commands
Now that your system can both hear and speak, you will want it to respond to your speech, and perhaps even execute some commands based on the speech input. Now you're going to configure the system to respond to simple commands.
In order to respond, you're going to edit the `continuous.c` code in the `/home/ubuntu/pocketsphinx-0.8/src/programs` directory. You could create your own `.c` file, but this file is already set up in the makefile system and will serve as an excellent starting point. The file is very long and a bit complicated, but you are looking specifically for the section where the word has already been decoded and is held in the `hyp` variable. You can add some code there to make your system do things based on the word you have decoded. First, let's try adding the capability to respond to hello and goodbye, and see if you can get the program to stop. To do this, add code at that point that checks `hyp` for each word, uses the system command to speak the matching response, and exits the program on goodbye.
Now you need to rebuild your code. Since the make system already knows how to build the `pocketsphinx_continuous` program, it will rebuild the application any time you change the `continuous.c` file. Simply type `make`. The file will compile and create a new version of `pocketsphinx_continuous`. To run your new version, type `sudo ./pocketsphinx_continuous`. Make sure you type `./` at the start of `pocketsphinx_continuous`. If you don't, the system will run the other copy of `pocketsphinx_continuous` that was installed earlier instead of your new one.
If everything is set correctly, saying hello should result in a response of hello from your BeagleBone Black. Saying goodbye should elicit a response of goodbye, as well as shutting down the program. Note that the system command can be used to run any program that you might run from a command line. You can now use this to have your program start and run other programs based on voice commands.
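The decision logic described above (speak a reply for hello, speak a reply and stop for goodbye) is sketched below in shell rather than C, purely to illustrate the flow; it is not the change to `continuous.c` itself. It assumes hypothesis lines in the `000000001: hello` form shown earlier, and the names `respond` and `SAY` are hypothetical. `SAY` defaults to `echo` so the sketch runs anywhere; set `SAY=espeak` on the board to hear the replies.

```shell
#!/bin/sh
# Command used to voice a reply. "echo" just prints it; use SAY=espeak to speak.
SAY=${SAY:-echo}

# respond LINE: react to one hypothesis line such as "000000001: hello".
# Returns 0 to keep listening and 1 when the program should stop.
respond() {
    word="${1#*: }"                # drop the utterance number and ": "
    case $word in
        hello)   $SAY "hello" ;;
        goodbye) $SAY "goodbye"; return 1 ;;
        *)       ;;                # ignore anything else
    esac
    return 0
}

# Feed it three sample lines; the loop ends after goodbye,
# so the third line is never handled.
for line in "000000001: hello" "000000002: goodbye" "000000003: hello"; do
    respond "$line" || break
done
```

Adding another voice command is then a matter of adding one more branch that runs whatever program you want, which mirrors how the system command is used from the C code.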