Getting Started with Azure Speech Service

Introduction

Commanding machines with your voice was once the stuff of science fiction. The ability to make a machine act on mere words filled the pages of many sci-fi comics and novels, and only recently has that fiction become fact. With the rise of assistants such as Amazon’s Alexa and Apple’s Siri, voice control has become a staple of the 21st century.

So, how does one integrate voice control into an app? There are many ways to do it, but one of the easiest is to use an Azure AI tool called Speech Service. This tutorial is a crash course on integrating Azure’s Speech Service into a standard C# app. To explore the tool, we’re going to use it to build a simple profanity filter.

What is Azure Speech Service?

There are many ways to create a speech-to-text app. One could build one from scratch, use a library, or use a cloud service. Arguably the easiest way is to use a cloud service such as the Azure Speech Service. This service is an Azure AI tool that analyzes speech picked up by a microphone and converts it to a text string in the cloud. The resulting string is then sent back to the app that made the request. In other words, the speech-to-text service that Azure offers is an AI developer tool that allows engineers to quickly convert speech to a text string.

It is important to understand that the Speech Service is a developer’s tool. Since the rise of systems like ChatGPT, what counts as an AI tool has been ambiguous at best. When people think of modern AI tools, they usually think of tools that take a prompt and return a response. When developers think of a tool, however, they usually think of something that helps them get a job done quickly and efficiently. As such, the Azure Speech Service is an AI tool that helps developers integrate speech-to-text features into their applications with minimal setup.

The Azure Speech Service is a very powerful tool that can be integrated into almost anything. For example, you can create a profanity filter with minimal code, make a voice request to an LLM like ChatGPT, or do any number of other things. It is important to remember that the Azure Speech Service is an AI tool meant for engineers. Unlike ChatGPT or LLMs in general, you will have to understand the basics of code to use it successfully. With that, what do you need to get started with the Speech Service?

What do you need to use Azure Speech Service?

Setting up an app that can use the Azure service requires very little. All you will need is the following:

An Azure account
Visual Studio (preferably the latest version)
Internet connectivity
The Microsoft.CognitiveServices.Speech NuGet package

This project is going to be a console-based application, so you won’t need to worry about anything fancy like creating a Graphical User Interface (GUI). When everything is installed and ready to go, the next step is to set up a simple speech-to-text service in Azure.

Set Up Azure Speech Service

After you have your environment set up, you’re going to want to set up your service. Setting up the speech-to-text service is quick and easy, as there is very little that needs to be done on the Azure side. All you have to do is perform the following steps:

1. Log in to Azure and search for Speech Services.

2. Click the Create button shown in Figure 1 to open the creation wizard:


Figure 1. Create Button

3. Fill out the wizard to match Figure 2. You can name the instance anything you want and set the resource group to anything you want. As far as the pricing tier goes, you will usually be able to use the service for free for a time. After the trial period ends, however, you will have to pay for the service. Once you have the wizard filled out, click Review + Create:


Figure 2. Speech Service

4. Keep following the wizard until you see the screen in Figure 3. On this screen, click the manage keys link that is circled in red:


Figure 3. Instance Service

This is where you get the keys necessary to use the AI tool. Clicking the link is not totally necessary as the keys are at the bottom of the page. However, clicking the link is sometimes easier as it’ll bring you directly to the keys.

At this point, the service is set up. The key information you will need is shown in Figure 4:


Figure 4. Key Information

To capture the key data, click the Show Keys button, which unmasks KEY 1 and KEY 2. Each instance you create generates a new set of keys. As a safety note, never share your keys with anyone: whoever has them can use your service and run up your bill, among other security concerns. With the keys unmasked, copy KEY 1 and the region.
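
Because the key is a secret, one option is to keep it out of your source code altogether. The following is a minimal sketch, not part of the original project, that reads the key and region from environment variables. The variable names SPEECH_KEY and SPEECH_REGION and the helper RequireEnv are assumptions chosen for illustration; use whatever names suit your setup.

```csharp
using System;

// Hypothetical helper: returns the value of an environment variable,
// or throws a descriptive error if it is missing or blank.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Environment variable '{name}' is not set.");
    }
    return value;
}

try
{
    // SPEECH_KEY and SPEECH_REGION are assumed names, not required by Azure.
    string speechKey = RequireEnv("SPEECH_KEY");
    string speechRegion = RequireEnv("SPEECH_REGION");
    // Print only the region; the key itself should never be logged.
    Console.WriteLine($"Credentials loaded for region '{speechRegion}'.");
}
catch (InvalidOperationException ex)
{
    // Report which variable is missing instead of crashing.
    Console.WriteLine(ex.Message);
}
```

Values loaded this way could be used in place of the hard-coded key and region strings in the app, so the key never ends up in source control.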

C# Code

Now comes the fun part of the project: creating the app. The app will be relatively simple. The only hard part will be installing the NuGet package for the Speech Service. To do this, simply add the NuGet package shown in Figure 5.


Figure 5. NuGet Package

Once that package is installed, you can start writing the code.

To start off, we’re simply going to make an app that echoes back what we say to it. To do this, enter the following code:

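// See https://aka.ms/new-console-template for more information
using Microsoft.CognitiveServices.Speech;

await translateSpeech();

// Listens on the default microphone for a single utterance and prints the transcription.
static async Task translateSpeech()
{
    string key = "<Your Key>";       // KEY 1 from the Azure portal
    string region = "<Your Region>"; // the region of your Speech resource
    var config = SpeechConfig.FromSubscription(key, region);
    using (var recognizer = new SpeechRecognizer(config))
    {
        // Sends one utterance to the service and waits for the recognized text.
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine(result.Text);
    }
}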

When you run this program, it will open a console window. Speak into the computer’s microphone and whatever you say will be displayed. For example, run the program and say “Hello World”. After the service finishes transcribing your speech, you should see the following in the command prompt:


Figure 6. Output From App
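
If nothing is printed, the service may not have recognized any speech, or the request may have been canceled (for example, because of a wrong key or region). The following is a hedged variation of the program above that inspects the result's Reason property before printing. It relies on the SDK's ResultReason and CancellationDetails types; the messages it prints are illustrative only, and running it still requires the NuGet package, a microphone, and a valid key.

```csharp
using Microsoft.CognitiveServices.Speech;

string key = "<Your Key>";
string region = "<Your Region>";
var config = SpeechConfig.FromSubscription(key, region);

using var recognizer = new SpeechRecognizer(config);
var result = await recognizer.RecognizeOnceAsync();

switch (result.Reason)
{
    case ResultReason.RecognizedSpeech:
        // Speech was recognized; Text holds the transcription.
        Console.WriteLine($"Recognized: {result.Text}");
        break;

    case ResultReason.NoMatch:
        // Audio was received but could not be matched to speech.
        Console.WriteLine("No speech could be recognized.");
        break;

    case ResultReason.Canceled:
    {
        // The request was canceled; the details explain why.
        var details = CancellationDetails.FromResult(result);
        Console.WriteLine($"Canceled: {details.Reason}");
        if (details.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"Error details: {details.ErrorDetails}");
        }
        break;
    }
}
```

Checking the reason first makes an empty output much easier to diagnose than printing result.Text unconditionally.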

Now, this isn’t the full project. This is just a simple app that transcribes what we say to the computer. What we’re aiming for in this tutorial is a simple profanity filter. For that, we need to add another function to the project to help filter the returned string.

It is important to remember that what is returned is a text string. The text string is just like any other text string that one would use in C#. As such, we can modify the program to the following to filter profanity:

// See https://aka.ms/new-console-template for more information
using Microsoft.CognitiveServices.Speech;

await translateSpeech();

// Transcribes a single utterance, prints it, and passes it to the profanity check.
static async Task translateSpeech()
{
    string key = "<Your Key>";
    string region = "<Your Region>";
    var config = SpeechConfig.FromSubscription(key, region);
    using (var recognizer = new SpeechRecognizer(config))
    {
        var result = await recognizer.RecognizeOnceAsync();
        Console.WriteLine(result.Text);
        VetSpeech(result.Text);
    }
}

// Prints "flagged" once for every entry in badWords that appears in the input.
static void VetSpeech(string input)
{
    Console.WriteLine("checking phrase: " + input);
    // Contains is case-sensitive, so each word is listed in both capitalizations.
    string[] badWords = { "Crap", "crap", "Dang", "dang", "Shoot", "shoot" };

    foreach (string word in badWords)
    {
        if (input.Contains(word))
        {
            Console.WriteLine("flagged");
        }
    }
}

Now, in the VetSpeech function, we have an array of “bad” words. In short, if the returned string contains any of these words, the program will display “flagged”. So if we say “Crap Computer” when the program runs, we can expect to see the following output in the prompt:


Figure 7. Profanity Output

As can be seen, the program flagged the phrase because the word Crap was in it.
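
Because Contains is case-sensitive and matches substrings, the filter above needs every capitalization spelled out and would also flag a word such as "shooting". The following is one possible refinement, offered as a sketch rather than as part of the original project: it matches whole words only and ignores case by using a regular expression. The helper name ContainsBlockedWord and the sample phrases are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns true if any blocked word appears in the
// input as a whole word, regardless of capitalization.
static bool ContainsBlockedWord(string input, string[] blockedWords)
{
    if (string.IsNullOrWhiteSpace(input) || blockedWords.Length == 0)
    {
        return false;
    }

    // Builds a pattern such as \b(?:crap|dang|shoot)\b, escaping each word
    // so that any special characters are treated literally.
    string pattern = @"\b(?:" +
        string.Join("|", blockedWords.Select(Regex.Escape)) + @")\b";

    return Regex.IsMatch(input, pattern,
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}

string[] blocked = { "crap", "dang", "shoot" };

Console.WriteLine(ContainsBlockedWord("Crap computer.", blocked));   // True
Console.WriteLine(ContainsBlockedWord("I went shooting.", blocked)); // False
```

With a helper like this, each word needs to be listed only once, and the phrase is flagged a single time no matter how many blocked words it contains.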

Exercises

This tutorial was a basic rundown of the Speech Service in Azure. It is probably one of the simplest services to use, but it is still very powerful. Now that you have a basic idea of how the service works and how to write C# code for it, try the following exercise: create a ChatGPT developer token, take the returned string, and pass it to ChatGPT. When done correctly, this project will allow you to interact verbally with ChatGPT. That is, you should be able to ask ChatGPT a question out loud and get a response.

Conclusion

The Azure Speech Service is an AI tool. Unlike many other AI tools such as ChatGPT, this tool is meant for developers to build applications with. Also, unlike many other Azure services, it is very easy to use and needs minimal setup. As can be seen from the tutorial, the hardest part was writing the code that used the service, and even that was not difficult. The best part is that the code provided in this tutorial is the basic code you will need to interact with the service, meaning that all you have to do now is modify it to fit your project’s needs.

The power of the Speech Service is limited only by your imagination. This tool would be excellent for adding verbal interaction to other tools like ChatGPT, creating voice-controlled robots, or anything else. Overall, this is a relatively cheap and powerful tool that can be leveraged for many things.

Author Bio

M.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics, and a master's degree in software engineering, and is currently working on an MBA in IT project management. M.T. is currently working as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.

Author of the book: Mastering PLC Programming