Using ChatGPT with Text to Speech

This article is a quick guide to getting started with ChatGPT through the OpenAI API. It shows how to dictate a request to ChatGPT with a microphone, how to build a ChatGPT request from variables, and how to listen to ChatGPT's response with Google gTTS, a text-to-speech tool. Following these steps gives you a more interactive, voice-driven way to work with ChatGPT. We’re using the gpt-3.5-turbo model in this example. The examples run in Google Colab, but they should be applicable to other environments.

In this article, we’ll cover:

Installing OpenAI and Google gTTS for Text-to-Speech, and loading your API key
Generating content with ChatGPT
Converting ChatGPT's response to speech
Transcribing with Whisper

To understand GPT-3 Transformers in detail, read Transformers for NLP, 2nd Edition

1. Installing OpenAI, gTTS, and your API Key

There are a few libraries that we’ll need to install into Colab for this project. We’ll install them as required, starting with OpenAI.

Installing and Importing OpenAI

To start using OpenAI's APIs and tools, we'll need to install the OpenAI Python package and import it into the notebook. To do this, we can use pip, a package manager for Python. First, make sure pip is up to date:

!pip install --upgrade pip

Next, run the following cell to import the OpenAI package. It should come pre-installed in Colab; if it isn't, the cell installs it first:

#Importing openai
try:
    import openai
except ImportError:
    #openai is not installed yet: install it, then import it
    !pip install openai
    import openai

Installing gTTS

Next, install gTTS, a Python library that provides an easy-to-use interface for text-to-speech synthesis using Google Translate's text-to-speech API:

#Importing gTTS
try:
    from gtts import gTTS
except ImportError:
    #gTTS is not installed yet: install it, then import it
    !pip install gTTS
    from gtts import gTTS

API Key

Finally, load your API key. Rather than enter your key directly into your notebook, I recommend keeping it in a separate file (here, on Google Drive) and reading it from your script. You will need to provide the correct path and filename in the code below.

from google.colab import drive
drive.mount('/content/drive')
#Reading the key from the first line of the file
with open("drive/MyDrive/files/api_key.txt", "r") as f:
    API_KEY=f.readline().strip() #strip() removes the trailing newline
#The OpenAI Key
import os
os.environ['OPENAI_API_KEY'] =API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
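Outside Colab there is no Drive to mount, so the same idea can be wrapped in a small helper. The following is only a sketch: the function name load_api_key and its env_var parameter are illustrative choices, not part of the OpenAI package. It prefers an environment variable and falls back to the first line of a key file.

```python
import os
from pathlib import Path


def load_api_key(path, env_var="OPENAI_API_KEY"):
    """Return the key from the environment if set, otherwise from the file at `path`.

    Hypothetical helper: the name and parameters are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Use only the first line of the file and drop surrounding whitespace.
        lines = Path(path).read_text(encoding="utf-8").splitlines()
        key = lines[0].strip() if lines else ""
    if not key:
        raise ValueError(f"No API key found in {env_var} or {path}")
    return key
```

The returned value could then be assigned to openai.api_key in place of the Drive-based lines above.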

2. Generating Content

Let’s look at how to pass prompts into the OpenAI API to generate responses.

Speech to text

When it comes to speech recognition, Windows provides built-in speech-to-text functionality. However, third-party speech-to-text modules are also available, offering features such as multiple language support, speaker identification, and audio transcription.

For simple speech-to-text, this notebook uses the built-in functionality in Windows. Press Windows key + H to bring up the Windows speech interface. You can read the documentation for more information.

Note: For this notebook, press Enter in Colab when you have finished dictating your request. You could also adapt the function in your application with a timed input function that automatically sends a request after a certain amount of time has elapsed.
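One possible reading of the timed input mentioned in the note is sketched below. It is an assumption rather than the notebook's code: timed_input is a hypothetical name, and the sketch simply stops waiting after the timeout and returns None so the caller can decide what to send. How input() behaves from a background thread inside Colab has not been verified here.

```python
import threading


def timed_input(prompt="", timeout=10.0, reader=input):
    """Return what `reader` produces, or None if `timeout` seconds pass first.

    Hypothetical helper: `reader` defaults to input() and can be swapped for testing.
    """
    result = []

    def worker():
        try:
            result.append(reader(prompt))
        except EOFError:
            pass  # no input stream available

    # A daemon thread does not keep the program alive if the user never answers.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    return result[0] if result else None
```

A caller could use it as uinput = timed_input("Enter a request:", timeout=15) and fall back to a default prompt when the result is None.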

Preparing the Prompt

Note: you can create variables for each part of the OpenAI request. The messages object holds the conversation itself: a list of messages, each with a role (system, assistant, or user) and its content, ending with the text prompt. The model ID is passed alongside it in the request, and the API key is set once on the openai module. By creating variables for each of these parts, you can make it easier to generate requests and responses programmatically. For example, you could create a prompt variable that contains the text prompt for generating a response. You could also create variables for the model ID and API key, making it easier to switch between different OpenAI models or accounts as needed.

For more on implementing each part of the messages object, take a look at: Prompt_Engineering_as_an_alternative_to_fine_tuning.ipynb.
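As a minimal sketch of the variables idea described above, the messages list can be built by a small function with one parameter per part. The name build_messages, its defaults, and the MODEL variable are illustrative assumptions, not names from the OpenAI package.

```python
def build_messages(user_text,
                   system_text="You are a helpful assistant.",
                   assistant_text=None):
    """Build a chat messages list: system, optional assistant context, then user.

    Hypothetical helper; the parameter names are assumptions for illustration.
    """
    messages = [{"role": "system", "content": system_text}]
    if assistant_text:
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_text})
    return messages


MODEL = "gpt-3.5-turbo"  # model ID kept in its own variable so it is easy to swap
messages = build_messages("Where is Tahiti located?")
print(messages)
```

Keeping the model ID and the message parts separate means only one variable changes when you switch models or prompts.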

Here’s the code for accepting the prompt and passing the request to OpenAI:

#Speech to text. Use OS speech-to-text app. For example, Windows: press Windows Key + H
def prepare_message():
    #enter the request with a microphone or type it if you wish
    # example: "Where is Tahiti located?"
    print("Enter a request and press ENTER:")
    uinput = input("")
    #preparing the prompt for OpenAI
    role="user"
    #prompt="Where is Tahiti located?" #maintenance or if you do not want to use a microphone
    line = {"role": role, "content": uinput}
    #creating the message
    assert1={"role": "system", "content": "You are a helpful assistant."}
    assert2={"role": "assistant", "content": "Geography is an important topic if you are going on a once in a lifetime trip."}
    assert3=line
    iprompt = []
    iprompt.append(assert1)
    iprompt.append(assert2)
    iprompt.append(assert3)
    return iprompt

#run the cell to start/continue a dialog
iprompt=prepare_message() #preparing the messages for ChatGPT
#openai.ChatCompletion is the interface of openai package versions before 1.0
response=openai.ChatCompletion.create(model="gpt-3.5-turbo",messages=iprompt) #ChatGPT dialog
text=response["choices"][0]["message"]["content"] #extracting the text of the first choice
print("ChatGPT response:",text)

Here's a sample of the output:

Enter a request and press ENTER:

Where is Tahiti located

ChatGPT response: Tahiti is located in the South Pacific Ocean, specifically in French Polynesia. It is part of a group of islands called the Society Islands and is located approximately 4,000 kilometers (2,500 miles) south of Hawaii and 7,850 kilometers (4,880 miles) east of Australia.
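The code above reaches into the response with response["choices"][0]["message"]["content"]. A defensive version of that lookup might look like the sketch below; extract_text is a hypothetical helper, and the sample dictionary is a hand-written stand-in that mirrors only the fields the article's code reads, not a real API response.

```python
def extract_text(response):
    """Return the first choice's message content, or None if the shape is unexpected."""
    try:
        return response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written stand-in for an API response, for illustration only.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Tahiti is in French Polynesia."}}
    ]
}
print(extract_text(sample))
```

Returning None instead of raising lets the calling cell skip the text-to-speech step when no text came back.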

3. Converting the response to speech

gTTS and IPython

Once you've generated a response from ChatGPT using the OpenAI package, the next step is to convert the text into speech using gTTS (Google Text-to-Speech) and play it back using IPython audio.

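from gtts import gTTS
from IPython.display import Audio
tts = gTTS(text) #text is the ChatGPT response from the previous cell
tts.save('1.wav') #note: gTTS writes MP3-encoded audio, whatever the file extension
sound_file = '1.wav'
Audio(sound_file, autoplay=True)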
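If you call this step more than once, it may help to wrap it in a function. The sketch below assumes a hypothetical helper named text_to_speech; it uses only the gTTS constructor (with its text and lang arguments) and the save method, and it rejects empty text before contacting the service.

```python
def text_to_speech(text, path="1.wav", lang="en"):
    """Save `text` as speech at `path` and return the path.

    Hypothetical wrapper. gTTS writes MP3-encoded audio regardless of the
    extension; the .wav name only mirrors the file name used in this article.
    """
    if not text or not text.strip():
        raise ValueError("Nothing to speak: text is empty")
    # Imported here so the check above works even where gTTS is not installed.
    from gtts import gTTS
    gTTS(text=text, lang=lang).save(path)
    return path
```

In the notebook, Audio(text_to_speech(text), autoplay=True) would then play the result, given a network connection for gTTS.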

4. Transcribing with Whisper

If your project requires the transcription of audio files, you can use OpenAI’s Whisper.

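First, we’ll install ffmpeg, a popular open-source software suite for handling multimedia data, including audio and video files. Whisper calls the ffmpeg command-line tool to decode audio, so it needs to be installed as a system package rather than through pip (Colab images usually include it already):

!apt-get install -y ffmpeg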

Next, we’ll install Whisper:

!pip install git+https://github.com/openai/whisper.git

With that done, we can use a simple command to transcribe the WAV file. Among its outputs, Whisper stores the transcript as a JSON file with the same base name (1.json):

!whisper 1.wav

You’ll see Whisper transcribe the file in chunks:

[00:00.000 --> 00:06.360] Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society

[00:06.360 --> 00:09.800] islands and is part of French Polynesia.

[00:09.800 --> 00:22.360] It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km,

[00:22.360 --> 00:24.640] west of Santiago, Chile.

Once that’s done, we can read the JSON file and display its text field:

import json
with open('1.json') as f:
    data = json.load(f)
text = data['text']
print(text)

This gives the following output:

Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society islands and is part of French Polynesia. It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km, west of Santiago, Chile.
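Besides the text field, Whisper's JSON output also contains a segments list whose entries carry start, end, and text values. As a sketch, the timestamped lines shown earlier can be rebuilt from it; format_timestamp and read_segments are hypothetical helper names introduced here for illustration.

```python
import json


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"


def read_segments(path):
    """Return one '[start --> end] text' line per segment in a Whisper JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    lines = []
    for seg in data.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines
```

Calling read_segments('1.json') after the transcription step would give one line per chunk, which is convenient for subtitles or for locating a passage in a long recording.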

By using Whisper in combination with ChatGPT and gTTS, you can create a fully featured AI-powered application that enables users to interact with your system using natural language inputs and receive audio responses. This might be useful for applications that involve transcribing meetings, conferences, or other audio files.

About the Author

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, where he designed one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive natural language processing (NLP) chatbots, applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an advanced planning and scheduling (APS) solution used worldwide.

You can follow Denis on LinkedIn: https://www.linkedin.com/in/denis-rothman-0b034043/