
Using ChatGPT with Text to Speech

  • 7 min read
  • 04 Jun 2023


This article provides a quick guide to using the OpenAI API to jump-start ChatGPT. The guide includes instructions on how to use a microphone to speak to ChatGPT and how to create a ChatGPT request with variables. Additionally, the article explains how to use Google gTTS, a text-to-speech tool, to listen to ChatGPT's response. By following these steps, you can have a more interactive experience with ChatGPT and make use of its advanced natural language processing capabilities. We’re using the gpt-3.5-turbo model in this example. The examples run in Google Colab, but they should be applicable to other environments.

 

In this article, we’ll cover:

 

  1. Installing OpenAI, your API key, and Google gTTS for Text-to-Speech
  2. Generating content with ChatGPT
  3. Text-to-speech for ChatGPT's response
  4. Transcribing with Whisper


To understand GPT-3 Transformers in detail, read Transformers for NLP, 2nd Edition

 

1. Installing OpenAI, gTTS, and your API Key

 

There are a few libraries that we’ll need to install into Colab for this project. We’ll install them as required, starting with OpenAI.

 

Installing and Importing OpenAI

 

To start using OpenAI's APIs and tools, we'll need to install the OpenAI Python package and import it into the project. To do this, we can use pip, a package manager for Python. First, make sure pip is up to date:

!pip install --upgrade pip

Next, run the following script in your notebook. It imports the OpenAI package and installs it first if Colab does not already provide it:

#Importing openai
try:
    import openai
except ImportError:
    #the package is missing: install it, then import it
    !pip install openai
    import openai

 

Installing gTTS

 

Next, install gTTS (Google Text-to-Speech), a Python library that provides an easy-to-use interface for text-to-speech synthesis using Google Translate's text-to-speech API:

#Importing gTTS
try:
    from gtts import gTTS
except ImportError:
    #the package is missing: install it, then import it
    !pip install gTTS
    from gtts import gTTS

 

API Key

 

Finally, import your API key. Rather than enter your key directly into your notebook, I recommend keeping it in a separate file (here, a text file on Google Drive) and reading it from your script. You will need to provide the correct path and filename in the code below.

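import os
from google.colab import drive

#mount Google Drive so the notebook can read the key file
drive.mount('/content/drive')

#read the key from the first line and strip the trailing newline
with open("drive/MyDrive/files/api_key.txt", "r") as f:
    API_KEY = f.readline().strip()

#The OpenAI Key
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")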
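As a variation on the cell above, the key lookup can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the name `load_api_key`, its parameters, and the fallback to an existing `OPENAI_API_KEY` environment variable are assumptions made for illustration. It strips surrounding whitespace, such as a trailing newline, from the key.

```python
import os
from pathlib import Path


def load_api_key(path, env_var="OPENAI_API_KEY"):
    """Return the key from the first line of `path`, else from `env_var`.

    Hypothetical helper for illustration; raises if no key is found.
    """
    key = ""
    key_file = Path(path)
    if key_file.is_file():
        lines = key_file.read_text(encoding="utf-8").splitlines()
        key = lines[0].strip() if lines else ""
    if not key:
        # fall back to the environment variable when the file is missing or empty
        key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path} or in ${env_var}")
    return key
```

In the notebook, this could be used as `openai.api_key = load_api_key("drive/MyDrive/files/api_key.txt")` once Google Drive is mounted.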

 

2. Generating Content

 

Let’s look at how to pass prompts into the OpenAI API to generate responses.

 

Speech to text

 

When it comes to speech recognition, Windows provides built-in speech-to-text functionality. However, third-party speech-to-text modules are also available, offering features such as multiple language support, speaker identification, and audio transcription. 

For simple speech-to-text, this notebook uses the built-in functionality in Windows. Press Windows key + H to bring up the Windows speech interface. You can read the documentation for more information.

Note: For this notebook, press Enter in Colab when you have finished dictating or typing your request. You could also adapt the function in your application with a timed input function that automatically sends a request after a certain amount of time has elapsed.
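One possible shape for such a timed input function is sketched below. It is an assumption-laden illustration rather than the notebook's method: `timed_input` and its `reader` parameter are hypothetical names, and the sketch gives up and returns `None` when the time limit passes instead of sending partially typed text. It targets a plain Python process; notebook front ends may handle `input` differently.

```python
import queue
import threading


def timed_input(prompt, timeout_s, reader=input):
    """Return the text from reader(prompt), or None if timeout_s elapses first."""
    result = queue.Queue(maxsize=1)

    def worker():
        # runs in the background so the caller can stop waiting
        try:
            result.put(reader(prompt))
        except EOFError:
            result.put(None)

    threading.Thread(target=worker, daemon=True).start()
    try:
        return result.get(timeout=timeout_s)
    except queue.Empty:
        return None
```

A caller could then fall back to a default request, or skip the API call, whenever the function returns `None`.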

 

Preparing the Prompt

 

Note: you can create variables for each part of the OpenAI request. The request carries the information needed to generate a response from ChatGPT: the model ID and the messages list, in which each message has a role and a content field. The API key is set separately on the openai module. By creating variables for each part, you can make it easier to generate requests and responses programmatically. For example, you could create a prompt variable that contains the text prompt for generating a response. You could also create variables for the model ID and API key, making it easier to switch between different OpenAI models or accounts as needed.
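To make the idea of one variable per request part concrete, here is a minimal sketch. The function name `build_request` and its default values are assumptions for illustration; only the `model` and `messages` keys and the role/content message layout come from the call used in this article.

```python
def build_request(user_text,
                  model="gpt-3.5-turbo",
                  system_text="You are a helpful assistant."):
    """Assemble the keyword arguments for a chat completion call."""
    messages = [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]
    return {"model": model, "messages": messages}


request = build_request("Where is Tahiti located?")
print(request["model"])
print(len(request["messages"]))
```

The resulting dictionary could be unpacked into the call used in this article, as in `openai.ChatCompletion.create(**request)`, so switching models only means changing one argument.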

For more on implementing each part of the messages object, take a look at: Prompt_Engineering_as_an_alternative_to_fine_tuning.ipynb.

Here’s the code for accepting the prompt and passing the request to OpenAI:

#Speech to text. Use OS speech-to-text app. For example, Windows: press Windows Key + H
def prepare_message():
    #enter the request with a microphone or type it if you wish
    # example: "Where is Tahiti located?"
    print("Enter a request and press ENTER:")
    uinput = input("")

    #preparing the prompt for OpenAI
    role="user"
    #prompt="Where is Tahiti located?" #maintenance or if you do not want to use a microphone
    line = {"role": role, "content": uinput}

    #creating the message
    assert1={"role": "system", "content": "You are a helpful assistant."}
    assert2={"role": "assistant", "content": "Geography is an important topic if you are going on a once in a lifetime trip."}
    assert3=line
    iprompt = []
    iprompt.append(assert1)
    iprompt.append(assert2)
    iprompt.append(assert3)
    return iprompt

#run the cell to start/continue a dialog
iprompt=prepare_message() #preparing the messages for ChatGPT

#openai.ChatCompletion is the interface of openai package versions before 1.0
response=openai.ChatCompletion.create(model="gpt-3.5-turbo",messages=iprompt) #ChatGPT dialog

text=response["choices"][0]["message"]["content"] #text of the first choice in the JSON response
print("ChatGPT response:",text)

 

Here's a sample of the output:

 

Enter a request and press ENTER:

Where is Tahiti located

ChatGPT response: Tahiti is located in the South Pacific Ocean, specifically in French Polynesia. It is part of a group of islands called the Society Islands and is located approximately 4,000 kilometers (2,500 miles) south of Hawaii and 7,850 kilometers (4,880 miles) east of Australia.
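The cell above builds a fresh message list for every request, so the model does not see earlier turns. One way to carry a dialog forward is to append the previous reply and the next request to the list before calling the API again. The sketch below assumes this approach; `continue_dialog` is a hypothetical helper, not part of the original notebook.

```python
def continue_dialog(messages, assistant_text, next_user_text):
    """Return a new message list extended with the last reply and the next request."""
    return messages + [
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": next_user_text},
    ]


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Where is Tahiti located?"},
]
history = continue_dialog(
    history,
    "Tahiti is located in the South Pacific Ocean.",
    "What language is spoken there?",
)
print(len(history))
```

Because the function returns a new list, the earlier history stays available if a request needs to be retried.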

 

3. Text-to-speech the response

 

gTTS and IPython

 

Once you've generated a response from ChatGPT using the OpenAI package, the next step is to convert the text into speech using gTTS (Google Text-to-Speech) and play it back using IPython audio.

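from gtts import gTTS
from IPython.display import Audio

#convert the ChatGPT response to speech
tts = gTTS(text)

#note: gTTS writes MP3-encoded audio, whatever the file extension
tts.save('1.wav')
sound_file = '1.wav'

#play the audio in the notebook
Audio(sound_file, autoplay=True)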

 

4. Transcribing with Whisper

 

If your project requires the transcription of audio files, you can use OpenAI’s Whisper.

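First, we’ll install ffmpeg, which Whisper uses to decode audio. ffmpeg is a popular open-source software suite for handling multimedia data, including audio and video files. Whisper needs the ffmpeg command-line tool itself, which is installed with the system package manager rather than with pip:

!apt-get install -y ffmpeg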

Next, we’ll install Whisper:

!pip install git+https://github.com/openai/whisper.git 

With that done, we can use a simple command to transcribe the WAV file. By default, the command-line tool writes the transcript in several formats, including a JSON file with the same base name:

!whisper 1.wav

You’ll see Whisper transcribe the file in chunks:

[00:00.000 --> 00:06.360]  Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society

[00:06.360 --> 00:09.800]  islands and is part of French Polynesia.

[00:09.800 --> 00:22.360]  It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km,

[00:22.360 --> 00:24.640]  west of Santiago, Chile.
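If you capture this console output, each line can be split into a start time, an end time, and the text. The following is a small sketch under assumptions: `parse_segment_line` is a hypothetical helper, and the pattern covers only the minutes:seconds form shown in the sample above.

```python
import re

# Pattern for lines such as "[00:00.000 --> 00:06.360]  some text"
SEGMENT_LINE = re.compile(
    r"^\[(\d+):(\d{2}\.\d{3}) --> (\d+):(\d{2}\.\d{3})\]\s*(.*)$"
)


def parse_segment_line(line):
    """Return (start_seconds, end_seconds, text), or None if the line does not match."""
    match = SEGMENT_LINE.match(line.strip())
    if match is None:
        return None
    start_min, start_sec, end_min, end_sec, text = match.groups()
    start = int(start_min) * 60 + float(start_sec)
    end = int(end_min) * 60 + float(end_sec)
    return start, end, text


print(parse_segment_line("[00:22.360 --> 00:24.640]  west of Santiago, Chile."))
```

Lines that do not follow the bracketed timestamp layout return `None`, so they can be filtered out.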

Once that’s done, we can read the JSON file and display the text object:

import json

with open('1.json') as f:
    data = json.load(f)

text = data['text']
print(text)

This gives the following output:

Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society islands and is part of French Polynesia. It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km, west of Santiago, Chile.
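The JSON file can also be read segment by segment. The sketch below assumes the file contains a `segments` list whose entries carry `start`, `end`, and `text` fields; check your own file, since the layout may differ between Whisper versions. `segments_from_json` is a hypothetical helper name.

```python
import json


def segments_from_json(path):
    """Yield (start, end, text) for each segment in a Whisper JSON transcript."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # a file without a "segments" list simply yields nothing
    for segment in data.get("segments", []):
        yield segment["start"], segment["end"], segment["text"].strip()
```

For the file produced above, `for start, end, text in segments_from_json('1.json'): print(start, end, text)` would list each timed chunk.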

 

By using Whisper in combination with ChatGPT and gTTS, you can create a fully featured AI-powered application that enables users to interact with your system using natural language inputs and receive audio responses. This might be useful for applications that involve transcribing meetings, conferences, or other audio files.
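As a rough sketch of how the three pieces could fit together, the function below chains a transcription step, a chat step, and a speech step. Everything here is an assumption for illustration: `voice_round_trip` and its three callable parameters are hypothetical, and the stand-in lambdas replace the real Whisper, ChatGPT, and gTTS calls so the example runs without network access or audio files.

```python
def voice_round_trip(audio_path, transcribe, chat, speak):
    """Run one spoken exchange: audio in, audio out.

    transcribe(audio_path) -> str   e.g. a wrapper around Whisper
    chat(request_text) -> str       e.g. a wrapper around the ChatGPT call
    speak(reply_text) -> str        e.g. a gTTS wrapper returning the saved file path
    """
    request_text = transcribe(audio_path)
    reply_text = chat(request_text)
    reply_audio = speak(reply_text)
    return request_text, reply_text, reply_audio


# Stand-in functions; swap in real wrappers in an application.
result = voice_round_trip(
    "question.wav",
    transcribe=lambda path: "Where is Tahiti located?",
    chat=lambda text: "Tahiti is in French Polynesia.",
    speak=lambda text: "reply.mp3",
)
print(result)
```

Passing the steps in as functions keeps each service replaceable and makes the flow easy to test in isolation.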

 

About the Author

 

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive natural language processing (NLP) chatbots applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an advanced planning and scheduling (APS) solution used worldwide.

You can follow Denis on LinkedIn:  https://www.linkedin.com/in/denis-rothman-0b034043/

Copyright 2023 Denis Rothman, MIT License