This article is a quick guide to jump-starting ChatGPT with the OpenAI API. It includes instructions on how to use a microphone to speak to ChatGPT and how to create a ChatGPT request with variables. It also explains how to use Google gTTS, a text-to-speech tool, to listen to ChatGPT's response. By following these steps, you can have a more interactive experience with ChatGPT and make use of its advanced natural language processing capabilities. We're using the GPT-3.5-Turbo model in this example, and running the code in Google Colab, though it should be applicable to other environments.
In this article, we'll cover:

- speaking to ChatGPT with a microphone and your operating system's speech-to-text functionality
- creating a ChatGPT request with variables
- listening to ChatGPT's response with Google gTTS
- transcribing audio files with OpenAI's Whisper
To understand GPT-3 Transformers in detail, read Transformers for NLP, 2nd Edition
There are a few libraries that we’ll need to install into Colab for this project. We’ll install them as required, starting with OpenAI.
To start using OpenAI's APIs and tools, we'll need to install the OpenAI Python package and import it into the project. We'll use pip, the package manager for Python. First, make sure pip itself is up to date:
!pip install --upgrade pip
Next, run the following cell to import the OpenAI package, installing it first if it isn't already available (it should come pre-installed in Colab):
# Importing openai, installing it if needed
try:
    import openai
except ImportError:
    !pip install openai
    import openai
Next, install Google gTTS, a Python library that provides an easy-to-use interface for text-to-speech synthesis using the Google Text-to-Speech API:
# Importing gTTS, installing it if needed
try:
    from gtts import gTTS
except ImportError:
    !pip install gTTS
    from gtts import gTTS
Finally, import your API key. Rather than enter your key directly into your notebook, I recommend keeping it in a local file and importing it from your script. You will need to provide the correct path and filename in the code below.
from google.colab import drive
drive.mount('/content/drive')
with open("drive/MyDrive/files/api_key.txt", "r") as f:
    API_KEY = f.readline().strip()  # strip the trailing newline from the key
#The OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
Let’s look at how to pass prompts into the OpenAI API to generate responses.
When it comes to speech recognition, Windows provides built-in speech-to-text functionality. However, third-party speech-to-text modules are also available, offering features such as multiple language support, speaker identification, and audio transcription.
For simple speech-to-text, this notebook uses the built-in functionality in Windows. Press Windows key + H to bring up the Windows speech interface. You can read the documentation for more information.
Note: For this notebook, press Enter when you have finished asking for a request in Colab. You could also adapt the function in your application with a timed input function that automatically sends a request after a certain amount of time has elapsed.
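One way to sketch such a timed input function is with a background thread that waits for the user and gives up after a deadline. This is only a sketch: the helper name timed_input and the 30-second default are assumptions, not part of the notebook's code.

```python
import threading

def timed_input(prompt_text="Enter a request and press ENTER:", timeout=30.0):
    """Return the user's input, or None if nothing arrives within `timeout` seconds."""
    result = []

    def read():
        result.append(input(prompt_text))

    reader = threading.Thread(target=read, daemon=True)
    reader.start()
    reader.join(timeout)  # wait up to `timeout` seconds for the user to press Enter
    return result[0] if result else None
```

prepare_message() could then call timed_input() instead of input(), and treat a None result as "send the request now" or "ask again", depending on the behavior you want.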
Note: You can create variables for each part of the OpenAI request: the messages object that carries the conversation, plus the model ID and the API key. Creating a variable for each part makes it easier to generate requests and responses programmatically. For example, you could create a prompt variable that contains the text prompt for generating a response, while separate model-ID and API-key variables make it easier to switch between different OpenAI models or accounts as needed.
For more on implementing each part of the messages object, take a look at: Prompt_Engineering_as_an_alternative_to_fine_tuning.ipynb.
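As a minimal sketch of this idea (the variable names here are illustrative, not taken from the notebook):

```python
# Illustrative variables for each part of the request
MODEL = "gpt-3.5-turbo"                       # model ID, easy to swap
SYSTEM_ROLE = "You are a helpful assistant."  # system prompt
user_prompt = "Where is Tahiti located?"      # text prompt

# The messages object assembled from the variables
messages = [
    {"role": "system", "content": SYSTEM_ROLE},
    {"role": "user", "content": user_prompt},
]

# The variables then slot straight into the API call:
# response = openai.ChatCompletion.create(model=MODEL, messages=messages)
```

Switching to a different model or conversation style now means changing one variable rather than editing the request by hand.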
Here’s the code for accepting the prompt and passing the request to OpenAI:
# Speech to text. Use the OS speech-to-text app. For example, Windows: press Windows key + H
def prepare_message():
    # Enter the request with a microphone, or type it if you wish
    # example: "Where is Tahiti located?"
    print("Enter a request and press ENTER:")
    uinput = input("")

    # Preparing the prompt for OpenAI
    role = "user"
    #prompt="Where is Tahiti located?" # maintenance, or if you do not want to use a microphone
    line = {"role": role, "content": uinput}

    # Creating the message
    assert1 = {"role": "system", "content": "You are a helpful assistant."}
    assert2 = {"role": "assistant", "content": "Geography is an important topic if you are going on a once in a lifetime trip."}
    assert3 = line
    iprompt = []
    iprompt.append(assert1)
    iprompt.append(assert2)
    iprompt.append(assert3)
    return iprompt
# Run the cell to start/continue a dialog
iprompt = prepare_message()  # preparing the messages for ChatGPT
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=iprompt)  # ChatGPT dialog
text = response["choices"][0]["message"]["content"]  # extract the reply text from the JSON response
print("ChatGPT response:", text)
Here's a sample of the output:
Enter a request and press ENTER:
Where is Tahiti located
ChatGPT response: Tahiti is located in the South Pacific Ocean, specifically in French Polynesia. It is part of a group of islands called the Society Islands and is located approximately 4,000 kilometers (2,500 miles) south of Hawaii and 7,850 kilometers (4,880 miles) east of Australia.
Once you've generated a response from ChatGPT using the OpenAI package, the next step is to convert the text into speech using gTTS (Google Text-to-Speech) and play it back using IPython audio.
from gtts import gTTS
from IPython.display import Audio

tts = gTTS(text)
tts.save('1.wav')  # note: gTTS actually writes MP3 data; the .wav name is kept for the steps below
sound_file = '1.wav'
Audio(sound_file, autoplay=True)
If your project requires the transcription of audio files, you can use OpenAI’s Whisper.
First, we'll make sure the ffmpeg audio-processing tool is available. ffmpeg is a popular open-source software suite for handling multimedia data, including audio and video files. Whisper needs the ffmpeg binary itself, which pip cannot provide; it comes preinstalled in Colab, and on other systems you can install it with your package manager (for example, apt-get install -y ffmpeg).
Next, we’ll install Whisper:
!pip install git+https://github.com/openai/whisper.git
With that done, we can use a simple command to transcribe the WAV file. Depending on the Whisper version, the transcription is written in several formats, including a JSON file with the same base name:
!whisper 1.wav
You’ll see Whisper transcribe the file in chunks:
[00:00.000 --> 00:06.360] Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society
[00:06.360 --> 00:09.800] islands and is part of French Polynesia.
[00:09.800 --> 00:22.360] It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km,
[00:22.360 --> 00:24.640] west of Santiago, Chile.
Once that’s done, we can read the JSON file and display the text object:
import json

with open('1.json') as f:
    data = json.load(f)
text = data['text']
print(text)
This gives the following output:
Tahiti is located in the South Pacific Ocean, specifically in the archipelago of society islands and is part of French Polynesia. It is approximately 4,000 miles, 6,400 km, south of Hawaii and 5,700 miles, 9,200 km, west of Santiago, Chile.
By using Whisper in combination with ChatGPT and gTTS, you can create a fully featured AI-powered application that enables users to interact with your system using natural language inputs and receive audio responses. This might be useful for applications that involve transcribing meetings, conferences, or other audio files.
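One way to tie the pieces together is a small orchestration function into which the OpenAI and gTTS calls are injected, so each stage stays swappable. The function and parameter names below are assumptions for illustration, not part of the original notebook.

```python
def voice_turn(user_text, ask_fn, tts_fn):
    """One turn of a voice dialog: get the model's reply for `user_text`,
    synthesize it to an audio file, and return (reply_text, audio_path).

    ask_fn: str -> str   (e.g. a wrapper around openai.ChatCompletion.create)
    tts_fn: str -> str   (e.g. a wrapper that calls gTTS(reply).save('1.wav')
                          and returns the file path)
    """
    reply = ask_fn(user_text)   # ask the chat model
    audio_path = tts_fn(reply)  # synthesize the reply to audio
    return reply, audio_path
```

In the notebook, ask_fn would wrap the ChatCompletion call shown earlier and tts_fn the gTTS step, while keeping both easy to replace with stubs for testing.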
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, where he designed one of the very first patented word2matrix embedding systems and patented AI conversational agents. He began his career authoring one of the first AI cognitive natural language processing (NLP) chatbots, applied as an automated language teacher for Moet et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers, and then an advanced planning and scheduling (APS) solution used worldwide.
You can follow Denis on LinkedIn: https://www.linkedin.com/in/denis-rothman-0b034043/
Copyright 2023 Denis Rothman, MIT License