Have you ever wished you could just listen to the summary of a long YouTube video instead of watching the whole thing? Well, you're in luck! In this article, I’ll be showcasing a fun little Python project that I’ve been working on, which allows you to do just that.
Don’t get me wrong: YouTube is a great resource for learning about new technologies and keeping up to date with the latest news. And best of all: it’s free. But sometimes I lose track of time in the myriad of videos out there, fast-forwarding through long talks only to find out in the end that the information I’m looking for is not in the video ☹
Well, if you often find yourself in a similar situation, here’s a tool you might like. This little script downloads the audio from a YouTube video, transcribes it, summarizes it using AI and finally generates a new audio file with the summary. All this magic is done using the OpenAI GPT-3.5-turbo API and some cool AWS services (S3, Transcribe, and Polly), in fewer than 80 lines of code.
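For those who might be unfamiliar with these APIs, here is their purpose in the script: S3 stores the downloaded audio so that Transcribe can read it, Transcribe converts the audio into text, GPT-3.5-turbo summarizes the transcript, and Polly turns the summary back into speech.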
Before you start using these services, be aware that both AWS and OpenAI have usage quotas and costs associated with them. Make sure to familiarize yourself with these to avoid any unexpected charges. You’ll probably fall well within the limits of your Amazon account’s free tier unless you start summarizing hundreds of videos.
Also, you might consider adding error handling to the code. To keep it short, I’ve left it out of this demo.
You can download the Python file for this code from GitHub here.
Make sure you store your OpenAI API key (in OPENAI_API_KEY) and your AWS credentials in local environment variables for secure and convenient access. The code assumes that both are valid and already set. Alternatively, you can store your AWS ACCESS KEY and SECRET ACCESS KEY in %USERPROFILE%\.aws\credentials
More info on that here: https://docs.aws.amazon.com/sdkref/latest/guide/creds-config-files.html
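Before running the script, it can help to check that the expected variables are actually set. Here is a minimal sketch, assuming the credentials live in environment variables: OPENAI_API_KEY is the name the script reads in main, the two AWS names are the standard ones boto3 looks up, and the helper missing_env_vars is a hypothetical name of my own. If you keep your AWS keys in the credentials file instead, the two AWS entries will be reported as missing even though boto3 can still find them.

```python
import os

# OPENAI_API_KEY is read explicitly by the script; the two AWS names are the
# standard environment variables that boto3 checks for credentials.
REQUIRED_VARS = ("OPENAI_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```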
For the code to function properly, make sure the AWS credentials you are using have permission to use the three services involved: S3, Transcribe, and Polly.
The most convenient and safe approach to grant the necessary permissions is through the AWS Management Console, by attaching the relevant policies to the user or role associated with the credentials.
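If you prefer a narrower policy than the broad managed ones, here is a rough sketch of what it might look like. It is only my reading of the calls the script makes (create, fill, and delete a bucket whose name starts with "bucket-", start and poll a transcription job, synthesize speech), so treat the action list as an assumption and verify it against your own account. The function name build_policy is hypothetical.

```python
import json


def build_policy(bucket_prefix="bucket-"):
    """Build an IAM policy document covering the calls made by the script.

    Assumption: the script only touches buckets whose names start with
    `bucket_prefix`, as in main() where the name is f"bucket-{uuid}".
    """
    bucket_arn = f"arn:aws:s3:::{bucket_prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level operations
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:DeleteBucket", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # Object-level operations
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {   # Transcription job and speech synthesis
                "Effect": "Allow",
                "Action": [
                    "transcribe:StartTranscriptionJob",
                    "transcribe:GetTranscriptionJob",
                    "polly:SynthesizeSpeech",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the policy editor in the console
    print(json.dumps(build_policy(), indent=2))
```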
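I’ve used Python v3.11. Make sure you first install all the requirements. Note that the code below calls openai.ChatCompletion, which was removed in version 1.0 of the openai package, so install an earlier release of that library rather than the latest one.
pip install pytube
pip install "openai<1"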
pip install boto3
pip install requests
pip install python-dotenv
Let’s break it down snippet by snippet.
import os
import boto3
import requests
import openai
import uuid
from pytube import YouTube
The download_audio function uses the pytube library to download the audio from a YouTube video. The audio file is saved locally before being uploaded to S3 by the main function. Here’s the complete pytube documentation: https://pytube.io/en/latest/
def download_audio(video_id):
    # Build the watch URL from the video ID and fetch the audio-only stream
    yt = YouTube(f'https://www.youtube.com/watch?v={video_id}')
    return yt.streams.get_audio_only().download(filename=video_id)
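Since download_audio expects a bare video ID, it can be handy to pull the ID out of a pasted link. Below is a minimal sketch that uses only the standard library. The helper name extract_video_id is hypothetical, and it only handles the two common URL shapes (youtube.com/watch?v=... and youtu.be/...); anything else returns None.

```python
from urllib.parse import urlparse, parse_qs

# Hosts treated as YouTube "watch" pages in this sketch
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        # Regular links carry the ID in the "v" query parameter
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=U3PiD-g7XJM"))
```

The result can then be passed straight to download_audio in place of the hard-coded ID.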
The transcribe_audio function uses AWS Transcribe to convert the audio into text. The uuid (Universally Unique Identifier) module generates a new job name every time the function runs, which matters because AWS Transcribe requires job names to be unique. The function then polls the job until it either completes or fails, and returns the URI of the transcript file on success. Here’s the complete documentation of AWS Transcribe: https://docs.aws.amazon.com/transcribe/latest/dg/what-is.html
def transcribe_audio(s3, bucket, file_name):
    import time  # local import so this snippet stands alone; used to pause between polls

    transcribe = boto3.client('transcribe')
    job_name = f"TranscriptionJob-{uuid.uuid4()}"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f"s3://{bucket}/{file_name}"},
        MediaFormat='mp4',
        LanguageCode='en-US'
    )
    # Poll until the job finishes, pausing between requests instead of hammering the API
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        job_status = status['TranscriptionJob']['TranscriptionJobStatus']
        if job_status in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)
    # Return the transcript URI on success, None on failure
    if job_status == 'COMPLETED':
        return status['TranscriptionJob']['Transcript']['TranscriptFileUri']
    return None
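The polling loop in transcribe_audio waits for as long as the job takes. If you would rather give up after a while, the waiting logic can be factored into a small helper. This is only a sketch: the name wait_until_done, the 5-second interval, and the 10-minute timeout are my own assumptions, not part of the original script. The helper takes any function that returns the current status string, so it can be tried without touching AWS.

```python
import time


def wait_until_done(get_status, done_states=("COMPLETED", "FAILED"),
                    interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns one of done_states.

    Raises TimeoutError if no final state is reached within `timeout` seconds.
    The `sleep` parameter exists so tests can swap in a fake.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated job that finishes on the third check
    fake_statuses = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
    print(wait_until_done(lambda: next(fake_statuses), interval=0.1))
```

With Transcribe, get_status would be a small lambda that calls get_transcription_job and returns the TranscriptionJobStatus field.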
The summarize_transcript function leverages OpenAI's GPT-3.5-turbo to summarize the transcript. Notice the simple prompt I’ve used for this task. I’ve tried to keep it very short in order to save more tokens for the actual transcript. It can definitely be improved and tweaked according to your preferences. For the complete documentation of the OpenAI API, check out this link: https://platform.openai.com/docs/api-reference/introduction
def summarize_transcript(transcript):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a knowledge curator helping users to understand the contents of video transcripts."},
            {"role": "user", "content": f"Please summarize the following transcript: '{transcript}'"}
        ]
    )
    # The summary is the content of the first (and only) returned choice
    return response['choices'][0]['message']['content'].strip()
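Because the model has a limited context window, a transcript from a long talk may not fit into a single request. One common workaround is to summarize the transcript in pieces and then summarize the summaries. Here is a minimal sketch of that idea. The helper names and the 2,000-word chunk size are assumptions of mine, and word count is only a rough stand-in for tokens. The summarize argument is any function that maps text to a shorter text, such as summarize_transcript.

```python
def split_into_chunks(text, max_words=2000):
    """Split text into pieces of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long_transcript(transcript, summarize, max_words=2000):
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = split_into_chunks(transcript, max_words)
    if len(chunks) <= 1:
        # Short enough for a single request
        return summarize(transcript)
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize(" ".join(partial_summaries))


if __name__ == "__main__":
    # Stand-in summarizer that keeps only the first three words of its input
    def fake_summarize(text):
        return " ".join(text.split()[:3])

    print(summarize_long_transcript("one two three four five six seven",
                                    fake_summarize, max_words=4))
```

For very long videos the combined partial summaries could still be too long, in which case the same step would need to be repeated.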
The synthesize_speech function downloads the transcript, passes it to summarize_transcript, stores the text summary in the bucket, and then uses AWS Polly to convert the summarized text back into audio. If you prefer other voices or want to tweak different parameters such as speed, language, or dialect, here’s the complete documentation on how to use Polly: https://docs.aws.amazon.com/polly/index.html
def synthesize_speech(s3, bucket, transcript_uri):
    # Fetch the transcript JSON and rebuild the text from the spoken words
    transcript_data = requests.get(transcript_uri).json()
    transcript = ' '.join(
        item['alternatives'][0]['content']
        for item in transcript_data['results']['items']
        if item['type'] == 'pronunciation'
    )
    summary = summarize_transcript(transcript)
    # Keep a text copy of the summary in the bucket
    summary_file_name = f"summary_{uuid.uuid4()}.txt"
    s3.put_object(Body=summary, Bucket=bucket, Key=summary_file_name)
    # Convert the summary to speech and save it locally as an MP3
    polly = boto3.client('polly')
    response = polly.synthesize_speech(OutputFormat='mp3', Text=summary, VoiceId='Matthew', Engine='neural')
    mp3_file_name = f"speech_{uuid.uuid4()}.mp3"
    with open(mp3_file_name, 'wb') as f:
        f.write(response['AudioStream'].read())
    return mp3_file_name
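Joining only the pronunciation items, as synthesize_speech does, drops punctuation from the transcript. The Transcribe output JSON also carries the full punctuated text under results.transcripts, which may give the model a cleaner input. Here is a small sketch that reads that field instead. The helper name extract_transcript_text is hypothetical, and the sample dictionary is a trimmed, made-up illustration of the output shape, not real job output.

```python
def extract_transcript_text(transcript_data):
    """Return the full punctuated transcript from a Transcribe result dict."""
    transcripts = transcript_data.get("results", {}).get("transcripts", [])
    return " ".join(t["transcript"] for t in transcripts).strip()


# Made-up sample in the shape of a Transcribe result file (heavily trimmed)
SAMPLE = {
    "results": {
        "transcripts": [{"transcript": "Hello world."}],
        "items": [
            {"type": "pronunciation", "alternatives": [{"content": "Hello"}]},
            {"type": "pronunciation", "alternatives": [{"content": "world"}]},
            {"type": "punctuation", "alternatives": [{"content": "."}]},
        ],
    }
}

if __name__ == "__main__":
    print(extract_transcript_text(SAMPLE))
```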
To keep our storage in check and avoid littering the cloud, it’s best to delete all objects from the bucket once the audio summary has been saved locally. An S3 bucket can only be deleted when it is empty, so this step has to come before deleting the bucket itself.
Remember, we only needed the S3 bucket because AWS Transcribe reads its input from S3 (the script also keeps a text copy of the summary there).
def delete_all_objects(bucket_name):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    # Delete every object so the bucket itself can be removed afterwards
    bucket.objects.all().delete()
And finally, the main function, which ties everything together. It specifies the YouTube video to summarize (which you can obviously change to any other video ID), sets up the necessary AWS services and calls the functions defined above in the correct order. It also makes sure to clean up by deleting the S3 bucket after use.
def main():
    video_id = 'U3PiD-g7XJM'  # change to any other video ID from YouTube
    bucket = f"bucket-{uuid.uuid4()}"
    file_name = f"{video_id}.mp4"
    openai.api_key = os.getenv('OPENAI_API_KEY')
    s3 = boto3.client('s3')
    s3.create_bucket(Bucket=bucket)
    print("Downloading audio stream from YouTube video...")
    audio_file = download_audio(video_id)
    print("Uploading audio to S3 bucket...")
    s3.upload_file(audio_file, bucket, file_name)
    print("Transcribing audio...")
    transcript_uri = transcribe_audio(s3, bucket, file_name)
    print("Summarizing transcript and synthesizing speech...")
    mp3_file_name = synthesize_speech(s3, bucket, transcript_uri)
    print(f"Audio summary saved in: {mp3_file_name}\n")
    # Empty the bucket first, then remove it
    delete_all_objects(bucket)
    s3.delete_bucket(Bucket=bucket)


if __name__ == "__main__":
    main()
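One thing to watch for in main: calling create_bucket with only a bucket name works in us-east-1, but in other regions S3 expects a CreateBucketConfiguration with a LocationConstraint. Here is a small sketch that builds the right arguments for either case. The helper name create_bucket_kwargs is hypothetical; the region would typically come from your boto3 configuration.

```python
def create_bucket_kwargs(bucket, region):
    """Build keyword arguments for the S3 create_bucket call in a given region."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed as a constraint
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


if __name__ == "__main__":
    print(create_bucket_kwargs("bucket-example", "eu-west-1"))
    # Possible use inside main(), assuming s3 is the boto3 client created there:
    #   s3.create_bucket(**create_bucket_kwargs(bucket, s3.meta.region_name))
```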
And that's it! With this simple tool you can now convert any YouTube video into a summarized audio file.
So, sit back, relax and let AI do the work for you.
Enjoy!
Andrei Gheorghiu is an experienced trainer with a passion for helping learners achieve their maximum potential. He always strives to bring a high level of expertise and empathy to his teaching.
With a background in IT audit, information security, and IT service management, Andrei has delivered training to over 10,000 students across different industries and countries. He is also a Certified Information Systems Security Professional and Certified Information Systems Auditor, with a keen interest in digital domains like Security Management and Artificial Intelligence.
In his free time, Andrei enjoys trail running, photography, video editing and exploring the latest developments in technology.