Option 2: Combining single tools into one agent
In this leg of our journey toward multimodality, we will plug different tools into our STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION agent. Our goal is to build a copilot agent that generates reviews of YouTube videos and posts those reviews on our social media with a nice description and a related picture. Since we want to do this with little or no effort on our part, our agent needs to perform the following steps:
- Search and transcribe a YouTube video based on our input.
- Based on the transcription, generate a review with a length and style defined by the user query.
- Generate an image related to the video and the review.
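To make the flow of these steps concrete before looking at the real tools, here is a minimal sketch in plain Python. It assumes hypothetical stub functions (`youtube_search`, `transcribe`, `write_review`, `generate_image`) that only return placeholder strings, so no YouTube, Whisper, LLM, or image API is called. It also runs the tools in a fixed order. A STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION agent would instead let the LLM pick each tool from its name and description.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stubs standing in for the real tools (YouTube search, Whisper,
# an LLM, an image model). They return placeholder strings so the control
# flow can run without any API keys.

def youtube_search(query: str) -> str:
    """Stub: build a fake video URL from the query."""
    slug = "-".join(query.lower().split())
    return f"https://www.youtube.com/watch?v={slug}"

def transcribe(url: str) -> str:
    """Stub: pretend to download the audio and transcribe it."""
    return f"Transcript of {url}"

def write_review(transcript: str, words: int, style: str) -> str:
    """Stub: pretend to ask an LLM for a review of a given length and style."""
    return f"[{style}, ~{words} words] Review based on: {transcript}"

def generate_image(review: str) -> str:
    """Stub: pretend to call an image model and return an image reference."""
    return f"image-for-{len(review)}-chars.png"

@dataclass
class Tool:
    """A named tool with a description, as an agent would see it."""
    name: str
    description: str
    func: Callable[..., str]

# Registry of "plug-in" tools, keyed by name.
TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in [
        Tool("youtube_search", "Find a YouTube video URL for a query", youtube_search),
        Tool("transcribe", "Transcribe the audio of a YouTube video", transcribe),
        Tool("write_review", "Write a review of a transcript", write_review),
        Tool("generate_image", "Generate a picture for a review", generate_image),
    ]
}

def gptuber(query: str, words: int = 100, style: str = "informal") -> dict:
    """Run the three steps in a fixed order (a real agent would choose the order)."""
    url = TOOLS["youtube_search"].func(query)
    transcript = TOOLS["transcribe"].func(url)
    review = TOOLS["write_review"].func(transcript, words, style)
    image = TOOLS["generate_image"].func(review)
    return {"url": url, "review": review, "image": image}

if __name__ == "__main__":
    print(gptuber("Avatar review", words=50, style="funny"))
```

Keeping each tool behind a name and a description mirrors how tools are exposed to an agent. Swapping a stub for a real implementation then leaves the rest of the pipeline unchanged.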
We will call our copilot GPTuber. In the following subsections, we will examine each tool and then put them all together.
YouTube tools and Whisper
The first step of our agent will be to search and transcribe the YouTube video based on our input. To...