Part II: Automating prompt design for generative image design

For an AI developer, going beyond human-designed prompts with Generative AI is not very different from creating classical complex reports based on database queries:

  • The end user often doesn’t want to have anything to do with making the report.
  • The end user defines the report.
  • A developer creates SQL queries, for example.
  • The developer creates a click-and-run interface.
  • The end user works on the automatically generated report.

Why does this work? The key concept is a closed environment. In a small company or a small department of a large corporation, the tasks can be well defined, the data well identified, and the users willing to automate work that takes them hours to do and infringes on their free time.

In this section, we will:

  • Work in a closed environment.
  • Reproduce classical software automation for end users.
  • Build a use case for a marketing campaign for a small company that cannot afford to hire costly marketing personnel or consultants.
  • Stay as close as possible to non-AI methods that have been successful for decades in companies worldwide.

We will begin by examining how to initiate such a project.

ChatGPT/GPT-4 HTML presentation

We need to take a breath here and slow down. We are talking to someone who doesn’t have much time to listen to us in a small company or a small department in a large corporation. The person might be overworked, know nothing about AI, or have other preoccupations. Not all projects are cool online cloud applications that work in every situation! A considerable market will remain for the segments major cloud AI platforms cannot reach. And that is precisely where AI experts can thrive!

However, there are two fundamental rules for this type of project if the product is to survive in production with somebody in the company providing first-level support:

  • We must learn how the company works as we build the project.
  • The key user(s) must learn more about our technology to communicate internally and provide first-level support.

We must build a communication strategy within the company to federate the team.

ChatGPT with GPT-4 provides the text for the presentation

The best project presentation will always remain the one made by the customer’s team. It may sound less specialized, but it will surely resonate as an authentic, integrated approach.

The main user must understand the basic concepts of the project. However, this person may have no idea how to present the project in a meeting to initiate the whole process.

That is when automated ideation enters the project!

Open Automated_Design.ipynb in the chapter’s directory of the GitHub repository.

The first automated ideation will consist of asking ChatGPT to sum up the process and create an HTML graph to illustrate the project. We write a classical program that reads the notebook in its downloaded Python version, saves it in a text file, and then sends the content to GPT-4 through an API with the proper instructions. Those instructions can become a hard-coded template for a promptless application on the user end.
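As a minimal sketch of that classical program (the file name, the instruction text, and the model string are illustrative assumptions; the snippet assumes the openai Python package and an OPENAI_API_KEY environment variable):

from openai import OpenAI  # assumes openai >= 1.0 and OPENAI_API_KEY set
client = OpenAI()
# Read the notebook's downloaded Python version (file name is illustrative)
with open("Automated_Design.py", "r") as f:
    notebook_text = f.read()
# A hard-coded instruction template for a promptless application on the user end
instructions = (
    "Summarize the following program for a corporate meeting "
    "and create an HTML graph that illustrates the project:"
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": instructions + "\n\n" + notebook_text}],
)
print(response.choices[0].message.content)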

We can explain that the whole summarization process will be automated once the prototype is approved. Bear in mind that many classical software applications already build very complex pipelines. They might not seem as “cool” as AI, but they are the backbone of many architectures.

In our use case, let’s suppose we are working with a person with no AI knowledge but who is very motivated to test and asks ChatGPT for assistance.

Source prompt for ChatGPT:

Provide an explanation of how Llama invents a prompt that Midjourney generates an Image with, and Microsoft Designer combines (text and image) to create a nice corporate flyer. Make the presentation engaging for a corporate meeting, please:

Note that a hard-coded prompt template library can be part of a knowledge base and triggered like any ordinary SQL script in a database.
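As a minimal sketch of such a hard-coded template library (the dictionary and helper are illustrative assumptions, not the book's implementation):

# An illustrative prompt template library, keyed like stored queries
PROMPT_TEMPLATES = {
    "flyer_presentation": (
        "Provide an explanation of how Llama invents a prompt that "
        "Midjourney generates an image with, and Microsoft Designer "
        "combines (text and image) to create a nice corporate flyer. "
        "Make the presentation engaging for a corporate meeting, please:"
    ),
}

def get_prompt(key):
    # Retrieved on demand, like an ordinary SQL script in a database
    return PROMPT_TEMPLATES[key]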

In less than ten seconds, ChatGPT will provide the text and a graph that illustrate the project's functionality in an automated presentation. What a time-saver! First, the presentation by ChatGPT for a meeting:


Automating Ideation and Design

Motivation: "Harnessing the Power of AI for Innovative Corporate Designs"

Functionality: "Llama -> Midjourney -> Microsoft Designer: A Seamless Integration"

Background: A sleek futuristic image representing AI and design integration, perhaps a neural network or a stylized digital flyer.

Step 1: Introducing Llama: The Brain Behind Creative Prompting

Llama isn't just another AI. It's your digital brainstorming partner.

In the world of content, the right prompt is the seed to a great idea.

With its advanced algorithms, Llama invents engaging prompts tailored to your brand's voice and values.

Step 2: Enter Midjourney

Bringing Ideas to Visual Life

Ideas need visualization. That's where Midjourney comes in.

Imagine feeding Llama's prompt into Midjourney and witnessing the birth of a stunning, on-brand image.

It's like having an award-winning graphic designer on your team, working 24/7!

Step 3: Microsoft Designer - The Finishing Touch

Perfecting the Art

A great design is more than just text and image. It's about harmony.

Microsoft Designer takes the image from Midjourney, combines it with Llama's prompt, and crafts a corporate flyer that's not just beautiful, but effective.

The ease of integration and user-friendly interface ensures anyone can be a design pro!

Step 4: The Seamless Workflow

The Power Trio in Action

From thought to tangible design in three seamless steps!

Save time, reduce costs, and wow your audience every single time.

Step 5: Engaging Examples

Seeing is Believing!

Here's what we achieved with this integration.

Consistent, innovative, and always on-brand!

Step 6: Closing Thoughts & Q&A

Revolutionizing Corporate Design

In today's fast-paced digital age, staying innovative is the key.

With Llama, Midjourney, and Microsoft Designer, we're not just keeping up; we're leading the way!

Questions? Let's dive deeper!

Think of the time saved for the key user, who can build expert-level presentations in seconds! Now ChatGPT will create a graph in HTML to illustrate the presentation.

ChatGPT with GPT-4 provides a graph in HTML to illustrate the presentation

The following code and display were entirely written by ChatGPT, GPT-4, based on the previous dialog. Note that ChatGPT, being stochastic, may not provide the same response from one run to another.

Click on the cells. Then sit back and enjoy.

ChatGPT installs plotly to create graph objects:

!pip install plotly

ChatGPT imports the graph module:

import plotly.graph_objects as go

ChatGPT creates the graph that corresponds to its representation of the project:

# Define the nodes and edges
nodes = ["LLaMA", "Midjourney", "Microsoft Design", "ChatGPT-HTML"]
edges = [(0, 1), (1, 2), (2, 3)]
# Adjusted positions for shorter edges
x = [0, 0.6, 1.2, 1.8]
y = [0, 0.6, 0, -0.6]
# Create the arrow annotations
annotations = []
for edge in edges:
    annotations.append(
        go.layout.Annotation(
            showarrow=True,
            arrowsize=1,
            arrowhead=1,
            startarrowhead=0,
            arrowwidth=2,
            ax=x[edge[0]],
            ay=y[edge[0]],
            axref='x',
            ayref='y',
            x=x[edge[1]],
            y=y[edge[1]],
            xref='x',
            yref='y'
        )
    )
node_trace = go.Scatter(
    x=x, y=y,
    mode='markers+text',
    hoverinfo='text',
    textposition='top center',
    marker=dict(
        showscale=False,
        color='lightblue',  # Set the color to light-blue
        #colorscale='YlGnBu',
        size=100,
        colorbar=dict(
            thickness=15,
            title='Node Connections',
            xanchor='left',
            titleside='right'
        ),
        line=dict(width=2)))
node_trace.text = nodes
node_trace.hovertext = nodes
fig = go.Figure(data=[node_trace],
             layout=go.Layout(
                title='Cross-Platform, Model-Chained Automated Ideation Summary',
                title_font_size=16,
                showlegend=False,
                hovermode='closest',
                margin=dict(b=0, l=0, r=0, t=40),
                xaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                yaxis=dict(showgrid=False, zeroline=False, showticklabels=False),
                annotations=annotations)  # Add the arrow annotations
                )
fig.show()

The output is a nice graph that illustrates the chained-model process:

Figure 20.2: Image of the chained-model process

This presentation is certainly not perfect. But it illustrates the tremendous paradigm shift taking place. Generative AI bots are becoming human-level agents and co-workers.

Microsoft 365 Copilot has begun to expand automated ideation in its products. PowerPoint will soon take automated ideation to unseen levels. Google Workspace has several proprietary copilots and many AI-driven add-ons. This is just the beginning, and as these systems improve, they will surely baffle us even more!

We will now use Llama 2 to automate prompt generation.

Llama 2

In this section, we will implement Llama 2 in a closed environment to automate prompt design with Generative AI. A closed environment is created by limiting the scope of a project to a specific domain within a department of an organization and gathering well-organized documents (emails, word processing files, spreadsheet files, media, or other sources). It then becomes feasible to build static AI queries just as for any classical software reports.

A brief introduction to Llama 2

Meta (formerly Facebook) released Llama 2 on July 18, 2023, when Touvron et al. published their paper, Llama 2: Open Foundation and Fine-Tuned Chat Models (see the References section). The same day, Hugging Face announced that they had worked with Meta and that access to Llama 2 was available on their platform.

Llama 2 can be applied to translations, chats, question-answering, summarization, creative writing, and more.

At the time of writing, we were redirected to the Meta Llama website to gain access: https://ai.meta.com/resources/models-and-libraries/llama-downloads/.

Meta defines Llama 2 as follows:

Llama 2 means the foundational large language models and software and algorithms, including machine-learning model code, trained model weights, inference-enabling code, training-enabling code, fine-tuning enabling code, and other elements of the foregoing distributed by Meta.

Llama 2’s architecture can be summed up in one sentence from Section 2.2, Training Details, of Meta’s paper:

We adopt most of the pretraining setting and model architecture from Llama 1. We use the standard transformer architecture (Vaswani et al., 2017). Source: Touvron et al. (2023).

Review Chapter 2, Getting Started with the Architecture of the Transformer Model, if necessary.

Like other transformer architectures, Meta implemented normalization, positional encoding, and the other intermediate functions of the Original Transformer through an empirical trial-and-error process that produced good performance.

Section 2.2, Training Details, of Meta’s paper points out the two main differences from Llama 1, which can be summed up as follows:

  • Context length was increased from 2048 to 4096 tokens, a key feature for transformer models because it allows the attention layers to attend to long-term dependencies in a text. In natural language, the meaning of a sentence sometimes only becomes clear in a longer context.
  • Grouped-query attention (GQA). Multi-Head Attention (MHA) divides the attention calculation into multiple heads. Each head attends to a different part of the sequence, and the results of the heads are then combined to produce a single output. However, this can be expensive in terms of memory because the Keys (K) and Values (V) need to be stored in memory for each head.

One way to optimize memory is GQA, as chosen by Meta. GQA reduces the memory requirements for the KV cache by sharing the key and value vectors across multiple heads.
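To make the saving concrete, here is a back-of-the-envelope sketch in Python. The dimensions are illustrative (loosely based on a Llama 2 70B-scale configuration, which uses GQA with 8 key-value heads); they are not taken from the chapter:

# Back-of-the-envelope KV cache sizing (illustrative dimensions)
n_layers = 80          # transformer layers
n_heads = 64           # query heads (MHA stores K and V for all of them)
n_kv_heads = 8         # key-value heads shared across query groups with GQA
head_dim = 128         # dimension per head
seq_len = 4096         # Llama 2 context length
bytes_per_value = 2    # float16

def kv_cache_bytes(kv_heads):
    # Two tensors (K and V) per layer, each of shape seq_len x kv_heads x head_dim
    return 2 * n_layers * seq_len * kv_heads * head_dim * bytes_per_value

print(f"MHA KV cache: {kv_cache_bytes(n_heads) / 1e9:.1f} GB")      # ~10.7 GB
print(f"GQA KV cache: {kv_cache_bytes(n_kv_heads) / 1e9:.1f} GB")   # ~1.3 GB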

As the LLM market, and specifically the demand for Foundation Models, grows, we will see top research labs release new LLMs regularly. In each case, we simply need to focus on the core architecture and then run and test it for our needs.

Hugging Face did an excellent job by making Llama 2 available on their platform the day it was released by Meta. Let’s implement an example to get started.

Implementing Llama 2 with Hugging Face

It is recommended to go to the Hugging Face Meta Llama access page first: https://huggingface.co/meta-llama.

Make sure you choose the access that is best for your project.

This section continues to implement Automated_Design.ipynb in this chapter’s directory of the GitHub repository.

One method to automate prompts is to break a prompt down into a Cloze template, which consists of filling in the gaps (blanks, missing words) in a text.

The static part of the instruction becomes a Cloze template:

template='Create a prompt for a text to image model to generate ________________\n'

The dynamic part can be extracted from any type of text document with NLP: keywords, essential concepts, and any function required. This can be achieved with an LLM, as we have seen throughout the book, to obtain a target concept such as:

target='a ship on a highway in a desert'

The target is then added to the template:

Seed Llama 2 prompt = template + target
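A minimal sketch of this assembly, filling the Cloze blank with the target (the blank-filling line is illustrative):

template = 'Create a prompt for a text to image model to generate ________________\n'
target = 'a ship on a highway in a desert'
# Fill the Cloze blank with the extracted target to obtain the seed prompt
seed_prompt = template.replace('________________', target)
print(seed_prompt)
# Create a prompt for a text to image model to generate a ship on a highway in a desert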


Now, with this template and target seed prompt, we can run Llama 2 to obtain text-to-image prompts.

We first install transformers:

!pip install transformers -qq

We can log in to Hugging Face’s Hub directly in Google Colab:

!huggingface-cli login

Hugging Face will prompt you to enter your Hugging Face token:

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful

We now install accelerate to run the library faster:

!pip install accelerate -qq

Make sure to activate a GPU for this notebook. accelerate is a library that makes it easy to run PyTorch models on multiple GPUs, TPUs, and CPUs. It also supports mixed-precision processing and speeds up runtimes.

We import the modules we need:

from transformers import AutoTokenizer
import transformers
import torch

Now, we define Meta’s Llama 2 LLM as the model for text generation, which will be prompts in our case:

model = "meta-llaMA/LlaMA-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

The program now defines a function to make a request and return the response:

def LLaMA2(prompt):
  sequences = pipeline(
    prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
  )
  return sequences

The prompt will be the template + target prompt generated in the pipeline:

prompt='Create 5 prompts for a text to image model to generate a ship on a highway in a desert\n'
response=LLaMA2(prompt)
for seq in response:
    print(f"Output: {seq['generated_text']}")

The output is exactly what we need:

Output: Create 5 prompts for a text to image model to generate a ship on a highway in a desert
1. A massive cargo ship driving down a desert highway, with sand dunes in the background.
2. A small speedboat racing across a sandy beach on the highway, with a sunset in the sky.
3. A luxury cruise ship sailing through the desert, passing by a small oasis.
4. A historic pirate ship crashed on the highway, surrounded by cacti and sand.
5. A group of spacecraft traveling across a desolate desert highway, with stars in the background.

The program might produce different responses for each run because of LLMs’ stochastic nature, which explains their creativity. The Llama 2 pipeline returns a list of dictionaries (a JSON-like object). You can choose to let the process remain fully automatic and use the response to build a list of prompts in sequences:

# Run the function and capture the response
response = LLaMA2(prompt)
# Inspect the response object
print("Response type:", type(response))
print("Response content:", response)
# The pipeline returns a list of dictionaries with a 'generated_text' key
sequences = []
if isinstance(response, list) and len(response) > 0 and 'generated_text' in response[0]:
    sequences = [{'generated_text': response[0]['generated_text']}]
else:
    print("Unexpected response format:", response)
if sequences:
    # Extracting the 'generated_text' content
    text_content = sequences[0]['generated_text']
    # Splitting the text based on the newline character
    lines = text_content.split('\n')
    # Keeping only the lines that start with the number indicators to collect the prompts
    prompts = [line for line in lines if line.startswith(('1. ', '2. ', '3. ', '4. ', '5. '))]
    # Printing the prompts
    for prompt in prompts:
        print(prompt)

The output will be similar but may change between runs, as shown in this excerpt:

Response content: [{'generated_text': 'Create 5 prompts for a text to image model to generate a ship on a highway in a desert\n\n1. A sleek, futuristic ship cruises down a desert highway, its LED lights illuminating the dark landscape.\n2. A vintage muscle car is transformed into a high-tech spaceship, complete with rocket boosters and a gleaming silver finish.\n3. A massive, armored ship rumbles down the highway, its treads kicking up dust and debris as it speeds by.\n4. A small, agile ship darts through the desert landscape, its wings folded neatly against its body as it zooms past cacti and sand dunes.\n5. A colossal, alien spacecraft lands on the highway, its strange, glowing appendages stretching out in all directions as it disgorges a horde of curious, tentac'}]
1. A sleek, futuristic ship cruises down a desert highway, its LED lights illuminating the dark landscape.
2. A vintage muscle car is transformed into a high-tech spaceship, complete with rocket boosters and a gleaming silver finish.
3. A massive, armored ship rumbles down the highway, its treads kicking up dust and debris as it speeds by.
4. A small, agile ship darts through the desert landscape, its wings folded neatly against its body as it zooms past cacti and sand dunes.
5. A colossal, alien spacecraft lands on the highway, its strange, glowing appendages stretching out in all directions as it disgorges a horde of …

You may want to let the automation run unattended up to this point but save the previous output. If you choose to write functions that save the output to a file and load it back into sequences (a minimal sketch of such helpers follows), your code can continue with the saved output:
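The helper names and the file path in this sketch are illustrative assumptions, not from the notebook:

import json

def save_sequences(sequences, path="llama2_output.json"):
    # Persist the pipeline output so later runs can skip regeneration
    with open(path, "w") as f:
        json.dump(sequences, f)

def load_sequences(path="llama2_output.json"):
    # Reload the saved output into the same list-of-dictionaries structure
    with open(path) as f:
        return json.load(f)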

# The given JSON output
sequences = [{'generated_text': 'Create 5 prompts for a image to text model to generate a ship on a highway in a desert\n\nHere are 5 prompts you could use to train an image to text model to generate a ship on a highway in a desert:\n\n1. "A long, sleek ship stands majestically on a vast desert highway, its gleaming hull contrasting with the barren sand and rocky outcroppings of the surrounding landscape."\n2. "In the distance, a solitary ship glides smoothly down the highway, kicking up small waves of sand and dust as it goes, its towering mast and billowing sails a striking against the clear blue sky."\n3. "A ship materializes seemingly out of thin air on the empty highway, its gleaming white hull and brightly colored sails a startling sight against the endless dunes of sand and the sun-baked rocks."\n4.'}]

The program now extracts the prompts Llama 2 generated:

# Extracting the 'generated_text' content
text_content = sequences[0]['generated_text']
# Splitting the text based on the newline character
lines = text_content.split('\n')
# Keeping only the lines that start with the number indicators to collect the prompts
prompts = [line for line in lines if line.startswith(('1. ', '2. ', '3. ', '4. ', '5. '))]
# Printing the prompts
for prompt in prompts:
    print(prompt)

The output contains a clean list of the prompts:

1. "A long, sleek ship stands majestically on a vast desert highway, its gleaming hull contrasting with the barren sand and rocky outcroppings of the surrounding landscape."
2. "In the distance, a solitary ship glides smoothly down the highway, kicking up small waves of sand and dust as it goes, its towering mast and billowing sails a striking against the clear blue sky."
3. "A ship materializes seemingly out of thin air on the empty highway, its gleaming white hull and brightly colored sails a startling sight against the endless dunes of sand and the sun-baked rocks."
…

Note: The prompts are generated with a stochastic algorithm and thus may vary from one run to another.

A specific prompt can be found with a random function or by simply selecting the first one:

# Choose a specific prompt by its index (for example, selecting the second prompt)
index = 1  # 0 for the first prompt, 1 for the second, and so on
if 0 <= index < len(prompts):
    print(prompts[index])
else:
    print("Invalid index.")

The program has now generated a prompt with no human intervention:

2. "In the distance, a solitary ship glides smoothly down the highway, kicking up small waves of sand and dust as it goes, its towering mast and billowing sails a striking against the clear blue sky."

This section illustrated that:

  • A pipeline LLM can automatically parse documents to generate prompts without human intervention. In this section, we imagined a function that would create the prompt for Llama 2.
  • Llama 2 can automate ideation by inventing prompts for a text-to-image process.

We will now apply the output of Llama 2 to the input of Midjourney.

Midjourney

In this section, we will create images with Midjourney using an automated prompt. The prompt was not sent to Discord programmatically for API permission reasons. Also, you can simply read this section and visualize the process in the chapter notebook: Automated_Design.ipynb.

Or, you can reproduce the process in this chapter with the following recommendations (check Midjourney’s pricing policy before running image generations):

Once you have created your Midjourney and Discord accounts, you can produce an image by first entering /imagine; the prompt field will appear automatically.

Then enter the prompt generated by Llama 2 that is in the Midjourney section of the notebook:

"In the distance, a solitary ship glides smoothly down the highway, kicking up small waves of sand and dust as it goes, its towering mast and billowing sails a striking against the clear blue sky."

The request should look like the screenshot in Figure 20.3:

Figure 20.3: Create an image generation request on Midjourney

You can save the image in the location of your choice. For this example, the image was saved on GitHub:

from IPython.display import Image     #This is used for rendering images in the notebook
import requests
url = 'https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter20/Ship_gliding.png'
response = requests.get(url, stream=True)
with open('Ship_gliding.png', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)

The Midjourney image is displayed:

from PIL import Image
# Define the path of your image
image_path = "/content/Ship_gliding.png"
# Open the image
image = Image.open(image_path)
image

The output illustrates the prompt generated by Llama 2:

Figure 20.4: An image of a ship created by Midjourney

We have chained Llama 2 to Midjourney interactively. In Part III of this chapter, we will chain automated Llama 2 prompts to Stable Diffusion.

We can also use a Discord API to interact with our Midjourney images.

Discord API for Midjourney

In this section, we will implement a Discord API to retrieve the history of the images we created in Midjourney to build a gallery. Limit image retrievals to images on your personal server. Do not retrieve images you did not personally create.

Do not attempt to send image generation requests without Midjourney’s permission, and carefully follow Discord’s bot policy. Do not trust third-party services without first checking with Midjourney and Discord.

The prompts can be generated by Llama 2, as we just went through in the previous section, or other LLMs with the proper instructions:

"In the distance, a solitary ship glides smoothly down the highway, kicking up small waves of sand and dust as it goes, its towering mast and billowing sails a striking against the clear blue sky."

We can take the automated prompt generation further by using the complete prompt:

complete_prompt = "In the distance, a solitary ship glides smoothly down the highway"

We can send the template with no target:

template = "In the distance, a solitary ship glides____"

We can also send targets one after the other without the template:

"targets" =" in the Arizona desert," "in space between the Moon and Mars"

Before implementing a Discord API, carefully follow the steps in Discord’s documentation: https://discord.com/developers/docs/reference.

Open Midjourney_bot.ipynb in the chapter’s directory of the GitHub repository. The program first installs the Discord library and Pillow (a Python imaging library):

!pip install discord.py python-dotenv Pillow requests

You store your Discord API token in a file and retrieve it:

#Discord token
#Store your key in a file and read it (you could type it directly in the notebook, but it would be visible to anybody next to you)
from google.colab import drive
drive.mount('/content/drive')
f = open("drive/MyDrive/files/midjourney.txt", "r")
discord_token=f.readline()
f.close()

You can also type it directly in the code (not recommended) when you import the modules:

import discord
from discord.ext import commands
import requests
from dotenv import load_dotenv
from PIL import Image
import os
#discord_token = [YOUR_TOKEN]

Now you can run the Python code cell that:

  • Connects to your Discord Midjourney server.
  • Listens to your activity.
  • Automatically downloads images you create during the session and splits the images.
  • Contains a download quantity variable to control the number of images to download.

The images are stored locally in a content/output directory.

Click on the cell to run it; click it again to stop the process:

load_dotenv()
client = commands.Bot(command_prefix="*", intents=discord.Intents.all())
directory = os.getcwd()
print(directory)
def split_image(image_file):
    with Image.open(image_file) as im:
        # Get the width and height of the original image
        width, height = im.size
        # Calculate the middle points along the horizontal and vertical axes
        mid_x = width // 2
        mid_y = height // 2
        # Split the image into four equal parts
        top_left = im.crop((0, 0, mid_x, mid_y))
        top_right = im.crop((mid_x, 0, width, mid_y))
        bottom_left = im.crop((0, mid_y, mid_x, height))
        bottom_right = im.crop((mid_x, mid_y, width, height))
        return top_left, top_right, bottom_left, bottom_right
…
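As a usage sketch, split_image can be applied to a downloaded 2x2 Midjourney grid (the grid file name is illustrative):

import os
# Split a downloaded 2x2 Midjourney grid into its four images and save them
os.makedirs("/content/output", exist_ok=True)
parts = split_image("grid.png")
names = ["top_left", "top_right", "bottom_left", "bottom_right"]
for name, part in zip(names, parts):
    part.convert("RGB").save(f"/content/output/{name}.jpg")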

For more information, see Discord API tutorial for Midjourney in the References section of this chapter.

The following cell in the notebook displays the downloaded Midjourney images:

from IPython.display import Image, display
import os
directory = '/content/output/'
for filename in os.listdir(directory):
    if filename.endswith(".jpg"):
        display(Image(filename=os.path.join(directory, filename)))

The output shows your creations:

Figure 20.5: Image created by Midjourney

You can zip the downloaded images for future use:

import zipfile
from IPython.display import FileLink
# Zip the images
zip_name = '/content/images.zip'
with zipfile.ZipFile(zip_name, 'w') as zipf:
    for filename in os.listdir(directory):
        if filename.endswith(".jpg"):
            zipf.write(os.path.join(directory, filename), filename)  # Second arg is the arcname, to store the file name without any directory

You can save the ZIP file in the location of your choice. In this example, the ZIP file was saved on GitHub.

In another session, we don’t have to rerun the Discord API to view our creations.

We can download the ZIP from our storage location:

!curl -L https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter20/images.zip --output "images.zip"

Then, we can write an HTML function that reads the files and displays them in a nice HTML gallery:

from IPython.display import display, HTML
import os
import base64
# Path to the folder containing the unzipped images
image_folder = '/content/images/'
# List the files in the directory. Ensure they are image files (by checking the extension, for instance)
image_files = [f for f in os.listdir(image_folder) if os.path.isfile(os.path.join(image_folder, f)) and f.endswith('.jpg')]
html_string = '<div style="display: flex; flex-wrap: wrap;">'
for image_file in image_files:
    img_path = os.path.join(image_folder, image_file)
    
    # Convert image to base64
    with open(img_path, "rb") as image_file_obj:
        encoded_string = base64.b64encode(image_file_obj.read()).decode()
        
    img_base64 = "data:image/jpeg;base64," + encoded_string
    
    html_string += f'<div style="margin: 10px;"><img src="{img_base64}" width="150" alt="{image_file}" title="{image_file}"></div>'
html_string += '</div>'
display(HTML(html_string))

The gallery will display the images in the following way:

Figure 20.6: Image gallery

Chaining Llama 2 to Midjourney was an exciting experience.

Now, we will create a business flyer with Microsoft Designer.

Microsoft Designer

In this section, Microsoft Designer will perform an image-to-image task with an output of Midjourney. Microsoft Designer can generate images and videos: https://designer.microsoft.com/.

To sum up the process of this section so far, we:

  • Asked Llama 2 to generate prompts.
  • Chained the output of Llama 2 to ask Midjourney to process the prompts provided by Llama 2. The output contained images.

We will now feed the output of Midjourney to Microsoft Designer to create a flyer, as shown in Figure 20.7:

Figure 20.7: Creating a flyer with Microsoft Designer

The output was saved and uploaded to GitHub. You can view the flyer in the Microsoft Designer section of Automated_Design.ipynb, in the chapter directory of the GitHub repository.

The notebook downloads the flyer created by Microsoft Designer:

import requests
url = 'https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter20/Designer.png'
# github_token must be defined earlier (for example, read from a file);
# public raw.githubusercontent.com URLs can also be fetched without a token
headers = {
    'Authorization': 'token ' + github_token
}
response = requests.get(url, headers=headers, stream=True)
with open('Designer.png', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)

Then, the program displays the image-to-image generation:

from PIL import Image
# Define the path of your image
image_path = "/content/Designer.png"
# Open the image
image = Image.open(image_path)
image

The output is a well-designed flyer:

Figure 20.8: A flyer about teamwork

Microsoft Designer also generated an image-to-video that was saved and uploaded to GitHub. Let’s download it:

import subprocess
subprocess.run(['pip', 'install', 'moviepy', '-qq'], check=True)
import requests
url = 'https://raw.githubusercontent.com/Denis2054/Transformers_3rd_Edition/master/Chapter20/Designer.mp4'
response = requests.get(url, stream=True)
with open('Designer.mp4', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        f.write(chunk)

The notebook now displays the video:

from moviepy.editor import *
# Load Designer.mp4 and select the subclip from 00:00:00 to 00:00:04
clip = VideoFileClip("Designer.mp4").subclip(0, 4)
# Loop the 4-second clip five times
clip = clip.loop(5)
clip.ipython_display(width=900, height=500)

Clicking the play button will start the video, which will run five times in a loop:

Figure 20.9: The video created by Microsoft Designer

We chained an automated prompt design with Llama 2 to Midjourney that created images. We then submitted one of Midjourney’s images to Microsoft Designer for an image-to-image and image-to-video task, as shown in Figure 20.10:

Figure 20.10: Proof of concept of automated ideation with Llama 2, Midjourney, and Microsoft Designer

Our proof of concept was successful. We demonstrated that prompts and ideation could be automated.

To fully complete the demonstration, we need to automate F1 and F2 and replace F3:

  • F1 is the output of the automated prompt design we simulated in this section. In the next section, we will automate the process.
  • F2 represents the automated prompts generated by Llama 2 that we copied and pasted into the code. In the following section, we will automate this function as well.
  • F3 represents the image-to-image process from Midjourney to Microsoft Designer in this section. In the next section, we will replace these components with Stable Diffusion to produce an image without human intervention.

This section proved that fully automated ideation was possible. We will further automate the process.
