Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Generative AI Foundations in Python, which provides a hands-on guide to implementing generative AI models—GANs, diffusion models, and transformers—using PyTorch and the diffusers library.
News Highlights: The uv Python packaging tool now offers comprehensive project management, tool installation, and support for single-file scripts; and Tach, written in Rust, enforces strict interfaces and dependency management for Python.
Here are my top 5 picks from our learning resources today:
And, in today’s Featured Study, we introduce PyRoboCOP, a Python-based package designed for optimizing robotic control and collision avoidance in complex environments.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: We have covered all requests made so far this month in this issue.
In “PyRoboCOP: Python-based Robotic Control & Optimization Package for Manipulation and Collision Avoidance” Raghunathan et al. introduce a Python-based software package designed for the optimisation and control of robotic systems. The package excels in handling complex interactions like contact and collision avoidance, crucial for autonomous robotic manipulation.
Robotic systems often operate in environments with numerous obstacles and objects, making it essential to model and optimise these interactions mathematically. These interactions, defined by complementarity constraints, are challenging to manage because they do not follow standard optimisation assumptions. Most existing physics engines simulate these interactions but do not offer real-time optimisation capabilities. PyRoboCOP addresses this gap by providing a flexible and user-friendly package that allows robots to reason about their environment and optimise their behaviour, which is critical for achieving autonomous manipulation tasks.
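To make the idea of a complementarity constraint concrete, here is a small illustrative snippet (our own toy example, not PyRoboCOP’s API): for a contact pair, either the gap between two bodies is zero or the contact force is zero, and neither may be negative.
def complementarity_residual(gap: float, force: float) -> float:
    """Return 0.0 only when 0 <= gap, 0 <= force, and gap * force == 0 all hold."""
    if gap < 0 or force < 0:
        return float("inf")   # a negative gap or force is physically infeasible
    return gap * force        # must vanish: contact (gap == 0) or separation (force == 0)

# A block resting on a table (gap = 0, force > 0) and a block in free
# fall (gap > 0, force = 0) are both feasible; gap > 0 with force > 0 is not.
print(complementarity_residual(0.0, 9.81))   # 0.0   -> feasible
print(complementarity_residual(0.2, 0.0))    # 0.0   -> feasible
print(complementarity_residual(0.2, 9.81))   # 1.962 -> constraint violated
Constraints of this shape violate the assumptions most off-the-shelf solvers rely on, which is why PyRoboCOP’s automatic reformulation matters.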
PyRoboCOP is characterised by its ability to automatically reformulate complex mathematical constraints, such as these complementarity conditions, and to integrate seamlessly with powerful optimisation tools.
The package is particularly relevant for researchers, developers, and engineers working in the field of robotics, especially those involved in designing autonomous systems that require precise control and optimisation. PyRoboCOP’s ability to handle complex robotic interactions makes it a valuable tool for developing real-time, model-based control solutions in environments where contact and collision avoidance are critical.
PyRoboCOP's performance was rigorously tested across several robotic scenarios, including planar pushing, car parking, and belt drive unit assembly. In a planar pushing task, PyRoboCOP optimised the robot's trajectory, balancing a normal force of 0.5 N and a friction coefficient of 0.3, successfully navigating from (0, 0, 0) to (0.5, 0.5, 0) and (−0.1, −0.1, 3π/2). In a car parking scenario, the software optimised movement from (1, 4, 0, 0) to (2, 2.5, π/2, 0), effectively avoiding obstacles. PyRoboCOP also managed the complex task of assembling a belt drive unit, demonstrating its ability to handle intricate manipulations. When benchmarked against CasADi and Pyomo, PyRoboCOP showed comparable performance, solving an acrobot system in a mean time of 2.282 seconds with 1,296 variables, versus CasADi's 1.175 seconds with 900 variables and Pyomo's 2.374 seconds with 909 variables.
You can learn more by reading the entire paper or accessing the package here.
Here’s an excerpt from “Chapter 2: Surveying GenAI Types and Modes: An Overview of GANs, Diffusers, and Transformers” in the book, Generative AI Foundations in Python by Carlos Rodriguez, published in July 2024.
Applying GAI models – image generation using GANs, diffusers, and transformers
In this hands-on section… you’ll get first-hand experience and a deep dive into the actual implementation of generative models, specifically GANs, diffusion models, and transformers…
We’ll be utilizing the highly versatile PyTorch library, a popular choice among machine learning practitioners, to facilitate our operations. PyTorch provides a powerful and dynamic toolset to define and compute gradients, which is central to training these models.
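As a quick, minimal illustration (not from the book) of what “define and compute gradients” means in practice, a single backward() call populates gradients for any tensor that requires them:
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = (x ** 2).sum()   # a toy scalar "loss": 1 + 4 + 9 = 14
loss.backward()         # autograd computes d(loss)/dx = 2x
print(x.grad)           # tensor([2., 4., 6.])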
In addition, we’ll use the diffusers library. It’s a specialized library that provides functionality to implement diffusion models. This library enables us to reproduce state-of-the-art diffusion models directly from our workspace. It underpins the creation, training, and usage of denoising diffusion probabilistic models at an unprecedented level of simplicity, without compromising the models’ complexity.
Through this practical session, we’ll explore how to operate and integrate these libraries and implement and manipulate GANs, diffusers, and transformers using the Python programming language. This hands-on experience will complement the theoretical knowledge we have gained in the chapter, enabling us to see these models in action in the real world…
Jupyter notebooks enable live code execution, visualization, and explanatory text, suitable for prototyping and data analysis. Google Colab, conversely, is a cloud-based version of Jupyter Notebook, designed for machine learning prototyping. It provides free GPU resources and integrates with Google Drive for file storage and sharing. We’ll leverage Colab as our prototyping environment going forward.
We begin with a pre-trained stable diffusion model, a text-to-image latent diffusion model created by researchers and engineers from CompVis, Stability AI, and LAION (Patil et al., 2022). The diffusion process is used to draw samples from complex, high-dimensional distributions, and when it interacts with the text embeddings, it creates a powerful conditional image synthesis model.
The term “stable” in this context refers to the fact that during training, a model maintains certain properties that stabilize the learning process. Stable diffusion models offer rich potential to create entirely new samples from a given data distribution, based on text prompts.
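As a rough, standalone sketch of the diffusion idea (our own illustration, not part of the chapter’s pipeline, and assuming the diffusers library we install below), a DDPMScheduler progressively adds Gaussian noise to a clean sample; the generative model learns to reverse exactly this corruption, guided by the text embeddings:
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)
clean = torch.randn(1, 3, 64, 64)       # stand-in for a clean latent image
noise = torch.randn_like(clean)
for t in [0, 500, 999]:                 # early, middle, and late timesteps
    noisy = scheduler.add_noise(clean, noise, torch.tensor([t]))
    sim = torch.cosine_similarity(noisy.flatten(), clean.flatten(), dim=0)
    print(f"t={t}: similarity to the clean sample = {sim.item():.3f}")
# The signal is progressively drowned in Gaussian noise as t grows;
# sampling runs this corruption in reverse to generate new images.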
Again, for our practical example, we will use Google Colab to take care of much of the initial setup. Colab also provides all of the computational resources needed to begin experimenting right away. We start by installing some libraries, and with three simple functions, we will build out a minimal StableDiffusionPipeline using a well-established open-source implementation of the stable diffusion method.
First, let’s navigate to our pre-configured Python environment, Google Colab, and install the diffusers open-source library, which will provide most of the key underlying components we need for our experiment.
In the first cell, we install all dependencies using the following bash command. Note the exclamation point at the beginning of the line, which tells our environment to reach down to its underlying process and install the packages we need:
!pip install pytorch-fid torch diffusers clip transformers accelerate
Next, we import the libraries we’ve just installed to make them available to our Python program:
from typing import List
import torch
import matplotlib.pyplot as plt
from diffusers import StableDiffusionPipeline, DDPMScheduler
Now, we’re ready for our three functions, which will execute the three tasks – loading the pre-trained model, generating the images based on prompting, and rendering the images:
def load_model(model_id: str) -> StableDiffusionPipeline:
    """Load model with provided model_id."""
    return StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        revision="fp16",
        use_auth_token=False
    ).to("cuda")

def generate_images(
    pipe: StableDiffusionPipeline,
    prompts: List[str]
) -> torch.Tensor:
    """Generate images based on provided prompts."""
    with torch.autocast("cuda"):
        images = pipe(prompts).images
    return images

def render_images(images: torch.Tensor):
    """Plot the generated images."""
    plt.figure(figsize=(10, 5))
    for i, img in enumerate(images):
        plt.subplot(1, 2, i + 1)
        plt.imshow(img)
        plt.axis("off")
    plt.show()
In summary, load_model loads a machine learning model identified by model_id onto a GPU for faster processing. The generate_images function takes this model and a list of prompts to create our images. Within this function, you will notice torch.autocast("cuda"), which is a special command that allows PyTorch (our underlying machine learning library) to perform operations faster while maintaining accuracy. Lastly, the render_images function displays these images in a simple grid format, making use of the matplotlib visualization library to render our output.
With our functions defined, we select our model version, define our pipeline, and execute our image generation process:
# Execution
model_id = "CompVis/stable-diffusion-v1-4"
prompts = [
    "A hyper-realistic photo of a friendly lion",
    "A stylized oil painting of a NYC Brownstone"
]
pipe = load_model(model_id)
images = generate_images(pipe, prompts)
render_images(images)
The output in Figure 2.1 is a vivid example of the imaginativeness and creativity we typically expect from human art, generated entirely by the diffusion process. Except, how do we measure whether the model was faithful to the text provided?
Figure 2.1: Output for the prompts “A hyper-realistic photo of a friendly lion” (left) and “A stylized oil painting of a NYC Brownstone” (right)
The next step is to evaluate the quality and relevance of our generated images in relation to the prompts. This is where CLIP comes into play. CLIP is designed to measure the alignment between text and images by analyzing their semantic similarities, giving us a true quantitative measure of the fidelity of our synthetic images to the prompts.
CLIP is trained to understand the relationship between text and images by learning to place similar images and text near each other in a shared space. When evaluating a generated image, CLIP checks how closely the image aligns with the textual description provided. A higher score indicates a better match, meaning the image accurately represents the text. Conversely, a lower score suggests a deviation from the text, indicating lower quality or fidelity to the prompt. This gives us a quantitative measure of how well the generated image adheres to the intended description.
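Before wiring CLIP into our pipeline, here is a small standalone sketch (our own, not the book’s code) of that shared space: we embed one placeholder image and two prompts separately and compare them with cosine similarity, which is the quantity CLIP’s logits are built from.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), color="goldenrod")   # placeholder image
texts = ["a photo of a friendly lion", "an oil painting of a brownstone"]

with torch.no_grad():
    image_emb = model.get_image_features(
        **processor(images=image, return_tensors="pt"))
    text_emb = model.get_text_features(
        **processor(text=texts, return_tensors="pt", padding=True))

# Normalise both embeddings and compare: one cosine similarity per prompt.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print(image_emb @ text_emb.T)   # higher values mean closer in the shared space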
Again, we will import the necessary libraries:
from typing import List, Tuple
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
import torch
We begin by loading the CLIP model, processor, and necessary parameters:
# Constants
CLIP_REPO = "openai/clip-vit-base-patch32"

def load_model_and_processor(
    model_name: str
) -> Tuple[CLIPModel, CLIPProcessor]:
    """
    Loads the CLIP model and processor.
    """
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    return model, processor
Next, we define a processing function to adjust the textual prompts and images, ensuring that they are in the correct format for CLIP inference:
def process_inputs(
    processor: CLIPProcessor, prompts: List[str],
    images: List[Image.Image]) -> dict:
    """
    Processes the inputs using the CLIP processor.
    """
    return processor(text=prompts, images=images,
                     return_tensors="pt", padding=True)
In this step, we initiate the evaluation process by inputting the images and textual prompts into the CLIP model. This is done in parallel across multiple devices to optimize performance. The model then computes similarity scores, known as logits, for each image-text pair. These scores indicate how well each image corresponds to the text prompts. To interpret these scores more intuitively, we convert them into probabilities, which indicate the likelihood that an image aligns with any of the given prompts:
def get_probabilities(
    model: CLIPModel, inputs: dict) -> torch.Tensor:
    """
    Computes the probabilities using the CLIP model.
    """
    outputs = model(**inputs)
    logits = outputs.logits_per_image
    # Define temperature - higher temperature will make the distribution more uniform.
    T = 10
    # Apply temperature to the logits
    temp_adjusted_logits = logits / T
    probs = torch.nn.functional.softmax(
        temp_adjusted_logits, dim=1)
    return probs
Lastly, we display the images along with their scores, visually representing how well each image adheres to the provided prompts:
def display_images_with_scores(
    images: List[Image.Image], scores: torch.Tensor) -> None:
    """
    Displays the images alongside their scores.
    """
    # Set print options for readability
    torch.set_printoptions(precision=2, sci_mode=False)
    for i, image in enumerate(images):
        print(f"Image {i + 1}:")
        display(image)
        print(f"Scores: {scores[i, :]}")
        print()
With everything detailed, let’s execute the pipeline as follows:
# Load CLIP model
model, processor = load_model_and_processor(CLIP_REPO)
# Process image and text inputs together
inputs = process_inputs(processor, prompts, images)
# Extract the probabilities
probs = get_probabilities(model, inputs)
# Display each image with corresponding scores
display_images_with_scores(images, probs)
We now have scores for each of our synthetic images that quantify the fidelity of the synthetic image to the text provided, based on the CLIP model, which interprets both image and text data as one combined mathematical representation (or geometric space) and can measure their similarity.
Figure 2.2: CLIP scores
For our “friendly lion,” we computed scores of 83% and 17% for each prompt, which we can interpret as an 83% likelihood that the image aligns with the first prompt.
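To see where figures like 83% and 17% come from, here is a small worked example (with invented logit values, purely for illustration) applying the same temperature-scaled softmax used in get_probabilities:
import torch

# Hypothetical logits_per_image row for one image scored against two prompts.
logits = torch.tensor([[25.0, 9.0]])
T = 10                                           # same temperature as above
probs = torch.nn.functional.softmax(logits / T, dim=1)
print(probs)                                     # approximately [[0.83, 0.17]]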
Packt library subscribers can continue reading the entire book for free. You can buy Generative AI Foundations in Python by Carlos Rodriguez here.