Build a Project that Automates your Code Review

Developers know that a solid code review process matters, yet reviews take time. Large language models (LLMs) can help by explaining how code flows and by flagging departures from best practices. This project uses an LLM to automate part of the review: an assistant analyzes code differences and returns feedback within seconds, nudging you toward cleaner, more efficient code.

The goal is to streamline the code review workflow so that developers produce high-quality code in less time. The system analyzes each change automatically, highlights the areas that need attention, and suggests improvements. Read on to explore how LLMs can support code review and improve the development experience.

Project Overview

In this article, we are developing a Python program that will harness the power of OpenAI's ChatGPT for code review. This program will read diff changes from the standard input and generate comprehensive code review comments. The generated comments will be compiled into an HTML file, which will include AI-generated feedback for each diff file section, presented alongside the diff sections themselves as code blocks with syntax highlighting. To simplify the review process, the program will automatically open the HTML file in the user's default web browser.

Image 1: Project Page

Build your Project

Let's walk through the steps to build this code review program from scratch. By following these steps, you'll be able to create your own implementation tailored to your specific needs. Let's get started:

1. Set Up Your Development Environment: Ensure you have Python installed on your machine. You can download and install the latest version of Python from the official Python website.

2. Install Required Libraries: To interact with OpenAI's ChatGPT and display a progress bar, install the two third-party libraries the script imports, `openai` and `tqdm`, with pip, the package installer for Python. The code below calls `openai.ChatCompletion`, which was removed in version 1.0 of the `openai` package, so pin an earlier release. Run the following command in your terminal:

pip install "openai<1.0" tqdm

3. To implement the next steps, create a new Python file named `chatgpt_code_reviewer.py`.

4. Import the necessary modules:

import argparse
import os
import random
import string
import sys
import webbrowser

import openai
from tqdm import tqdm

5. Set up the OpenAI API key (you'll need to get a key at https://openai.com if you don't have one yet):

openai.api_key = os.environ["OPENAI_KEY"]
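If the environment variable is missing, the line above fails with a bare `KeyError`. A minimal sketch of a friendlier check follows; the helper name `load_api_key` is hypothetical and not part of the original script.

```python
import os


def load_api_key(environ=os.environ, name="OPENAI_KEY"):
    """Return the API key stored in `name`, or raise a readable error.

    Hypothetical helper for illustration; the article reads the
    variable directly with os.environ["OPENAI_KEY"].
    """
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before running the reviewer."
        )
    return key
```

It could then be used as `openai.api_key = load_api_key()`.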

6. Define a function to format code snippets within the code review comments, ensuring they are easily distinguishable and readable within the generated HTML report.

def add_code_tags(text):
    import re

    # Find every span of text surrounded by backticks
    matches = re.finditer(r"`(.+?)`", text)

    # Collect the rewritten pieces of text
    updated_chunks = []
    last_end = 0
    for match in matches:
        # Add the text before the current match
        updated_chunks.append(text[last_end : match.start()])

        # Add the matched text in bold (<b> tags), keeping the backticks visible
        updated_chunks.append("<b>`{}`</b>".format(match.group(1)))

        # Remember where the current match ends
        last_end = match.end()

    # Add the remaining text after the last match
    updated_chunks.append(text[last_end:])

    # Join the chunks and return the resulting HTML string
    return "".join(updated_chunks)
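As a point of comparison, the same transformation can be written as a single `re.sub` call. This is only a sketch of an equivalent form; the name `add_code_tags_sub` is made up for the example.

```python
import re


def add_code_tags_sub(text):
    # Wrap each backtick-quoted span in <b> tags, keeping the backticks.
    # \1 refers to the text captured between the backticks.
    return re.sub(r"`(.+?)`", r"<b>`\1`</b>", text)


print(add_code_tags_sub("Rename `tmp` to `user_count`."))
# prints: Rename <b>`tmp`</b> to <b>`user_count`</b>.
```

Text without backticks passes through unchanged, which matches the behavior of the loop-based version.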

7. Define a function to generate a comment using ChatGPT:

def generate_comment(diff, chatbot_context):
    # Ask ChatGPT to review the changes in this diff section
    request = {
        "role": "user",
        "content": f"Make a code review of the changes made in this diff: {diff}",
    }
    chatbot_context.append(request)

    # Retry up to three times
    retries = 3
    for attempt in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=chatbot_context,
                n=1,
                stop=None,
                temperature=0.3,
            )
            break  # Success: stop retrying
        except Exception:
            if attempt == retries - 1:
                print(f"attempt: {attempt}, retries: {retries}")
                raise  # Re-raise the error once the maximum number of retries is reached
            print("OpenAI error occurred. Retrying...")

    comment = response.choices[0].message.content

    # Replace the context with only the latest exchange, so the next
    # request carries one previous review rather than the whole history
    chatbot_context = [
        request,
        {"role": "assistant", "content": comment},
    ]

    return comment, chatbot_context

The `generate_comment` function defined above uses the OpenAI ChatGPT to generate a code review comment based on the provided `diff` and the existing `chatbot_context`. It appends the user's request to review the changes in the `chatbot_context`.

The function retries the API call up to three times to handle transient errors. It calls the `openai.ChatCompletion.create()` method with the model, the messages, and the sampling parameters. The generated comment is extracted from the response, and the chatbot context is then replaced with just the latest user request and assistant response, which keeps the next request short. Finally, the function returns the comment and the updated chatbot context.

This function will be a crucial part of the code review program, as it uses ChatGPT to generate insightful comments on the provided code diffs.
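The retry logic can also be separated from the API call. The sketch below shows one possible variant that waits longer between attempts (exponential backoff); the helper `call_with_retries` and its parameters are assumptions for illustration, not part of the original script, and the original code retries immediately without waiting.

```python
import time


def call_with_retries(request, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds or `retries` attempts have failed.

    Hypothetical helper: `request` is any zero-argument callable, and
    `sleep` is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(retries):
        try:
            return request()  # Returning on success ends the loop
        except Exception:
            if attempt == retries - 1:
                raise  # Out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay before the next attempt
            sleep(base_delay * 2**attempt)
```

Inside `generate_comment`, the API call could be passed in as a `lambda` that wraps `openai.ChatCompletion.create(...)`.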

8. Define a function to create the HTML output:

def create_html_output(title, description, changes, prompt):
    random_string = "".join(random.choices(string.ascii_letters, k=5))
    output_file_name = random_string + "-output.html"

    title_text = f"\nTitle: {title}" if title else ""
    description_text = f"\nDescription: {description}" if description else ""
    chatbot_context = [
        {
            "role": "user",
            "content": f"{prompt}{title_text}{description_text}",
        }
    ]

    # Generate the HTML output
    html_output = "<html>\n<head>\n<style>\n"
    html_output += "body {\n    font-family: Roboto, Ubuntu, Cantarell, Helvetica Neue, sans-serif;\n    margin: 0;\n    padding: 0;\n}\n"
    html_output += "pre {\n    white-space: pre-wrap;\n    background-color: #f6f8fa;\n    border-radius: 3px;\n    font-size: 85%;\n    line-height: 1.45;\n    overflow: auto;\n    padding: 16px;\n}\n"
    html_output += "</style>\n"
    html_output += '<link rel="stylesheet"\n href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">\n <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>\n'
    html_output += "<script>hljs.highlightAll();</script>\n"
    html_output += "</head>\n<body>\n"
    html_output += "<div style='background-color: #333; color: #fff; padding: 20px;'>"
    html_output += "<h1 style='margin: 0;'>AI code review</h1>"
    html_output += f"<h3>Diff to review: {title}</h3>" if title else ""
    html_output += "</div>"

    # Standard library helper that escapes HTML special characters,
    # so that diffs and comments containing "<" or "&" render as text
    import html

    # Generate comments for each diff with a progress bar
    with tqdm(total=len(changes), desc="Making code review", unit="diff") as pbar:
        for change in changes:
            diff = change["diff"]
            comment, chatbot_context = generate_comment(diff, chatbot_context)
            pbar.update(1)
            # Write the escaped diff and comment to the HTML
            html_output += f"<h3>Diff</h3>\n<pre><code>{html.escape(diff)}</code></pre>\n"
            html_output += f"<h3>Comment</h3>\n<pre>{add_code_tags(html.escape(comment))}</pre>\n"
    html_output += "</body>\n</html>"

    # Write the HTML output to a file
    with open(output_file_name, "w", encoding="utf-8") as f:
        f.write(html_output)

    return output_file_name

The `create_html_output` function defined above takes the `title`, `description`, `changes`, and `prompt` as inputs. It creates an HTML output file that contains the code review comments for each diff, along with the corresponding diff sections as code blocks with syntax highlighting. Let's explain it in more detail:

First, the function initializes a random string to be used in the output file name. It creates the appropriate title and description text based on the provided inputs and sets up the initial `chatbot_context`. Next, the function generates the HTML structure and styling, including the necessary CSS and JavaScript libraries for syntax highlighting. It also includes a header section for the AI code review.

Using a progress bar, the function iterates over each change in the `changes` list. For each change, it retrieves the `diff` and generates a comment using the `generate_comment` function, then advances the progress bar. The function then writes the `diff` and the corresponding comment to the HTML output. The `diff` is displayed within a `<pre><code>` block so that it keeps its formatting and is picked up by the syntax highlighter, and the comment is wrapped in `<pre>` tags. The `add_code_tags` function wraps every backtick-quoted snippet in the comment in bold tags so that it stands out. After processing all the changes, the function completes the HTML structure by closing the `<body>` and `<html>` tags.

Finally, the HTML output is written to a file with a randomly generated name. The file name is returned by the function as the output. This `create_html_output` function makes the final HTML output that presents the code review comments alongside the corresponding diff sections.
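Because diffs and comments are inserted into HTML, any `<`, `>`, or `&` they contain should be escaped first, or the browser may interpret them as markup. A minimal sketch of rendering one section with the standard library's `html.escape` follows; the function name `render_section` is hypothetical.

```python
import html


def render_section(diff, comment):
    """Return the HTML for one diff and its review comment.

    Hypothetical helper for illustration; both inputs are escaped
    so that they are displayed as text rather than parsed as HTML.
    """
    safe_diff = html.escape(diff)
    safe_comment = html.escape(comment)
    return (
        f"<h3>Diff</h3>\n<pre><code>{safe_diff}</code></pre>\n"
        f"<h3>Comment</h3>\n<pre>{safe_comment}</pre>\n"
    )
```

Escaping does not touch backticks, so a function such as `add_code_tags` can still be applied to the escaped comment afterwards.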

9. Define a function to get diff changes from the pipeline:

def get_diff_changes_from_pipeline():
    # Get the piped input
    piped_input = sys.stdin.read()
    # Split the input into a list of diff sections
    diffs = piped_input.split("diff --git")
    # Create a list of dictionaries, where each dictionary contains a single diff section
    diff_list = [{"diff": diff} for diff in diffs if diff]
    return diff_list

The `get_diff_changes_from_pipeline` function defined above retrieves the input from the pipeline, which is typically the output of a command like `git diff`. It reads the piped input using `sys.stdin.read()` and splits it on the "diff --git" string, the marker with which `git diff` begins each file's section. The result is a list of diff sections, one per changed file.

Dividing the diff into separate sections lets the program review large change sets. Because each section is reviewed in its own request, the review is not bounded by the size of the whole diff, although a single very large file section can still exceed the model's context limit. The function returns the list of diff sections, which the rest of the code review pipeline processes.
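Note that `str.split` removes the "diff --git" marker from each section. If you prefer to keep the header line in every section, one possible variant is sketched below. It is not the article's implementation; the name `split_diff` is made up, and splitting on a zero-width match requires Python 3.7 or later.

```python
import re


def split_diff(text):
    """Split `git diff` output into one section per file, keeping each header.

    Hypothetical variant of get_diff_changes_from_pipeline that takes the
    text as an argument instead of reading it from standard input.
    """
    # Split just before every line that starts with "diff --git "
    sections = re.split(r"(?m)^(?=diff --git )", text)
    # Drop empty or whitespace-only pieces
    return [{"diff": section} for section in sections if section.strip()]
```

Called with `split_diff(sys.stdin.read())`, it returns the same list-of-dictionaries shape that `create_html_output` expects.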

10. Define the main function:

# Default prompt used when --prompt is not given. The article does not show
# the original value, so this text is only an illustrative placeholder.
PROMPT_TEMPLATE = "You are a code reviewer. Review the diffs that follow and point out bugs, risks, and possible improvements."


def main():
    title, description, prompt = None, None, None
    changes = get_diff_changes_from_pipeline()
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="AI code review script")
    parser.add_argument("--title", type=str, help="Title of the diff")
    parser.add_argument("--description", type=str, help="Description of the diff")
    parser.add_argument("--prompt", type=str, help="Custom prompt for the AI")
    args = parser.parse_args()
    title = args.title if args.title else title
    description = args.description if args.description else description
    prompt = args.prompt if args.prompt else PROMPT_TEMPLATE
    output_file = create_html_output(title, description, changes, prompt)
    try:
        webbrowser.open(output_file)
    except Exception:
        print(f"Error running the web browser, you can try to open the output file: {output_file} manually")


if __name__ == "__main__":
    main()

The `main` function serves as the entry point of the code review script. It begins by initializing the `title`, `description`, and `prompt` variables as `None`.

Next, it calls the `get_diff_changes_from_pipeline` function to retrieve the diff changes from the pipeline. These changes will be used for the code review process.

The script then parses the command line arguments using the `argparse` module. The optional arguments `--title`, `--description`, and `--prompt` customize the code review process. Values provided on the command line are assigned to the corresponding variables; when `--prompt` is omitted, the default `PROMPT_TEMPLATE` is used instead. After parsing the arguments, the `create_html_output` function is called to generate the HTML output file. The `title`, `description`, `changes`, and `prompt` are passed as arguments to the function. The output file name is returned and stored in the `output_file` variable.

Finally, the script attempts to open the generated HTML file in the default web browser using the `webbrowser` module. If an error occurs during the process, a message is printed, suggesting manually opening the output file.
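The two steps described above, argument parsing and opening the report, can be tried in isolation. The sketch below is illustrative only: `parse_args` and `as_browser_url` are hypothetical helper names. Passing an absolute `file://` URL to `webbrowser.open` is one way to avoid depending on the current working directory.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the reviewer's options; `argv=None` means use sys.argv."""
    parser = argparse.ArgumentParser(description="AI code review script")
    parser.add_argument("--title", type=str, help="Title of the diff")
    parser.add_argument("--description", type=str, help="Description of the diff")
    parser.add_argument("--prompt", type=str, help="Custom prompt for the AI")
    return parser.parse_args(argv)


def as_browser_url(file_name):
    # Turn a (possibly relative) file name into an absolute file:// URL
    return Path(file_name).resolve().as_uri()


args = parse_args(["--title", "Fix login bug"])
print(args.title)        # prints: Fix login bug
print(args.description)  # prints: None
```

With these helpers, the last step of `main` could be written as `webbrowser.open(as_browser_url(output_file))`.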

11. Save the file and run it using the following command:

git diff master..branch | python3 chatgpt_code_reviewer.py

A progress bar is displayed while the review runs, and the browser then opens an HTML file with the result. Depending on the number of files to review, this may take a few seconds or a few minutes.

Congratulations! You have now created your own ChatGPT code reviewer project from scratch. Remember to adapt and customize the prompt^[1] based on your specific requirements and preferences.

You can find the complete code on this ChatGPTCodeReviewer GitHub repository.

Happy coding!

Article Reference

^[1] The `prompt` is an essential component of the code review process using ChatGPT. It is a text that sets the context and tells the AI model what is expected from its response, such as what to review, specific questions to answer, or guidelines that focus the AI's attention on particular aspects of the code.

In the code, a default `PROMPT_TEMPLATE` is used if no custom prompt is provided. You can modify the `PROMPT_TEMPLATE` variable or pass your prompt using the `--prompt` argument to tailor the AI's behavior according to your specific requirements.

By carefully crafting the prompt, you can help steer the AI's responses in a way that aligns with your code review expectations, ensuring the generated comments are relevant, constructive, and aligned with the desired code quality standards.
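As an illustration of how the prompt, title, and description combine into the first message sent to the model, consider the sketch below. The text of `PROMPT_TEMPLATE` here is a made-up example, since the article does not show the real default, and `build_initial_message` is a hypothetical helper that mirrors the first lines of `create_html_output`.

```python
# Made-up example text; the article does not show the real default prompt
PROMPT_TEMPLATE = (
    "You are a senior engineer reviewing a pull request. "
    "For each diff, list bugs first, then style issues, and keep it concise."
)


def build_initial_message(prompt=None, title=None, description=None):
    """Build the first chat message from the prompt and optional metadata."""
    prompt = prompt or PROMPT_TEMPLATE
    title_text = f"\nTitle: {title}" if title else ""
    description_text = f"\nDescription: {description}" if description else ""
    return {"role": "user", "content": f"{prompt}{title_text}{description_text}"}


message = build_initial_message(prompt="Focus on security.", title="Add login form")
print(message["content"])
# prints:
# Focus on security.
# Title: Add login form
```

A narrower prompt, for example one that asks only about security or only about naming, tends to produce more focused comments than a general request for a review.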

Author Bio

Luis Sobrecueva is a software engineer with many years of experience working with a wide range of technologies across operating systems, databases, and frameworks. He began his professional career developing software as a research fellow in the engineering projects area at the University of Oviedo. He continued in a private company developing low-level (C/C++) database engines and visual development environments, and later moved into web development, where he met Python and discovered his passion for machine learning, applying it to various large-scale projects such as creating and deploying a recommender for a job board with several million users. During that time he also began to contribute to open-source deep learning projects, to participate in machine learning competitions, and to take ML courses, earning several certifications, notably a MicroMasters Program in Statistics and Data Science at MIT and a Udacity Deep Learning nanodegree. He currently works as a Data Engineer at a ride-hailing company called Cabify, and continues to develop his career as an ML engineer by consulting and contributing to open-source projects such as OpenAI and AutoKeras.

Author of the book: Automated Machine Learning with AutoKeras