Writing unit tests for R functions with ChatGPT


This article is an excerpt from the book, R Bioinformatics Cookbook - Second Edition, by Dan MacLean. Discover over 80 recipes for modeling and handling real-life biological data using modern libraries from the R ecosystem.

Introduction

ChatGPT can be useful for generating R unit tests because it has been trained on a wide range of code, including common testing patterns. Given a function and some indication of its expected behavior, it can propose test cases that cover typical inputs and some edge cases. This saves time and can surface bugs or unstated assumptions early, although the generated tests still need to be read and checked. In this recipe, we’ll learn how to get ChatGPT to do the tedious work of generating test cases for code that we are working on.

Getting ready

We just need a ChatGPT account: https://chat.openai.com/.

How to do it…

To write unit tests for an R function with ChatGPT, do the following:

1. Paste the following request into ChatGPT:

write unit tests for the following R code

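# col_list is assumed to be a character vector of column names
factorise_cols <- function(df, col_list) {
  # TRUE for columns that hold at least one non-missing value
  not_all_na <- function(x) any(!is.na(x))

  for (i in col_list) {
    # .data[[i]] looks the column up by name; !!(i) would insert the string itself
    df <- df %>% dplyr::mutate(!!i := as.factor(.data[[i]]))
  }
  # Drop columns that contain only NA
  df %>% dplyr::select_if(not_all_na)
}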

2. Read and check the output:

library(dplyr)
library(testthat)

# Create a test data frame
test_df <- data.frame(
 col1 = c("A", "B", "C", NA),
 col2 = c(1, 2, NA, 4),
 col3 = c("X", "Y", "Z", NA)
)

# Define the expected output for the test data frame
expected_df <- data.frame(
 col1 = factor(c("A", "B", "C", NA)),
 col3 = factor(c("X", "Y", "Z", NA))
)

# Start defining the unit tests
test_that("factorise_cols correctly factorises selected columns", {
 # Call the factorise_cols function
 result_df <- factorise_cols(test_df, c("col1", "col3"))

 # Check if the selected columns are factorised
 expect_equal(result_df$col1, expected_df$col1)
 expect_equal(result_df$col3, expected_df$col3)

 # Check if other columns are not modified
 expect_equal(result_df$col2, test_df$col2)
})

test_that("factorise_cols returns a data frame with non-NA columns", {
 # Call the factorise_cols function
 result_df <- factorise_cols(test_df, c("col1", "col3"))

 # Check if the returned data frame only contains non-NA columns
 expect_true(all(!is.na(result_df$col1)))
 expect_true(all(!is.na(result_df$col3)))
 expect_true(all(is.na(result_df$col2)))
})

With that, we have output that tests at least some facets of our code, and a good start toward a more dependable and workable code base.

How it works…

The recipe here takes advantage of ChatGPT’s representations of the structure of the R programming language in order to write code that will effectively test some example code. In step 1, we simply define the function we wish to test and ask for tests.
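To make the behavior under test concrete, here is a minimal sketch of calling the function on a small data frame. It assumes that col_list is a character vector of column names, so each column is looked up with .data[[i]]. The demo_df data frame and its column names are invented for illustration.

```r
library(dplyr)

# Sketch of the function under test. Assumption: col_list holds column
# names as strings, so each column is retrieved with .data[[i]].
factorise_cols <- function(df, col_list) {
  not_all_na <- function(x) any(!is.na(x))
  for (i in col_list) {
    df <- df %>% dplyr::mutate(!!i := as.factor(.data[[i]]))
  }
  df %>% dplyr::select_if(not_all_na)
}

# Hypothetical input: one character column, one numeric column with a
# missing value, and one column that contains only NA.
demo_df <- data.frame(
  id    = c("a", "b", "c"),
  score = c(1.5, NA, 3),
  empty = c(NA, NA, NA)
)

result <- factorise_cols(demo_df, c("id"))

# Inspect the structure: id should now be a factor and the all-NA
# column should have been dropped.
str(result)
```

These two behaviors, converting the named columns to factors and discarding columns with no data, are what a useful set of unit tests needs to pin down.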

In step 2, we see the output that ChatGPT generated in this instance. It has given us a reasonable starting set of unit tests. As with everything to do with ChatGPT, there isn’t a guarantee that they are correct, but we can read and verify them very easily, certainly in much less time than it would take to write them. Reading them carefully pays off here: the last expectation in the second test, that col2 contains only NA, cannot hold for this test data frame, because col2 has non-missing values. One further thing to note is that, in this case at least, ChatGPT hasn’t generated a test for a column that contains only NA, which we may decide we need. That behavior isn’t obvious from the initial code, so working through the generated tests has prompted a new thought about how to run this function safely.
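The missing all-NA case could be covered with something like the following sketch. It is not part of ChatGPT's output. It redefines the function so that the block runs on its own, assumes that column names are passed as strings, and uses invented data frames (na_df, mixed_df) for illustration.

```r
library(dplyr)
library(testthat)

# Sketch of the function under test. Assumption: col_list holds column
# names as strings, so each column is retrieved with .data[[i]].
factorise_cols <- function(df, col_list) {
  not_all_na <- function(x) any(!is.na(x))
  for (i in col_list) {
    df <- df %>% dplyr::mutate(!!i := as.factor(.data[[i]]))
  }
  df %>% dplyr::select_if(not_all_na)
}

test_that("factorise_cols drops columns that contain only NA", {
  na_df <- data.frame(
    col1 = c("A", "B", NA),
    col2 = c(NA, NA, NA)
  )
  result_df <- factorise_cols(na_df, c("col1"))

  # Only col1 should survive, and it should be a factor
  expect_named(result_df, "col1")
  expect_s3_class(result_df$col1, "factor")
})

test_that("factorise_cols keeps NA values inside retained columns", {
  mixed_df <- data.frame(
    col1 = c("A", "B", "C", NA),
    col2 = c(1, 2, NA, 4)
  )
  result_df <- factorise_cols(mixed_df, c("col1"))

  # Individual NA values are preserved; only all-NA columns are removed
  expect_equal(result_df$col1, factor(c("A", "B", "C", NA)))
  expect_equal(result_df$col2, mixed_df$col2)

  # Every retained column holds at least one non-missing value
  has_data <- vapply(result_df, function(x) any(!is.na(x)), logical(1))
  expect_true(all(has_data))
})
```

The second test separates two ideas that are easy to conflate: a retained column may still contain individual NA values, while a column made up entirely of NA is removed.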

Conclusion

In conclusion, ChatGPT can take much of the tedium out of writing unit tests for R functions. Given a function, it quickly produces a first set of test cases covering typical behavior and some edge cases, and reading those tests can expose assumptions in the code that were never made explicit. Its output is not guaranteed to be correct, so each test still needs to be read and run, but it provides a solid foundation for building more dependable and well-tested R code.

Author Bio

Professor Dan MacLean has a Ph.D. in molecular biology from the University of Cambridge and gained postdoctoral experience in genomics and bioinformatics at Stanford University in California. Dan is now Head of Bioinformatics at the world-leading Sainsbury Laboratory in Norwich, UK where he works on bioinformatics, genomics, and machine learning. He teaches undergraduates, post-graduates, and post-doctoral students in data science and computational biology. His research group has developed numerous new methods and software in R, Python, and other languages with over 100,000 downloads combined.