Using generative AI to display descriptive statistics
Generative AI tools give data scientists a great opportunity to streamline the data cleaning and exploration parts of the workflow. Large language models, in particular, have the potential to make this work much easier and more intuitive. Using these tools, we can select rows and columns by criteria, generate summary statistics, and plot variables.
A simple way to introduce generative AI tools into your data exploration is with PandasAI. PandasAI uses the OpenAI API to translate natural language queries into data selections and operations that pandas can run. As of July 2023, the OpenAI API is the only large language model API that can be used with PandasAI, though the developers of the library anticipate adding other APIs.
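To make that translation concrete, the sketch below shows the kind of plain pandas code that a few natural language requests would need to become. It does not call PandasAI or the OpenAI API. The DataFrame, its column names, and the example requests in the comments are all hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data; these column names are made up for illustration.
df = pd.DataFrame({
    "location": ["A", "B", "C", "D"],
    "region": ["East", "West", "East", "West"],
    "total_cases": [120, 450, 80, 300],
    "population": [1000, 5000, 800, 2500],
})

# Request: "Show location and total cases where total cases exceed 100"
high_cases = df.loc[df["total_cases"] > 100, ["location", "total_cases"]]

# Request: "Summarize total cases and population"
summary = df[["total_cases", "population"]].describe()

# Request: "What is the average number of total cases by region?"
mean_by_region = df.groupby("region")["total_cases"].mean()

print(high_cases)
print(summary)
print(mean_by_region)
```

Each request maps to a line or two of pandas here, but writing those lines still requires remembering the right method and indexing syntax. That lookup is the part a natural language layer aims to take over.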
We can use PandasAI to substantially reduce the lines of code we need to write to produce some of the tabulations and visualizations we have created so far in this chapter. The steps in this recipe...