1. Preparing the dataset for fine-tuning
Fine-tuning an OpenAI model requires careful preparation; otherwise, the fine-tuning job will fail. In this section, we will carry out the following steps:
- Download the dataset from Hugging Face and prepare it by processing its columns.
- Stream the dataset to a JSON file in JSONL format.
The program begins by downloading the dataset.
1.1. Downloading and visualizing the dataset
We will download the SciQ dataset we embedded in Chapter 8. As we saw, embedding thousands of documents takes time and resources. In this section, we will download the dataset, but this time, we will not embed it. We will let the OpenAI model handle that for us while fine-tuning the data.
The program downloads the same Hugging Face dataset as in Chapter 8 and filters the training portion of the dataset to include only non-empty records with the correct answer and support text to explain the answer to the questions:
# Import required...