Technical requirements
In this chapter, we will work with the same BBC dataset that we used in Chapter 4. The dataset is located in the book's GitHub repository:
It is also available through Hugging Face:
https://huggingface.co/datasets/SetFit/bbc-news
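As a minimal sketch of the Hugging Face route, the dataset can be pulled with the datasets library (installed with pip install datasets). The helper names load_bbc_news and label_counts are hypothetical and introduced only for illustration. The split name "train" and the column name "label_text" are assumptions about how the dataset is published on the Hub, so check the dataset card if they do not match.

```python
from collections import Counter


def load_bbc_news(split="train"):
    """Load one split of SetFit/bbc-news from the Hugging Face Hub.

    Needs network access and the `datasets` package. The default split
    name "train" is an assumption; see the dataset card for the real ones.
    """
    # Imported lazily so that label_counts() below works without `datasets`.
    from datasets import load_dataset

    return load_dataset("SetFit/bbc-news", split=split)


def label_counts(records, label_key="label_text"):
    """Count how many records carry each label.

    `records` is any iterable of dict-like rows. The default key
    "label_text" is an assumed column name, not confirmed by this chapter.
    """
    return Counter(row[label_key] for row in records)


# Typical use in a notebook (downloads the data on first call):
#   train = load_bbc_news("train")
#   print(train)                # shows the column names and the row count
#   print(label_counts(train))  # number of articles per topic
```

Printing the loaded split first is a quick way to confirm the actual column names before relying on them in later cells.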
Note
This dataset is used in this book with permission from the researchers. The original paper associated with this dataset is as follows:
Derek Greene and Pádraig Cunningham. “Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering”, in Proc. 23rd International Conference on Machine Learning (ICML’06), 2006.
All rights, including copyright, in the text content of the original articles are owned by the BBC.
Please make sure to download all the Python notebooks in the util folder on GitHub into the util folder on your computer. The...