Practice project: Transfer learning for the finance domain
This project aims to fine-tune BLOOM on a curated corpus of domain-specific documents so that it can interpret and articulate concepts specific to Proxima and its products.
Our methodology is inspired by strategies for domain adaptation across various fields, including biomedicine, finance, and law. A noteworthy 2023 study by Cheng et al., Adapting Large Language Models via Reading Comprehension, presents a novel approach for enhancing LLMs’ proficiency in domain-specific tasks. That work repurposes extensive pre-training corpora into formats suited to reading comprehension tasks, significantly improving the models’ performance in specialized domains. In our case, we will apply a similar but simplified approach to continued pre-training by fine-tuning the pre-trained BLOOM model on a bespoke dataset specific to Proxima, effectively continuing the model’s training...
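To make the setup concrete, the sketch below shows one way such continued pre-training could look with the Hugging Face Trainer API. The corpus file proxima_corpus.txt and the hyperparameters are illustrative placeholders, and the small bigscience/bloom-560m checkpoint stands in for whichever BLOOM variant is actually used; treat this as a minimal sketch rather than the project's final training script.

```python
# Minimal sketch: continued pre-training (causal language modeling) of a BLOOM
# checkpoint on a plain-text domain corpus. "proxima_corpus.txt" is a
# hypothetical placeholder for the curated Proxima documents.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bigscience/bloom-560m"  # small BLOOM variant used for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load the raw domain corpus as plain text and tokenize it.
raw = load_dataset("text", data_files={"train": "proxima_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop empty lines

# Standard causal-LM collator: labels are the input ids, shifted inside the model.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="bloom-proxima",          # illustrative output directory
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    logging_steps=50,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)

trainer.train()
```

The design mirrors the description above: rather than training a new objective, the script simply resumes the original causal language modeling objective on in-domain text, which is what "continued pre-training" amounts to in practice.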