Summarization and MT fine-tuning using simpletransformers
Up to now, you have learned both basic and advanced methods of training language models, but it is not always feasible to train your own language model from scratch because of impediments such as limited computational power. In this section, you will look at how to fine-tune language models on your own datasets for the specific tasks of MT and summarization. Follow these next steps:
- To start, you need to install the `simpletransformers` library, as follows:

```
pip install simpletransformers
```
- The next step is to download a dataset that contains your parallel corpus. This parallel corpus can be for any type of Seq2Seq task. For this example, we are going to use an MT dataset, but you can use any other dataset for tasks such as paraphrasing, summarization, or even converting text to Structured Query Language (SQL).
You can download the dataset from https://www.kaggle.com/seymasa/turkish-to-english-translation...
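Once the parallel corpus is available, it needs to be shaped into the two-column format that `simpletransformers`' `Seq2SeqModel` expects: a DataFrame with `input_text` and `target_text` columns. The sketch below uses a tiny inline Turkish-English sample to stand in for the downloaded corpus, and the checkpoint name `Helsinki-NLP/opus-mt-tr-en` and the training flag are illustrative assumptions, not part of the original text:

```python
import pandas as pd

# Tiny inline sample standing in for the downloaded Turkish-English corpus.
# In practice, load the Kaggle file into the same two-column layout.
pairs = [
    ("Merhaba dünya.", "Hello world."),
    ("Nasılsın?", "How are you?"),
]

# Seq2SeqModel expects exactly these column names.
train_df = pd.DataFrame(pairs, columns=["input_text", "target_text"])

# Set to True after installing simpletransformers; fine-tuning downloads a
# pretrained checkpoint and is computationally heavy, so it is gated here.
RUN_TRAINING = False

if RUN_TRAINING:
    from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

    model_args = Seq2SeqArgs()
    model_args.num_train_epochs = 1
    model_args.overwrite_output_dir = True

    # "Helsinki-NLP/opus-mt-tr-en" is an assumed MarianMT checkpoint for
    # Turkish-to-English; any compatible encoder-decoder model could be used.
    model = Seq2SeqModel(
        encoder_decoder_type="marian",
        encoder_decoder_name="Helsinki-NLP/opus-mt-tr-en",
        args=model_args,
        use_cuda=False,
    )
    model.train_model(train_df)
    print(model.predict(["Merhaba dünya."]))
```

The same data layout works for the other Seq2Seq tasks mentioned above: for summarization, `input_text` holds the document and `target_text` the summary.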