Word augmenting
Word augmentations carry the same bias and safe level warnings as character augmentations. Over half of these augmentation methods inject errors into the text, while the others generate new text using synonyms or a pretrained AI model. The standard word augmentation functions are as follows:
- The Misspell augmentation function uses a predefined dictionary to simulate spelling mistakes. It is based on the scholarly paper Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs by Claude Coulombe, which was published in 2018.
- The Split augmentation function randomly splits a word into two tokens.
- The Random word augmentation method applies random changes to the text using one of four actions: substitute, swap, delete, or crop. It is based on two scholarly papers: Synthetic and Natural Noise Both Break Neural Machine Translation by Yonatan Belinkov and Yonatan Bisk, published in 2018, and Data Augmentation via Dependency Tree Morphing for Low...
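To make the Misspell, Split, and Random word behaviors described above more concrete, here is a minimal sketch in plain Python. It is not the implementation of any augmentation library: the function names (`misspell`, `split_word`, `random_word`), the tiny `MISSPELLINGS` dictionary, and the `filler` token are all hypothetical and chosen only for illustration, and the exact semantics of each action (for example, how crop picks its span) are assumptions.

```python
import random

# Hypothetical misspelling dictionary; a real one would be far larger.
MISSPELLINGS = {
    "receive": ["recieve"],
    "separate": ["seperate"],
    "definitely": ["definately"],
}


def misspell(tokens, rng, p=1.0):
    """Replace each word found in MISSPELLINGS with probability p."""
    out = []
    for tok in tokens:
        options = MISSPELLINGS.get(tok.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(tok)
    return out


def split_word(tokens, rng, min_len=4):
    """Split one randomly chosen word (>= min_len chars) into two tokens."""
    tokens = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if len(t) >= min_len]
    if not candidates:
        return tokens
    i = rng.choice(candidates)
    cut = rng.randint(1, len(tokens[i]) - 1)  # both halves stay non-empty
    return tokens[:i] + [tokens[i][:cut], tokens[i][cut:]] + tokens[i + 1:]


def random_word(tokens, rng, action="swap", filler="_"):
    """Apply one random word-level action (assumed semantics, see lead-in)."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    i = rng.randrange(len(tokens))
    if action == "substitute":
        tokens[i] = filler  # replace one word with a placeholder
    elif action == "swap":
        j = i + 1 if i + 1 < len(tokens) else i - 1  # adjacent neighbor
        tokens[i], tokens[j] = tokens[j], tokens[i]
    elif action == "delete":
        del tokens[i]  # drop a single word
    elif action == "crop":
        # Drop a contiguous run of words, always keeping at least one.
        start = rng.randrange(len(tokens) - 1)
        end = rng.randint(start + 1, len(tokens) - 1)
        del tokens[start:end]
    else:
        raise ValueError(f"unknown action: {action}")
    return tokens


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    words = "we will definitely receive the separate reports".split()
    print(" ".join(misspell(words, rng)))
    print(" ".join(split_word(words, rng)))
    for act in ("substitute", "swap", "delete", "crop"):
        print(act, "->", " ".join(random_word(words, rng, action=act)))
```

Passing an explicit `random.Random` instance keeps each function repeatable under a fixed seed, which helps when comparing augmented output across experiments.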