Data Augmentation through Back-Translation for Stereotypes and Irony Detection
Contribution in conference proceedings
Publication date:
2024
Abstract:
Complex linguistic phenomena such as stereotypes or irony remain challenging to detect, particularly because annotated data is scarce. In this paper, we explore Back-Translation (BT) as a data augmentation method that enriches such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony, and evaluate French/Italian, English, and Arabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation-model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations used to train a multilingual Transformer-based classifier for stereotype or irony detection on monolingual data.
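The back-translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `back_translate`, `augment`, and `toy_translate` are hypothetical, and the lookup-table translator is only a stand-in for a real machine translation model, which the abstract does not name.

```python
from typing import Callable, List, Tuple

# Assumed interface: translate(text, src_lang, tgt_lang) -> translated text.
Translator = Callable[[str, str, str], str]
Example = Tuple[str, int]  # (text, label)


def back_translate(text: str, translate: Translator, source: str, pivot: str) -> str:
    """Translate source -> pivot -> source to obtain a candidate paraphrase."""
    pivot_text = translate(text, source, pivot)
    return translate(pivot_text, pivot, source)


def augment(dataset: List[Example], translate: Translator,
            source: str, pivots: List[str]) -> List[Example]:
    """Append back-translated variants, keeping each original label."""
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        for pivot in pivots:
            candidate = back_translate(text, translate, source, pivot)
            # Skip round trips that reproduce an existing instance verbatim.
            if candidate not in seen:
                seen.add(candidate)
                augmented.append((candidate, label))
    return augmented


def toy_translate(text: str, src: str, tgt: str) -> str:
    """Toy stand-in for an MT system: a lookup table, identity otherwise."""
    table = {
        ("fr", "en", "quelle belle journée"): "what a nice day",
        ("en", "fr", "what a nice day"): "quelle journée agréable",
    }
    return table.get((src, tgt, text), text)


data = [("quelle belle journée", 1)]
result = augment(data, toy_translate, source="fr", pivots=["en", "ar"])
```

In this toy run the English pivot yields one new variant, while the Arabic pivot (absent from the lookup table) returns the input unchanged and is filtered out as a duplicate. A real setup would also need the kind of quality checks the abstract refers to as intrinsic evaluation, since a round trip can alter the ironic or stereotypical content that the label depends on.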
CRIS type:
04A-Conference paper in volume
Keywords:
Back Translation; Data Augmentation; Irony Detection; Low-Resource NLP; Stereotypes Detection
Authors:
Bourgeade T.; Casola S.; Wizani A.M.; Bosco C.
Book title:
CEUR Workshop Proceedings
Published in: