Data Augmentation through Back-Translation for Stereotypes and Irony Detection

Contribution in conference proceedings

Publication date:

2024

Abstract:

Complex linguistic phenomena such as stereotypes or irony remain challenging to detect, in particular because annotated data are scarce. In this paper, we explore Back-Translation (BT) as a data augmentation method to enhance such datasets by artificially introducing semantics-preserving variations. We investigate French and Italian as source languages on two multilingual datasets annotated for the presence of stereotypes or irony, and evaluate French/Italian, English, and Arabic as pivot languages for the BT process. We also investigate cross-translation, i.e., augmenting one language subset of a multilingual dataset with translated instances from the other languages. We conduct an intrinsic evaluation of the quality of back-translated instances, identifying linguistic or translation-model-specific errors that may occur with BT. We also perform an extrinsic evaluation of different data augmentation configurations to train a multilingual Transformer-based classifier for stereotype or irony detection on monolingual data.
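
As an illustration of the BT idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `translate(text, src_lang, tgt_lang)` callable supplied by the user (for example, a wrapper around any machine translation model); the function names `back_translate` and `augment`, the `(text, label)` dataset layout, and the de-duplication step are assumptions made here for illustration only.

```python
from typing import Callable, List, Tuple

# Hypothetical translator signature: (text, source language, target language) -> translation.
# Any machine translation system could be wrapped to match it.
Translator = Callable[[str, str, str], str]


def back_translate(text: str, source: str, pivot: str, translate: Translator) -> str:
    """Translate text from the source language to a pivot language and back."""
    pivot_text = translate(text, source, pivot)
    return translate(pivot_text, pivot, source)


def augment(
    dataset: List[Tuple[str, int]],
    source: str,
    pivots: List[str],
    translate: Translator,
) -> List[Tuple[str, int]]:
    """Return the original (text, label) pairs plus back-translated variants.

    Each variant keeps the label of the instance it was derived from.
    Variants identical to an already seen text are skipped (an assumption
    of this sketch, to avoid adding exact duplicates).
    """
    augmented = list(dataset)
    seen = {text for text, _ in dataset}
    for text, label in dataset:
        for pivot in pivots:
            variant = back_translate(text, source, pivot, translate)
            if variant not in seen:
                seen.add(variant)
                augmented.append((variant, label))
    return augmented
```

Under these assumptions, calling `augment(french_data, "fr", ["en", "ar"], translate)` would add up to two back-translated variants per French instance, one per pivot language.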

CRIS type:

04A-Conference paper in volume

Keywords:

Back Translation; Data Augmentation; Irony Detection; Low-Resource NLP; Stereotypes Detection

Author list: