UNI-FIND
DisaggregHate It Corpus: A Disaggregated Italian Dataset of Hate Speech

Contribution in conference proceedings
Publication date:
2023
Abstract:
Recent studies in Machine Learning advocate exploiting disagreement between annotators to train models that reflect the different opinions humans hold about a specific phenomenon. Datasets whose annotations are aggregated by majority voting are therefore not sufficient. In this paper, we present the DisaggregHate It Corpus, a disaggregated Italian dataset on hate speech that also encodes some information about the annotators. The corpus contains Italian tweets on the topic of racism and has been annotated by native Italian university students. We explain how the dataset was gathered following the recommendations of the perspectivist approach [1], encouraging the annotators to provide some socio-demographic information about themselves. To exploit the disagreement in the learning process, we propose two types of soft labels: softmax and standard normalization. We investigate the benefit of using disagreement by creating a baseline binary model trained on the ‘hard’ labels (aggregated by majority voting) and two regression models, each trained on one of the two types of ‘soft’ labels. We test the models in in-domain and out-of-domain settings, evaluating their performance with cross-entropy as the metric, and show that the models trained on the soft labels perform better.
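As an illustration of the labelling schemes named in the abstract, the sketch below derives a hard label and two soft labels from per-class annotator vote counts and compares predictions with cross-entropy. The abstract does not give the exact formulas, so everything here is an assumption: the function names, the vote counts, and the choice to apply softmax directly to raw vote counts are hypothetical and may differ from what the paper does.

```python
import math

def majority_label(counts):
    """Hard label: index of the class with the most votes (ties go to the lowest index)."""
    return max(range(len(counts)), key=counts.__getitem__)

def standard_normalization(counts):
    """Soft label: each class count divided by the total number of votes."""
    total = sum(counts)
    return [c / total for c in counts]

def softmax(counts):
    """Soft label: softmax over the raw vote counts (assumed input; shifted by the max for stability)."""
    m = max(counts)
    exps = [math.exp(c - m) for c in counts]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# Hypothetical tweet: 2 annotators voted 'not hate' (class 0), 3 voted 'hate' (class 1).
votes = [2, 3]
hard = majority_label(votes)
soft_norm = standard_normalization(votes)
soft_smax = softmax(votes)

# A prediction that mirrors the annotators' split scores better against the soft
# target than an overconfident prediction for the minority class.
calibrated = cross_entropy(soft_norm, [0.4, 0.6])
overconfident = cross_entropy(soft_norm, [0.9, 0.1])
```

With only five votes the two soft labels already differ noticeably: normalization keeps the raw vote proportions, while softmax over counts pushes more probability mass toward the majority class.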
CRIS type:
04A-Conference paper in volume
Keywords:
hate speech, perspectivism, disagreement
Author list:
Marco Madeddu, Simona Frenda, Mirko Lai, Viviana Patti, Valerio Basile
University authors:
BASILE Valerio
PATTI Viviana
Link to the full record:
https://iris.unito.it/handle/2318/1950454
Link to the full text:
https://iris.unito.it/retrieve/handle/2318/1950454/1506097/paper29.pdf
Book title:
Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023)
Published in:
CEUR WORKSHOP PROCEEDINGS
Journal
CEUR WORKSHOP PROCEEDINGS
Series

General Data

URL

https://ceur-ws.org/Vol-3596/paper29.pdf

Research Areas

Sectors (12)


PE6_7 - Artificial intelligence, intelligent systems, natural language processing - (2022)

FOOD, AGRICULTURE and LIVESTOCK - Veterinary Pharmacology

CULTURE, ART and CREATIVITY - Modern Cultures

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Digitalization of Culture and Creativity

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Digitalization of Society and Public Administration

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Health and Computer Science

LANGUAGES and LITERATURE - English and Anglo-American Studies

LANGUAGES and LITERATURE - French Studies

PLANET EARTH, ENVIRONMENT, CLIMATE, ENERGY and SUSTAINABILITY - Environmental Law

PLANET EARTH, ENVIRONMENT, CLIMATE, ENERGY and SUSTAINABILITY - Computer Science and Environment

MATHEMATICAL, CHEMICAL and PHYSICAL SCIENCES - Particle and Nuclear Physics

MATHEMATICAL, CHEMICAL and PHYSICAL SCIENCES - Innovative Laboratories, Instrumentation and Physical Modelling