AlBERTo: Italian BERT language understanding model for NLP challenging tasks based on tweets
Contribution in conference proceedings
Publication Date:
2019
Abstract:
Recent scientific studies on natural language processing (NLP) report the outstanding effectiveness of context-dependent and task-free language understanding models such as ELMo, GPT, and BERT. Specifically, these models have been shown to achieve state-of-the-art performance on numerous complex NLP tasks, such as question answering and sentiment analysis, in the English language. Following the great popularity and effectiveness that these models are gaining in the scientific community, we trained a BERT language understanding model for the Italian language (AlBERTo). In particular, AlBERTo is focused on the language used on social networks, specifically Twitter. To demonstrate its robustness, we evaluated AlBERTo on the EVALITA 2016 SENTIPOLC (SENTIment POLarity Classification) task, obtaining state-of-the-art results in subjectivity, polarity, and irony detection on Italian tweets. The pre-trained AlBERTo model will be publicly distributed through the GitHub platform at https://github.com/marcopoli/AlBERTo-it in order to facilitate future research.
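The abstract describes a pre-trained BERT-style checkpoint applied to tweet classification tasks such as SENTIPOLC subjectivity detection. As a minimal, hypothetical sketch (not the authors' original pipeline), the Python snippet below shows how such a checkpoint could be loaded for binary classification with the Hugging Face transformers library; the checkpoint path, the label count, and the example tweet are illustrative assumptions. See the GitHub repository above for the actual distribution of the pre-trained weights.

    # Hypothetical sketch: inference with an AlBERTo-like checkpoint using the
    # Hugging Face transformers library. The checkpoint path is an assumption;
    # obtain the released weights from https://github.com/marcopoli/AlBERTo-it
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    CHECKPOINT = "path/to/alberto-checkpoint"  # assumption: local path to the released model

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
    # num_labels=2 matches a binary task such as SENTIPOLC subjectivity detection
    model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT, num_labels=2)
    model.eval()

    tweet = "Che bella giornata a Bari!"  # example Italian tweet
    inputs = tokenizer(tweet, return_tensors="pt", truncation=True, max_length=128)

    with torch.no_grad():
        logits = model(**inputs).logits
    print("predicted class:", logits.argmax(dim=-1).item())

A classification head on top of the pre-trained encoder would normally be fine-tuned on task data before such predictions are meaningful; the snippet only illustrates the loading and tokenization flow.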
CRIS Type:
04A-Conference paper in volume
Authors:
Polignano M.; Basile P.; de Gemmis M.; Semeraro G.; Basile V.
Book Title:
Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)
Published in: