UNI-FIND

Semantic processing for Urdu: corpus creation, parsing, and generation

Article
Publication Date:
2025
Abstract:
Discourse representation structure (DRS), a formal meaning representation, has been used for both semantic parsing and natural language generation, with promising results for high-resource languages such as English and for the lesser-resourced European languages Italian, German, and Dutch. We investigate how DRS can be employed for the low-resource language Urdu in neural semantic parsing (translating Urdu sentences into formal meaning representations) and natural language generation (generating Urdu sentences from formal meaning representations). Because no annotated corpora are available for Urdu, we adopted a combined approach of manual annotation and rule-based procedures to transform English-aligned DRS into Urdu-aligned DRS through syntactic structure and word surface alignment; this step is needed because word order in Urdu (subject–object–verb) differs from that of English (subject–verb–object). To further increase the amount of semantically annotated data, we developed lexical, grammatical, and named entity-based augmentation techniques, which yielded nine times more data examples. Using the augmented meaning bank for Urdu, we developed a neural semantic parser and generator that benefited significantly from the augmented data and generalized better than the model trained without augmentation. We evaluated the effect of semantic data augmentation using a state-of-the-art transformer-based neural sequence-to-sequence architecture. Our implementation shows promising results for the semantic processing of Urdu and demonstrates that data augmentation increases semantic parsing performance (F1 score) from 67.12 to 76.81 and substantially increases BLEU, BERTScore, METEOR, ROUGE, and chrF scores for generation.
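To make the idea of named entity-based augmentation concrete, here is a minimal sketch, not the authors' implementation. It assumes each training example is a pair of a sentence and a meaning representation stored as a string, and that the same name appears verbatim in both. The function name `augment_named_entity`, the toy meaning notation, and the English placeholder sentence are all hypothetical and chosen only for readability; they are not the DRS format used in the paper.

```python
from typing import List, Tuple

# Hypothetical example type: (sentence, meaning representation as a string).
Example = Tuple[str, str]


def augment_named_entity(
    example: Example, name: str, replacements: List[str]
) -> List[Example]:
    """Create new examples by swapping one name for others on both sides.

    Assumption: the name occurs verbatim in the sentence and in the
    meaning string. Plain substring replacement is used, so a name that
    is a substring of another word would also be replaced.
    """
    sentence, meaning = example
    if name not in sentence or name not in meaning:
        # The name cannot be aligned on both sides, so skip augmentation.
        return []
    return [
        (sentence.replace(name, new), meaning.replace(name, new))
        for new in replacements
        if new != name
    ]


# Toy input in an invented notation (not the real DRS format).
original = ("Ali sleeps", 'person(x1) Name(x1, "Ali") sleep(e1) Agent(e1, x1)')
augmented = augment_named_entity(original, "Ali", ["Sara", "Omar"])

for sentence, meaning in augmented:
    print(sentence, "|", meaning)
```

Replacing the name on both sides at once keeps the sentence and its meaning representation consistent, which is the property any such augmentation has to preserve.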
CRIS Type:
03A-Journal Article
Keywords:
Urdu semantic parsing and generation, Urdu semantic representation, Semantic data augmentation, Parallel meaning bank, Urdu meaning bank
Author list:
Amin, Muhammad Saad; Zhang, Xiao; Anselma, Luca; Mazzei, Alessandro; Bos, Johan
University authors:
ANSELMA Luca
MAZZEI Alessandro
Link to the full record:
https://iris.unito.it/handle/2318/2064658
Link to the full text:
https://iris.unito.it/retrieve/handle/2318/2064658/1616963/s10579-025-09819-2.pdf
Published in:
LANGUAGE RESOURCES AND EVALUATION
Journal

General Data

URL

https://doi.org/10.1007/s10579-025-09819-2

Research Areas

Sectors (12)

PE6_7 - Artificial intelligence, intelligent systems, natural language processing - (2024)

FOOD, AGRICULTURE and LIVESTOCK - Veterinary Pharmacology

CULTURE, ART and CREATIVITY - Modern Cultures

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Digitalization of Culture and Creativity

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Digitalization of Society and Public Administration

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Health and Informatics

LANGUAGES and LITERATURE - English and Anglo-American Studies

LANGUAGES and LITERATURE - French Studies

PLANET EARTH, ENVIRONMENT, CLIMATE, ENERGY and SUSTAINABILITY - Environmental Law

PLANET EARTH, ENVIRONMENT, CLIMATE, ENERGY and SUSTAINABILITY - Informatics and Environment

MATHEMATICAL, CHEMICAL and PHYSICAL SCIENCES - Particle and Nuclear Physics

MATHEMATICAL, CHEMICAL and PHYSICAL SCIENCES - Innovative Laboratories, Instrumentation and Physical Modeling