Interpretable Fair Distance Learning for Categorical Data

Contribution in conference proceedings

Publication date:

In press

Abstract:

Categorical features are widespread in many decision support systems that rely on personal and sensitive data, such as credit scoring or personalized medicine, and are not exempt from bias and fairness concerns. Unfortunately, bias mitigation techniques based on representation learning for categorical data are poorly studied, and most solutions simply apply approaches designed for numeric data to one-hot encoded features. To fill this gap, we propose FairDILCA, a fair extension of a known framework for learning distances on categorical data, which exploits the co-distributions of attribute values to compute distances. FairDILCA considers the correlation of the features with the protected one to create an unbiased representation of the data, making any subsequent analysis and learning task fairer. Furthermore, it is more interpretable than typical representation learning approaches, since it relies on deterministic and clear computational steps. Through extensive experiments, we show the effectiveness of our framework, including when it is applied to a classification task and compared with a state-of-the-art method pursuing a similar objective.

CRIS type:

04A-Conference paper in volume

Keywords:

Categorical features, Distance learning, Fairness

Author list:

A. Famiani, F. Peiretti, R.G. Pensa

University authors:

FAMIANI ALESSIO

PENSA Ruggero Gaetano

Link to the full record:

https://iris.unito.it/handle/2318/2032190

Link to the full text:

https://iris.unito.it/retrieve/handle/2318/2032190/1422719/bias2024_author.pdf

Book title:

Machine Learning and Principles and Practice of Knowledge Discovery in Databases - International Workshops of ECML PKDD 2024, Vilnius, Lithuania, September 9-13, 2024

Published in: