Publication date:
2025
Abstract:
Co-clustering is a useful tool that extracts summary information from a data matrix in terms of row and column clusters, giving a succinct representation of the data. However, if the matrix contains data about individuals, such representations could leak privacy-sensitive information about them. In terms of privacy disclosure, co-clustering is even more harmful than clustering, because of the additional information carried by the column partition. Yet, to the best of our knowledge, the problem of privacy-preserving co-clustering has never been studied. To fill this gap, we consider a recent co-clustering algorithm based on a de-normalized version of Goodman and Kruskal's τ association measure, which has a favorable property from a differential privacy perspective and is expected not to consume an excessive amount of privacy budget. This leads to a privacy-preserving co-clustering algorithm that satisfies the definition of differential privacy while providing good partitioning solutions. Our algorithm relies on a prototype-based optimization strategy that makes it fast and actionable in realistic privacy-preserving data management and analysis scenarios, as shown by our extensive experimental validation.
CRIS type:
04A-Conference paper in volume
Keywords:
clustering, privacy, unsupervised learning, high-dimensional data
Authors:
Battaglia, Elena; Pensa, Ruggero G.
Book title:
Proceedings of the 2025 SIAM International Conference on Data Mining (SDM)