Publication date:
2025
Abstract:
Emerging hardware constraints are pushing workloads to become more composite. This transition introduces scenarios in which HPC I/O systems are shared among multiple concurrent jobs. Such sharing can generate load imbalances and contention along the end-to-end I/O paths, degrading the performance of both the I/O system and the workloads. Recognizing this context, we define a simulation-based framework that alleviates resource contention in applications and ultimately allows us to design contention avoidance strategies. Specifically, by capturing system-wide behavior and extracting phases and characteristics of various performance metrics, we can mitigate contention by delaying the launch of applications. This framework leverages frequency domain analysis of performance metrics alongside clustering methods and is coupled with a comprehensive model of an HPC system implemented using Extended Stochastic Symmetric Nets.
CRIS type:
04A-Conference paper in volume
Keywords:
HPC; I/O; Markov Process; Monitoring; Performance Modeling
Authors:
Pernice, Simone; Tarraf, Ahmad; Besnard, Jean-Baptiste; Cantalupo, Barbara; Cascajo, Alberto; Singh, David E.; Wolf, Felix; Carretero, Jesús; Shende, Sameer; Aldinucci, Marco
Book title:
Proceedings - 2025 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2025