CAPIO: a Middleware for Transparent I/O Streaming in Data-Intensive Workflows
Conference proceedings contribution
Publication date:
2023
Abstract:
With the increasing amount of digital data available for analysis and simulation, the class of I/O-intensive HPC workflows is fated to quickly expand, further exacerbating the performance gap between computing, memory, and storage technologies. This paper introduces CAPIO (Cross-Application Programmable I/O), a middleware capable of injecting I/O streaming capabilities into file-based workflows, improving the computation-I/O overlap without the need to change the application code. The contribution is twofold: 1) at design time, a new I/O coordination language allows users to annotate workflow data dependencies with synchronization semantics; 2) at run time, a user-space middleware automatically and transparently to the user turns a workflow batch execution into a streaming execution according to the semantics expressed in the configuration file. CAPIO has been tested on synthetic benchmarks simulating typical workflow I/O patterns and two real-world workflows. Experiments show that CAPIO reduces the execution time by 10% to 66% for data-intensive workflows that use the file system as a communication medium.
CRIS type:
04A-Conference paper in volume
Keywords:
Workflow, In situ model, I/O coordination
Authors:
Alberto Riccardo Martinelli, Massimo Torquati, Marco Aldinucci, Iacopo Colonnelli, Barbara Cantalupo
Book title:
30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC)
Published in: