Publication date:
2023
Abstract:
The Software Heritage (SWH) dataset serves as a vast repository for open-source code, with the ambitious goal of preserving all publicly available open-source projects. Although it is designed to archive project files effectively, its size of nearly 1 petabyte makes it difficult to support Big Data MapReduce or AI systems efficiently. To bridge this gap and enable seamless custom analytics on the SWH dataset, we present the SWH-Analytics (SWHA) architecture, a development environment that quickly and transparently runs custom analytic applications on the open-source software data preserved over time by SWH.
CRIS type:
04D-Meeting abstract in volume
Keywords:
Software Heritage, Open-source Software, Large-scale analytics, License management
Authors:
Alessia Antelmi, Massimo Torquati, Daniele Gregori, Francesco Polzella, Gianmarco Spinatelli, Marco Aldinucci
Book title:
The 2nd Italian Conference on Big Data and Data Science (ITADATA 2023)