Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors
Contribution in conference proceedings
Publication Date:
2024
Abstract:
The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models, underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with silicon-enabled RISC-V Vector (RVV) extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and for accelerating tasks critical to large language models and other AI applications. This work evaluates the inference performance of the BERT and GPT-2 language models on the SOPHON SG2042 64-core RISC-V processor with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved inference performance by up to 40% in some cases.
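As a rough, hypothetical illustration of this kind of measurement (not the authors' benchmark harness), the sketch below times CPU inference of BERT and GPT-2 through PyTorch, whose matrix multiplications are dispatched to the BLAS backend the library was built against (OpenBLAS or BLIS). The Hugging Face transformers checkpoints "bert-base-uncased" and "gpt2", the 64-thread setting, and the timing loop are assumptions, not details taken from this record.

    import time
    import torch
    from transformers import AutoModel, AutoTokenizer

    torch.set_num_threads(64)  # assumption: use all 64 SG2042 cores

    def mean_latency(model_name, text, runs=10):
        # Load an assumed checkpoint; the GEMMs inside the forward pass
        # are executed by PyTorch's configured BLAS backend.
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModel.from_pretrained(model_name).eval()
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            model(**inputs)  # warm-up run
            start = time.perf_counter()
            for _ in range(runs):
                model(**inputs)
            elapsed = time.perf_counter() - start
        return elapsed / runs

    for name in ("bert-base-uncased", "gpt2"):  # hypothetical model variants
        print(name, mean_latency(name, "RISC-V vectors accelerate matrix multiplication."))

Whether RVV is actually exercised depends on how the BLAS library itself was compiled (for OpenBLAS, a vector-enabled build target such as TARGET=C910V for RVV 0.7.1), not on this Python code.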
CRIS Type:
04A-Conference paper in volume
Keywords:
RISC-V, RVV, PyTorch, LLM, XuanTie C920, SOPHON SG2042, OpenBLAS, BLIS
Authors:
Adriano Marques Garcia, Giulio Malenza, Robert Birke, Marco Aldinucci
Book Title:
BigHPC2024: Special Track on Big Data and High-Performance Computing