AutoPenBench: Benchmarking Generative Agents for Penetration Testing

Other Research Product
Publication Date:
2024
Abstract:
Generative AI agents, software systems powered by Large Language Models (LLMs), are emerging as a promising approach to automating cybersecurity tasks. Among these, penetration testing is a challenging field due to the task complexity and the diverse set of strategies used to simulate cyberattacks. Despite growing interest and initial studies in automating penetration testing with generative agents, there remains a significant gap: a comprehensive, standard framework for their evaluation, comparison and development. This paper introduces AUTOPENBENCH, an open benchmark for evaluating generative agents in automated penetration testing. We address the challenges of existing approaches with a comprehensive framework that includes 33 tasks, each representing a vulnerable system that the agent has to attack. Tasks are of increasing difficulty and include in-vitro and real-world scenarios. To assess agent performance, we define generic and specific milestones that allow anyone to compare results in a standardised manner and understand the limits of the agent under test. We show the benefits of our methodology by benchmarking two modular agent cognitive architectures: a fully autonomous agent and a semi-autonomous agent supporting human interaction. Our benchmark lets us compare their performance and limitations. For instance, the fully autonomous agent performs unsatisfactorily, achieving a 21% Success Rate across the benchmark, solving 27% of the simple tasks and only one real-world task. In contrast, the assisted agent demonstrates substantial improvements, attaining a 64% success rate. AUTOPENBENCH also allows us to observe how different LLMs, such as GPT-4o, Gemini Flash or OpenAI o1, impact the agents' ability to complete the tasks. We believe that our benchmark fills the gap by offering a standard and flexible framework to compare penetration testing agents on common ground.
We hope to extend AUTOPENBENCH together with the research community by making it available at https://github.com/lucagioacchini/auto-pen-bench.
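The abstract's milestone-based scoring can be sketched in a few lines. This is a minimal illustration, not AutoPenBench's actual API: the `Task` class, the milestone labels, and the `success_rate`/`milestone_progress` helpers are all assumed names for exposition.

```python
# Hypothetical sketch of milestone-based benchmark scoring, as described in the
# abstract. All names and milestone labels here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    milestones: list[str]                      # generic + specific milestones
    reached: set[str] = field(default_factory=set)

    @property
    def solved(self) -> bool:
        # A task counts as solved only when every milestone is reached.
        return set(self.milestones) <= self.reached

def success_rate(tasks: list[Task]) -> float:
    # Fraction of tasks the agent fully solved (the abstract's Success Rate).
    return sum(t.solved for t in tasks) / len(tasks)

def milestone_progress(task: Task) -> float:
    # Partial credit: share of milestones reached on a single task,
    # useful to locate where an agent gets stuck.
    return len(task.reached & set(task.milestones)) / len(task.milestones)

# Example: the agent completes the first task and stalls halfway on the second.
tasks = [
    Task("in-vitro-weak-ssh-password", ["target_discovered", "flag_captured"],
         reached={"target_discovered", "flag_captured"}),
    Task("real-world-cve", ["target_discovered", "flag_captured"],
         reached={"target_discovered"}),
]
print(success_rate(tasks))            # 0.5
print(milestone_progress(tasks[1]))   # 0.5
```

Per-milestone progress is what makes results comparable across agents: two agents with the same overall success rate can still fail at different stages of the attack.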
CRIS Type:
07T-Pre-Print
Keywords:
Generative agents, Large Language Models, Penetration testing, Cybersecurity
Author list:
Luca Gioacchini, Marco Mellia, Idilio Drago, Alexander Delsanto, Giuseppe Siracusano, Roberto Bifulco
University Authors:
DRAGO Idilio
Link to the full record:
https://iris.unito.it/handle/2318/2019630
Link to the Full Text:
https://iris.unito.it/retrieve/handle/2318/2019630/1390213/2024_arxiv_autopenbench.pdf
Project:
Q-CPS2 - Mission 4 - Component 2 - Investment 1.3, funded by the European Union - NextGenerationEU - SERICS call - Code: PE00000014 - CUP: J33C22002810001

General Information

URL

https://arxiv.org/abs/2410.03225

Research Areas

Sectors (4)

PE6_5 - Security, privacy, cryptology, quantum cryptography - (2024)

FOOD, AGRICULTURE and LIVESTOCK - Veterinary Pharmacology

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Digitalisation of Society and Public Administration

COMPUTER SCIENCE, AUTOMATION and ARTIFICIAL INTELLIGENCE - Industry X.0
