Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
Recorded: May 27, 2026, 6 p.m.
| Original | Summarized |
[2605.21779] FuzzingBrain V2: A Multi-Agent LLM System for Automated Vulnerability Discovery and Reproduction
arXiv:2605.21779 (cs), Computer Science > Cryptography and Security [Submitted on 20 May 2026]
Authors: Ze Sheng and 4 other authors
Abstract: Software vulnerabilities pose critical security threats, with nearly 50,000 CVEs reported in 2025. While Large Language Models (LLMs) show promise for automated vulnerability detection, three key challenges remain. First, LLM-generated vulnerability reports suffer from high false positive rates and lack …
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)
DOI: arXiv-issued DOI via DataCite (pending registration)
Submission history: From: Ze Sheng [v1]
|
Software vulnerabilities are a critical security threat: nearly 50,000 CVEs were reported in 2025. Large Language Models (LLMs) show promise for automated vulnerability detection, but existing methods face three challenges. First, LLM-generated vulnerability reports have high false positive rates and lack reproducible verification. Second, current LLM-based approaches localize vulnerabilities at a suboptimal granularity: function-level analysis misses bugs when the relevant context is broad, while line-level analysis lacks the necessary surrounding context. Third, existing approaches struggle to reason about vulnerabilities that involve complex cross-function dependencies and triggering conditions. To address these limitations, the authors introduce FuzzingBrain V2, a multi-agent system with four contributions. First, it fully automates vulnerability analysis by integrating with Google's OSS-Fuzz, so that every reported vulnerability is reproducible by the fuzzer. Second, it introduces the Suspicious Point, a novel control-flow-based abstraction for precise vulnerability localization at the optimal granularity. Third, it combines logic-driven hierarchical function analysis with dual-layer fuzzing to increase function coverage under resource constraints. Fourth, it incorporates MCP-based static and dynamic analysis tools, augmented with context engineering, to support complex vulnerability reasoning. On the AIxCC 2025 Final Competition C/C++ dataset, the system detected 36 of 40 vulnerabilities, a 90 percent detection rate. 
In real-world deployment, the system discovered 29 zero-day vulnerabilities across 12 open-source projects. The project maintainers confirmed and fixed these vulnerabilities, and two CVE IDs were assigned. The work shows that a multi-agent LLM system can automate vulnerability discovery and reproduction by combining fine-grained analysis with fuzzer-based verification. |
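
The summary's central verification idea, that a finding is reported only if a fuzzer can reproduce it, can be illustrated with a minimal Python sketch. Everything named here (`Report`, `keep_reproducible`, `toy_harness`) is a hypothetical stand-in chosen for illustration and is not taken from FuzzingBrain V2 or OSS-Fuzz; the toy harness only imitates a crash check. The last helper simply reproduces the reported 36-of-40 detection rate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Report:
    """A candidate finding (hypothetical shape, not the paper's schema)."""
    target: str        # name of the fuzz target the input is meant for
    description: str   # free-text claim produced by an LLM agent
    poc_input: bytes   # proof-of-concept input that should trigger the bug


def keep_reproducible(
    reports: Iterable[Report],
    crashes: Callable[[str, bytes], bool],
) -> List[Report]:
    """Keep only reports whose proof-of-concept input crashes the target."""
    return [r for r in reports if crashes(r.target, r.poc_input)]


def detection_rate(found: int, total: int) -> float:
    """Fraction of known vulnerabilities that were detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return found / total


def toy_harness(target: str, data: bytes) -> bool:
    """Assumed stand-in for a real fuzz target: 'crashes' on inputs over 8 bytes."""
    return len(data) > 8


if __name__ == "__main__":
    candidates = [
        Report("parse_header", "possible overflow", b"A" * 16),
        Report("parse_header", "speculative claim", b"ok"),
    ]
    confirmed = keep_reproducible(candidates, toy_harness)
    print(len(confirmed))           # 1: only the reproducible report survives
    print(detection_rate(36, 40))   # 0.9, matching the reported 90 percent
```

Passing the crash check in as a callable keeps the filter independent of any particular fuzzing backend; a real deployment would presumably replace `toy_harness` with a call that runs the actual fuzz target on the input.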