Microsoft has unveiled MDASH (Multi-model Agentic Scanning Harness), a security system that orchestrates more than 100 specialized AI agents to discover and validate software vulnerabilities. In its first major deployment, it found 16 new vulnerabilities in Windows networking and authentication components, including four critical remote code execution flaws. The system also achieved an industry-leading 88.45% score on the public CyberGym benchmark.
What Is Microsoft MDASH?
MDASH is a multi-model agentic scanning harness built by Microsoft’s Autonomous Code Security team. Unlike single-model approaches, it uses an ensemble of frontier and distilled AI models working together. These agents discover, debate, and prove exploitable bugs end-to-end, much as a team of human security researchers would, but at machine speed.
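The discover, debate, and prove flow described above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Microsoft's implementation: the names `Finding`, `run_pipeline`, and the `toy_*` agents are hypothetical, and each "agent" is a plain Python function standing in for a model-backed component.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """A candidate bug report (hypothetical structure, for illustration only)."""
    location: str
    description: str
    votes_for: int = 0
    votes_against: int = 0
    proven: bool = False


# Hypothetical agent roles: each is a callable standing in for a model-backed agent.
Discoverer = Callable[[str], List[Finding]]
Reviewer = Callable[[Finding], bool]
Prover = Callable[[Finding], bool]


def run_pipeline(source: str,
                 discoverers: List[Discoverer],
                 reviewers: List[Reviewer],
                 provers: List[Prover]) -> List[Finding]:
    """Discover candidates, let reviewers vote, and keep only proven findings."""
    # Discover: every discoverer proposes candidate findings.
    candidates = [f for discover in discoverers for f in discover(source)]

    confirmed = []
    for finding in candidates:
        # Debate: each reviewer votes for or against the candidate.
        for review in reviewers:
            if review(finding):
                finding.votes_for += 1
            else:
                finding.votes_against += 1
        if finding.votes_for <= finding.votes_against:
            continue  # rejected in debate

        # Prove: at least one prover must demonstrate the bug.
        finding.proven = any(prove(finding) for prove in provers)
        if finding.proven:
            confirmed.append(finding)
    return confirmed


# Toy stand-ins so the sketch runs without any model.
def toy_discoverer(source: str) -> List[Finding]:
    """Flag every line that calls memcpy as a candidate."""
    return [Finding(f"line {n}", "unchecked copy length")
            for n, line in enumerate(source.splitlines(), start=1)
            if "memcpy" in line]


def toy_reviewer(finding: Finding) -> bool:
    """Approve any candidate whose description mentions an unchecked value."""
    return "unchecked" in finding.description


def toy_prover(finding: Finding) -> bool:
    """Pretend that only the candidate on line 2 can be demonstrated."""
    return finding.location == "line 2"
```

Requiring a proof step after the vote is one way to keep false positives low: a candidate that survives debate but cannot be demonstrated is dropped rather than reported.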
How Well Did MDASH Perform?
The results were striking:
- 21 of 21 planted vulnerabilities found with zero false positives on a private test driver
- 96% recall against five years of confirmed MSRC cases in clfs.sys
- 100% recall in tcpip.sys
- 88.45% on the CyberGym benchmark of 1,507 real-world vulnerabilities, roughly 5 percentage points ahead of the next entry
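For readers less familiar with these metrics, the figures above can be related to raw counts with a short calculation. The sketch below uses the standard definitions of recall and precision. Treating the CyberGym score as a simple fraction of the 1,507 tasks is an assumption made here for illustration, not something the article states.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real vulnerabilities that were found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that were real vulnerabilities."""
    return true_positives / (true_positives + false_positives)


# Private test driver: 21 of 21 planted bugs found, zero false positives.
planted_recall = recall(true_positives=21, false_negatives=0)        # 1.0
planted_precision = precision(true_positives=21, false_positives=0)  # 1.0

# CyberGym: ASSUMPTION that the score is solved tasks divided by total tasks.
CYBERGYM_TOTAL = 1507
CYBERGYM_SCORE = 0.8845
implied_solved = round(CYBERGYM_SCORE * CYBERGYM_TOTAL)              # 1333

print(planted_recall, planted_precision, implied_solved)
```

Under that assumption, 88.45% corresponds to about 1,333 of 1,507 tasks. The article gives no case counts for clfs.sys or tcpip.sys, so the 96% and 100% recall figures cannot be converted back into counts.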
What Vulnerabilities Did It Find?
This month’s Patch Tuesday includes 16 CVEs found by MDASH, among them:
- Critical remote code execution flaws in the Windows kernel TCP/IP stack
- Vulnerabilities in the IKEv2 service
- Unauthenticated remote code execution in DNS and Netlogon components
Key Takeaways
- MDASH orchestrates 100+ specialized AI agents across multiple models
- Found 16 real CVEs in the Windows networking and authentication stack
- Industry-leading 88.45% on the CyberGym benchmark
- Zero false positives on planted vulnerability tests
- Now being used by Microsoft security engineering teams
- Available in limited private preview for customers
Frequently Asked Questions
Is MDASH replacing human security researchers?
No. MDASH augments human teams by automating vulnerability discovery at scale. Human researchers still validate and prioritize findings.
What models power MDASH?
The system uses an ensemble of frontier and distilled models. Microsoft notes the surrounding agentic system contributes substantially to performance beyond any single model’s capability.
How can I access MDASH?
It is currently being used by Microsoft security engineering teams and tested by a small set of customers as part of a limited private preview.