Microsoft MDASH: Multi-Model AI Security System Finds 16 Zero-Day Vulnerabilities

Microsoft unveiled MDASH (Multi-model Agentic Scanning Harness), a security system that orchestrates over 100 specialized AI agents to discover and validate software vulnerabilities. In its first major deployment, it found 16 new vulnerabilities in Windows networking and authentication — including four critical remote code execution flaws. The system achieved an industry-leading 88.45% score on the public CyberGym benchmark.

What Is Microsoft MDASH?

MDASH is a multi-model agentic scanning harness built by Microsoft’s Autonomous Code Security team. Unlike single-model approaches, it uses an ensemble of frontier and distilled AI models working together. These agents discover, debate, and prove exploitable bugs end-to-end, mimicking how a team of human security researchers would operate — but at machine speed.

How Well Did MDASH Perform?

The results were striking:

21 of 21 planted vulnerabilities found with zero false positives on a private test driver
96% recall against five years of confirmed MSRC cases in clfs.sys
100% recall in tcpip.sys
88.45% on the CyberGym benchmark of 1,507 real-world vulnerabilities — roughly 5 points ahead of the next entry

What Vulnerabilities Did It Find?

This month’s Patch Tuesday includes 16 CVEs found by MDASH, including:

Critical remote code execution flaws in the Windows kernel TCP/IP stack
Vulnerabilities in the IKEv2 service
Unauthenticated remote code execution in DNS and Netlogon components

Key Takeaways

MDASH orchestrates 100+ specialized AI agents across multiple models
Found 16 real CVEs in Windows networking and authentication stack
Industry-leading 88.45% on CyberGym benchmark
Zero false positives on planted vulnerability tests
Now being used by Microsoft security engineering teams
Available in limited private preview for customers

Frequently Asked Questions

Is MDASH replacing human security researchers? No. MDASH augments human teams by automating vulnerability discovery at scale. Human researchers still validate and prioritize findings.

What models power MDASH? The system uses an ensemble of frontier and distilled models. Microsoft notes the surrounding agentic system contributes substantially to performance beyond any single model’s capability.

How can I access MDASH? It is currently being used by Microsoft security engineering teams and tested by a small set of customers as part of a limited private preview.