Microsoft’s New AI Can Outdiagnose Doctors—But Should It?
Satya Nadella thinks AI might be the next big thing in medicine. Or at least, that’s the impression you’d get from his latest announcement. This week, the Microsoft CEO shared details about MAI-DxO, a system that acts like a team of virtual doctors working together to crack tough medical cases.
The numbers are striking. In tests against 304 complex cases from the *New England Journal of Medicine*, the AI got it right 85.5% of the time. A group of 21 real doctors, each with years of experience? They only managed 20%. That’s a big gap, though it’s worth noting these weren’t your everyday checkup scenarios—they were the kind of cases that make even specialists pause.
How It Works: A Digital Medical Team
MAI-DxO doesn’t just spit out answers. It mimics how doctors actually think. There’s no multiple-choice quiz here—instead, the AI starts with limited info, asks follow-up questions, orders tests (with virtual costs attached), and adjusts its theories as it goes.
Microsoft built it like a medical council, with different AI models playing different roles. One acts as the skeptic, another keeps costs in check, and another double-checks the logic. It’s messy, just like real medicine. And somehow, it works. In one test, the system was 80% accurate while spending *less* than human doctors typically would.
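Microsoft hasn't published the orchestration code, but the "medical council" loop described above can be sketched roughly like this. Everything here is a placeholder: the role functions, test names, and dollar figures are invented for illustration, and in the real system each role would be a language model rather than a hard-coded rule.

```python
# Minimal sketch of a role-based diagnostic panel, assuming three roles:
# a hypothesis generator, a skeptic who demands evidence, and a cost
# steward who can veto tests that blow the budget. All names and prices
# are hypothetical.
from dataclasses import dataclass, field

TEST_COSTS = {"cbc": 50, "chest_ct": 1200, "biopsy": 3000}  # illustrative only

@dataclass
class CaseState:
    findings: list
    tests_ordered: list = field(default_factory=list)
    total_cost: int = 0
    hypothesis: str = "undetermined"

def propose(state):
    """Hypothesis-generator role: toy stand-in for an LLM's best guess."""
    return "infection" if "fever" in state.findings else "undetermined"

def challenge(state, hypothesis):
    """Skeptic role: names a confirmatory test, or None if satisfied."""
    if hypothesis == "infection" and "cbc" not in state.tests_ordered:
        return "cbc"
    return None

def approve(state, test, budget):
    """Cost-steward role: vetoes any test that would exceed the budget."""
    return state.total_cost + TEST_COSTS[test] <= budget

def run_panel(findings, budget=500, max_rounds=5):
    state = CaseState(findings=list(findings))
    for _ in range(max_rounds):  # bounded deliberation, like a case conference
        state.hypothesis = propose(state)
        test = challenge(state, state.hypothesis)
        if test is None:
            break  # skeptic is satisfied; panel concludes
        if not approve(state, test, budget):
            break  # steward vetoes; panel must conclude with what it has
        state.tests_ordered.append(test)
        state.total_cost += TEST_COSTS[test]
        # A real system would fold the actual test result back into the
        # findings; here we just record that the test was run.
        state.findings.append(f"{test}_result")
    return state

result = run_panel(["fever", "cough"])
print(result.hypothesis, result.total_cost)  # infection 50
```

The point of the structure is that the same loop works whatever models fill the roles: the skeptic keeps cheap guesses honest, while the cost steward explains how one configuration stayed under human spending and another ran the bill past $7,000.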
Here’s the catch, though: at its best, the AI hit 85.5% accuracy, but the cost ballooned to over $7,000 per case. That’s not exactly cheap, though it’s still slightly cheaper than some standalone AI models.
The Bigger Picture
Microsoft isn’t the first to try this. AI has been flirting with medicine for decades, from Stanford’s 1970s bacterial infection tool to Google’s recent chatbot-for-doctors experiment. What’s different here is the teamwork angle—and the fact that MAI-DxO isn’t tied to one specific AI model. It boosted accuracy across the board, whether using OpenAI, Google, or Meta’s systems.
Still, the researchers behind it are careful. They call it a “research demonstration,” not a finished product. Before this tech touches real patients, it’ll need clinical trials, safety checks, and regulatory approval. That could take years.
For now, the bigger question isn’t whether AI *can* diagnose better than humans—it’s how we’ll use it without losing the human touch. Microsoft insists AI won’t replace doctors, just help them. After seeing those 20% scores from the human physicians, though, you’ve got to wonder: how much help do we actually need?
