Anthropic And OpenAI Just Admitted The Model Isn't Enough¶

Date: 2026_05_12
Source: https://www.youtube.com/watch?v=EpJ0CjTJSag

Summary¶

This video dissects a critical security incident involving McKinsey's AI platform "Lily", which was breached through SQL injection by an autonomous agent that spent $20, and uses it as a lens to argue that the model is no longer the bottleneck in AI systems. The real challenge is everything surrounding it: permissions, data access, audit trails, cross-system integration, and organizational process.

Key Points¶

The Lily Incident¶

An autonomous agent spent $20 (no credentials, no insider help) and gained full read/write access to McKinsey's internal AI platform used daily by 40,000 consultants
The agent accessed tens of millions of chat messages, thousands of user accounts, and every system prompt governing how the AI reasons
The vulnerability was SQL injection — a technique from 1998, taught in every introductory web security course
22 out of 200 endpoints shipped with no authentication
This wasn't a single engineer's mistake — it was a pattern: a cultural assumption that AI endpoints don't need the same production-grade scrutiny as traditional software
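
The video does not describe the exact query that was exploited, so the following is only a generic sketch of the SQL injection pattern, using Python's standard `sqlite3` module. The `messages` table, its columns, and both function names are hypothetical. It contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    # Hypothetical schema, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (owner TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [("alice", "draft for client A"), ("bob", "pricing notes")],
    )
    return conn


def messages_unsafe(conn: sqlite3.Connection, owner: str) -> list:
    # Vulnerable: caller input becomes part of the SQL text itself.
    query = f"SELECT body FROM messages WHERE owner = '{owner}'"
    return [row[0] for row in conn.execute(query)]


def messages_safe(conn: sqlite3.Connection, owner: str) -> list:
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT body FROM messages WHERE owner = ?"
    return [row[0] for row in conn.execute(query, (owner,))]


conn = build_db()
payload = "nobody' OR '1'='1"
print(len(messages_unsafe(conn, payload)))  # 2: the WHERE clause is bypassed
print(messages_safe(conn, payload))         # []: no owner has that literal name
```

The fix is a one-line change per query, which is why the incident reads as a process failure rather than a hard engineering problem.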

Why "Security Failure" Misses the Point¶

Framing it as "forgot to lock the door" puts fault on an individual engineer
The real root cause: nobody asked whether the shape of the API endpoints was right for access by capable autonomous agents
McKinsey has great engineers — authentication is a trivial problem to solve in isolation
The issue is deeper: engineering culture and structure that allowed AI platforms to ship without the security assumptions that would be automatic for financial or healthcare systems
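
One way to make that scrutiny automatic rather than cultural is a build-time check that fails when any endpoint lacks authentication. The sketch below assumes a hypothetical route registry (`Route`, `PUBLIC_ALLOWLIST`, and the example paths are all invented here); a real service would derive the list from its web framework's routing table.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Route:
    path: str
    method: str
    requires_auth: bool


# Endpoints that are deliberately public must be listed explicitly.
PUBLIC_ALLOWLIST: FrozenSet[Tuple[str, str]] = frozenset({("GET", "/health")})


def unauthenticated_routes(
    routes: List[Route],
    allowlist: FrozenSet[Tuple[str, str]] = PUBLIC_ALLOWLIST,
) -> List[Route]:
    # Anything without auth that nobody signed off on is an offender.
    return [
        r for r in routes
        if not r.requires_auth and (r.method, r.path) not in allowlist
    ]


routes = [
    Route("/health", "GET", False),
    Route("/chat/messages", "GET", True),
    Route("/prompts", "PUT", False),  # writable and unauthenticated
]

offenders = unauthenticated_routes(routes)
for r in offenders:
    print(f"missing auth: {r.method} {r.path}")
```

Run in CI, a check like this turns "22 of 200" into a failed build instead of a production finding.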

The Procurement Problem¶

Enterprise software has been bought in the same sequence for 15+ years:
1. Strategic decision at top
2. Procurement negotiates contract
3. Security and compliance review
4. IT plans integration
5. Developers build against the purchased platform
This works for bounded SaaS (Salesforce, Workday, ServiceNow) where the vendor defines the admin console, published API, and permissions model
For AI agents, this sequence leads to disaster, because an agent's actual workflow crosses CRM, support tickets, contract management, product usage data, call transcripts, and internal wikis. Each has its own permissions model, and every one of them must return a clear "yes/no" answer to the agent's API calls
Developers are last in the buying sequence, but their technical constraints are what actually determine whether the AI strategy works
The implication: companies commit capital to a strategy whose viability is not tested until 6 months in
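
The viability question can in principle be tested before any contract is signed, by asking each system whether the agent's identity may perform each action its workflow needs. The following is a minimal sketch under stated assumptions: the system names, the `renewal-agent` identity, and the in-memory grant sets are hypothetical stand-ins for real permission APIs.

```python
from typing import Callable, Dict, List, Set, Tuple

# A permission check answers yes/no for (agent identity, action).
PermissionCheck = Callable[[str, str], bool]


def make_check(grants: Set[Tuple[str, str]]) -> PermissionCheck:
    # Stand-in for a call to a real system's permissions API.
    def check(agent_id: str, action: str) -> bool:
        return (agent_id, action) in grants
    return check


SYSTEMS: Dict[str, PermissionCheck] = {
    "crm": make_check({("renewal-agent", "read")}),
    "support_tickets": make_check({("renewal-agent", "read")}),
    "contracts": make_check(set()),  # agent has no grant here
}


def preflight(
    agent_id: str,
    needs: List[Tuple[str, str]],
    systems: Dict[str, PermissionCheck],
) -> List[Tuple[str, str]]:
    """Return every (system, action) the agent cannot perform."""
    denied = []
    for system, action in needs:
        check = systems.get(system)
        # An unknown system counts as a denial, not a silent pass.
        if check is None or not check(agent_id, action):
            denied.append((system, action))
    return denied


workflow = [
    ("crm", "read"),
    ("support_tickets", "read"),
    ("contracts", "read"),
    ("wiki", "read"),
]
print(preflight("renewal-agent", workflow, SYSTEMS))
# [('contracts', 'read'), ('wiki', 'read')]
```

An empty result means the workflow is at least reachable. A non-empty one lists the integration work that the purchase decision would otherwise discover months later.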

The Model Was Never the Hard Part¶

Recent announcements (within one week of the video) show major vendors repositioning around the infrastructure problem, not the model problem:
Anthropic & OpenAI: Standing up enterprise services with engineers deployed inside customer buildrooms
SAP: Acquired Dreo and Prior Labs for unified data layer + tabular foundation models where business ledgers live
Pinecone: Launched Nexus, pitched as a way to stop rebuilding business context from scratch on every run
Salesforce: Shipped Headless 360 — exposing platform as APIs/CLI because agents don't click through screens
ServiceNow: Opened Action Fabric — governed workflows, playbooks, approvals exposed as controlled surfaces with identity + audit
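
To make "controlled surfaces with identity + audit" concrete, here is a small illustrative sketch. It does not represent any vendor's actual API. `GovernedSurface`, the action names, and the approval rule are all hypothetical. The idea is that every call names an actor, sensitive actions need an approver, and each attempt is logged whether or not it succeeds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional, Set


@dataclass
class GovernedSurface:
    """Hypothetical wrapper: calls carry an identity and leave an audit record."""
    requires_approval: Set[str]
    audit_log: List[dict] = field(default_factory=list)

    def run(self, actor: str, action: str, handler: Callable[[], str],
            approved_by: Optional[str] = None) -> str:
        allowed = action not in self.requires_approval or approved_by is not None
        # Record the attempt before acting, so denials are auditable too.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "approved_by": approved_by,
            "outcome": "allowed" if allowed else "denied",
        })
        if not allowed:
            raise PermissionError(f"{action} needs an approver")
        return handler()


surface = GovernedSurface(requires_approval={"issue_refund"})
surface.run("agent-7", "lookup_order", lambda: "order found")
try:
    surface.run("agent-7", "issue_refund", lambda: "refunded")
except PermissionError as exc:
    print(f"blocked: {exc}")
surface.run("agent-7", "issue_refund", lambda: "refunded", approved_by="ops-lead")
print([entry["outcome"] for entry in surface.audit_log])
# ['allowed', 'denied', 'allowed']
```

The design choice worth noting is that the audit entry is written before the permission decision is enforced, so a denied attempt still shows up in the trail.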

Two Questions to Ask This Week¶

Does your AI vendor sell you reachable surfaces, governed action, permission-aware data, and cheaper context assembly — or just a model?
Do they have forward-deployed humans who can actually wire up your workflows?

AI Industry Implications¶

The model race is over — the differentiators now are integration, governance, permissions, and audit
Procurement processes are broken for agentic AI — developers and engineers need to be in the room at the strategic decision stage
Treat AI platforms like production systems from day one — not as configuration software
The industry is converging on the idea that the viability of an AI strategy depends on technical implementation details (authentication, permissions, audit trails, cross-system coherence), not on model quality

Notable Quotes¶

"22 of 200 endpoints shipped with no authentication at that scale. That's not a random mistake. That's a pattern."

"If the agent can't authenticate against the system it needs, the strategy isn't going to work."

"The model was never the hard part. The hard part is exactly what the Lily incident surfaced."

Analyzed by Thrawn the Prawn — AI Analytics Archive