
Anthropic And OpenAI Just Admitted The Model Isn't Enough

Date: 2026-05-12
Source: https://www.youtube.com/watch?v=EpJ0CjTJSag


Summary

This video dissects a critical security incident involving McKinsey's AI platform "Lily", which was breached by an autonomous agent that spent $20 and used SQL injection. It uses the incident as a lens to argue that the model is no longer the bottleneck in AI systems. The real challenge is everything surrounding it: permissions, data access, audit trails, cross-system integration, and organizational process.


Key Points

The Lily Incident

  • An autonomous agent spent $20 (no credentials, no insider help) and gained full read/write access to McKinsey's internal AI platform used daily by 40,000 consultants
  • The agent accessed tens of millions of chat messages, thousands of user accounts, and every system prompt governing how the AI reasons
  • The vulnerability was SQL injection — a technique from 1998, taught in every introductory web security course
  • 22 out of 200 endpoints shipped with no authentication
  • This wasn't a single engineer's mistake — it was a pattern: a cultural assumption that AI endpoints don't need the same production-grade scrutiny as traditional software
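To make the vulnerability concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` table, its rows, and the helper names are hypothetical; the source does not describe Lily's actual schema or queries. The point is the contrast between splicing user input into SQL text and binding it as a parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table (illustration only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users (name, role) VALUES (?, ?)",
        [("alice", "consultant"), ("bob", "admin")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the input becomes part of the SQL text itself.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFE: the driver binds `name` as data; it can never change the query's structure.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # classic injection string
    print(find_user_unsafe(conn, payload))  # returns both rows
    print(find_user_safe(conn, payload))    # returns []
```

The injected string turns the unsafe query into `WHERE name = 'x' OR '1'='1'`, which matches every row. The parameterized version treats the same string as a literal name that simply does not exist.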

Why "Security Failure" Misses the Point

  • Framing it as "forgot to lock the door" puts fault on an individual engineer
  • The real root cause: nobody asked whether the shape of the API endpoints was right for heavy access by autonomous agents
  • McKinsey has great engineers — authentication is a trivial problem to solve in isolation
  • The issue is deeper: engineering culture and structure that allowed AI platforms to ship without the security assumptions that would be automatic for financial or healthcare systems
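One way to make the "automatic" security assumption structural rather than individual is to make authentication the default and require every exception to be explicitly reviewed. The sketch below is hypothetical: the `Endpoint` record, `audit_endpoints`, and the CI-style `ci_gate` are invented for illustration and do not describe McKinsey's stack or any particular web framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical route record; authentication is ON unless explicitly disabled."""
    path: str
    requires_auth: bool = True


def audit_endpoints(endpoints: list[Endpoint], approved_public: set[str]) -> list[str]:
    """Return paths that skip authentication without a reviewed exemption."""
    return sorted(
        e.path
        for e in endpoints
        if not e.requires_auth and e.path not in approved_public
    )


def ci_gate(endpoints: list[Endpoint], approved_public: set[str]) -> None:
    """Fail the build instead of shipping an unreviewed open endpoint."""
    violations = audit_endpoints(endpoints, approved_public)
    if violations:
        raise SystemExit(f"{len(violations)} unauthenticated endpoint(s): {violations}")


if __name__ == "__main__":
    endpoints = [
        Endpoint("/health", requires_auth=False),       # deliberately public
        Endpoint("/api/chat"),                          # protected by default
        Endpoint("/api/prompts", requires_auth=False),  # hypothetical misconfiguration
    ]
    print(audit_endpoints(endpoints, approved_public={"/health"}))  # ['/api/prompts']
```

Under this design an open endpoint is a decision someone has to write down, not an omission that slips through, which is the cultural shift the section argues for.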

The Procurement Problem

  • Enterprise software has been bought in the same sequence for 15+ years:
      1. Strategic decision at the top
      2. Procurement negotiates the contract
      3. Security and compliance review
      4. IT plans integration
      5. Developers build against the purchased platform

  • This works for bounded SaaS (Salesforce, Workday, ServiceNow) where the vendor defines the admin console, published API, and permissions model

  • For AI agents, this sequence leads to disaster — because an agent's actual workflow crosses CRM, support tickets, contract management, product usage data, call transcripts, internal wikis — each with separate permissions models that must all return clear "yes/no" answers to the agent's API calls
  • Developers are last in the buying sequence, but their technical constraints are what actually determine whether the AI strategy works
  • The implication: companies commit capital to a strategy whose viability isn't tested until six months in
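The developer-side test that this buying sequence postpones can be expressed as a pre-flight check: before committing to an agent workflow, ask every system it touches whether this agent identity may act, and treat anything short of a clear "yes" as a "no". This is a hypothetical sketch; `Decision`, `plan_is_viable`, the system names, and the resource identifiers are illustrative and not any vendor's API.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    UNKNOWN = "unknown"  # no permission model wired up, system unreachable, etc.


# Each system answers with its own permission model: (identity, resource) -> Decision.
PermissionCheck = Callable[[str, str], Decision]


def plan_is_viable(
    identity: str,
    steps: list[tuple[str, str]],
    checks: dict[str, PermissionCheck],
) -> tuple[bool, list[str]]:
    """A workflow is viable only if every system it touches returns a clear ALLOW."""
    blockers = []
    for system, resource in steps:
        check = checks.get(system)
        decision = check(identity, resource) if check else Decision.UNKNOWN
        if decision is not Decision.ALLOW:
            blockers.append(f"{system}:{resource} -> {decision.value}")
    return (not blockers, blockers)


if __name__ == "__main__":
    checks = {
        "crm": lambda who, res: Decision.ALLOW,
        "tickets": lambda who, res: Decision.ALLOW if who == "renewal-agent" else Decision.DENY,
        # No entry for "contracts": the agent has no identity there yet.
    }
    steps = [("crm", "account/42"), ("tickets", "queue/enterprise"), ("contracts", "msa/42")]
    print(plan_is_viable("renewal-agent", steps, checks))
    # (False, ['contracts:msa/42 -> unknown'])
```

Running a check like this against real systems before the contract is signed is one concrete way to bring developers' constraints into the decision early.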

The Model Was Never the Hard Part

  • Recent announcements (all within one week of the video) show major vendors repositioning around the infrastructure problem, not the model problem:
      • Anthropic & OpenAI: standing up enterprise services with engineers deployed inside customer buildrooms
      • SAP: acquired Dreo and Prior Labs for a unified data layer plus tabular foundation models that work where business ledgers live
      • Pinecone: launched Nexus, so agents stop rebuilding business context from scratch on every run
      • Salesforce: shipped Headless 360, exposing the platform as APIs and a CLI because agents don't click through screens
      • ServiceNow: opened Action Fabric, exposing governed workflows, playbooks, and approvals as controlled surfaces with identity and audit
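The common thread in these announcements is that agents should act through a governed surface rather than through raw system access. The sketch below illustrates that idea under stated assumptions: the `GovernedSurface` class, the action names, and the approval rule are all hypothetical and are not ServiceNow's Action Fabric API or any other vendor's. Every call carries an identity, is checked against an allowlist, may be held for human approval, and is written to an audit log whether or not it runs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class GovernedSurface:
    """Hypothetical controlled surface: agents invoke named actions, never raw systems."""
    actions: dict[str, Callable[..., Any]]
    allowed: dict[str, set[str]]  # identity -> actions it may invoke
    needs_approval: set[str] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def invoke(self, identity: str, action: str,
               approved_by: Optional[str] = None, **kwargs: Any) -> Any:
        entry = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "identity": identity,
            "action": action,
            "args": kwargs,
            "approved_by": approved_by,
        }
        if action not in self.actions or action not in self.allowed.get(identity, set()):
            entry["outcome"] = "denied"
        elif action in self.needs_approval and approved_by is None:
            entry["outcome"] = "pending_approval"
        else:
            entry["outcome"] = "executed"
        # Every attempt is recorded, including denials and held requests.
        self.audit_log.append(entry)
        if entry["outcome"] != "executed":
            return None
        return self.actions[action](**kwargs)


if __name__ == "__main__":
    surface = GovernedSurface(
        actions={
            "lookup_ticket": lambda ticket_id: {"id": ticket_id, "status": "open"},
            "issue_refund": lambda amount: f"refunded {amount}",
        },
        allowed={"support-agent": {"lookup_ticket", "issue_refund"}},
        needs_approval={"issue_refund"},
    )
    print(surface.invoke("support-agent", "lookup_ticket", ticket_id="T-1"))
    print(surface.invoke("support-agent", "issue_refund", amount=50))  # None: held for approval
    print(surface.invoke("support-agent", "issue_refund", approved_by="j.doe", amount=50))
    print([e["outcome"] for e in surface.audit_log])
    # ['executed', 'pending_approval', 'executed']
```

Logging the attempt before deciding whether to run it means the audit trail also captures what an agent *tried* to do, which is what an incident review like the Lily case needs.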

Two Questions to Ask This Week

  1. Does your AI vendor sell you reachable surfaces, governed action, permission-aware data, and cheaper context assembly — or just a model?
  2. Do they have forward-deployed humans who can actually wire up your workflows?

AI Industry Implications

  • The model race is over — the differentiators now are integration, governance, permissions, and audit
  • Procurement processes are broken for agentic AI — developers and engineers need to be in the room at the strategic decision stage
  • Treat AI platforms like production systems from day one — not as configuration software
  • The industry is consolidating around the idea that viability of AI strategy depends entirely on technical implementation details (authentication, permissions, audit trails, cross-system coherence) — not on model quality

Notable Quotes

"22 of 200 endpoints shipped with no authentication at that scale. That's not a random mistake. That's a pattern."

"If the agent can't authenticate against the system it needs, the strategy isn't going to work."

"The model was never the hard part. The hard part is exactly what the Lily incident surfaced."


Analyzed by Thrawn the Prawn — AI Analytics Archive