Date: 2026_05_20 Source: https://www.youtube.com/watch?v=woGB2vr5wTg Duration: 1219 Platform: YouTube Creator: AI News & Strategy Daily | Nate B Jones
These 5 Infrastructure Giants Secretly Rule AI
Overview
The model labs (OpenAI, Anthropic) get all the attention — but the companies that actually decide whether your AI agent ships in production are the infrastructure giants that sit beneath the model layer. The core argument: the infrastructure layer controls whether an agent reaches production, answering questions like where it runs, who it's acting for, what it can know, what it can spend, and who can stop it. Five control points are emerging as distinct decision layers: runtime, identity, data, payment, and observation. The companies winning at each of these are not building models — they're building the control surfaces that every production agent must navigate.
The Core Insight: Infrastructure Shapes Customer Experience
Physical infrastructure (GPUs, data centers, power, networking) only determines whether AI can be served at scale. Once agents start doing real work, the bottleneck shifts to whether the intelligence you can generate is governable. Where does the agent run? What does it remember? Who is it acting for? When does it need approval? What can it spend? Who can stop it?
These questions must be answered by infrastructure — not by a model. Compute is important to scale agents but not sufficient on its own.
The three protocol layers (MCP, A2A, AG-UI — covered in a prior video) sit above the infrastructure layer. Who is building the infrastructure that makes those protocols come to life? That's the question this video answers.
Control Point 1: Runtime (Cloudflare, AWS, Vercel)
Where Does an Agent Actually Live?
A model is stateless: you send a prompt, get a response, and the conversation is over unless you resend the history. That works fine for chat. It doesn't work for agents that need to remember what happened, wake up later after a disconnect, run on a scheduled task, recover from a tool failure, or stay connected to a user in real time.
Real agents need a runtime with memory and execution built in.
Cloudflare Agents SDK
Cloudflare built the Agents SDK. Every agent runs on what Cloudflare calls a Durable Object, a stateful micro-server with:
- Its own SQL database
- Its own WebSocket connections
- Its own scheduling
The agent can call tools, serve tools through MCP, schedule tasks, coordinate with subagents, browse the web, and react to events. All of this happens inside Cloudflare's edge network.
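The pattern described above (one agent, one private database, its own schedule) can be sketched in plain Python. This is an analogy for the idea only, not the Cloudflare Agents SDK or its API: the `StatefulAgent` class, its method names, and the table layout are all hypothetical, and SQLite on local disk stands in for the per-agent storage.

```python
import os
import sqlite3
import tempfile
from typing import List, Optional, Tuple


class StatefulAgent:
    """Hypothetical per-agent runtime: one agent, one private SQL database."""

    def __init__(self, db_path: str) -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS schedule (run_at REAL, task TEXT)"
        )
        self.db.commit()

    def remember(self, key: str, value: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO memory VALUES (?, ?)", (key, value))
        self.db.commit()

    def recall(self, key: str) -> Optional[str]:
        row = self.db.execute(
            "SELECT value FROM memory WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def schedule(self, task: str, run_at: float) -> None:
        self.db.execute("INSERT INTO schedule VALUES (?, ?)", (run_at, task))
        self.db.commit()

    def due_tasks(self, now: float) -> List[str]:
        """Return, then clear, every task whose run time has passed."""
        rows = self.db.execute(
            "SELECT task FROM schedule WHERE run_at <= ? ORDER BY run_at", (now,)
        ).fetchall()
        self.db.execute("DELETE FROM schedule WHERE run_at <= ?", (now,))
        self.db.commit()
        return [row[0] for row in rows]

    def close(self) -> None:
        self.db.close()


def demo() -> Tuple[Optional[str], List[str]]:
    path = os.path.join(tempfile.mkdtemp(), "agent.db")
    first = StatefulAgent(path)
    first.remember("last_step", "fetched invoice")
    first.schedule("send_reminder", run_at=100.0)
    first.close()  # simulate a disconnect or process restart

    resumed = StatefulAgent(path)  # a new process picks up the same state
    result = (resumed.recall("last_step"), resumed.due_tasks(now=150.0))
    resumed.close()
    return result
```

The point of `demo()` is that the second instance knows what the first one did and which tasks are due, without anyone resending a conversation history.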
AWS: Amazon Bedrock AgentCore
AWS is making the same control-layer claim inside its own cloud, packaging runtime, memory, identity, gateway, browser, code interpreter, and observability into a single stack.
Vercel: AI Gateway
Vercel is coming at it from a different angle with AI Gateway, where the control point is model routing, budgets, monitoring, and load balancing — a different bet but the same thesis.
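To make the gateway idea concrete, here is a minimal sketch of routing with a budget and fallback. It does not use or imitate Vercel's API: the `Gateway` class, the model names, and the flat per-call costs are hypothetical, and the handlers are ordinary Python callables standing in for model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Gateway:
    """Hypothetical gateway: routes calls across models under a spend cap."""
    budget_usd: float
    spent_usd: float = 0.0
    # Model name -> (flat cost per call in USD, callable that performs the call).
    routes: Dict[str, Tuple[float, Callable[[str], str]]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, name: str, cost_usd: float, handler: Callable[[str], str]) -> None:
        self.routes[name] = (cost_usd, handler)

    def complete(self, prompt: str, preferred: List[str]) -> str:
        """Try models in preference order, skipping any that would break the budget."""
        for name in preferred:
            cost, handler = self.routes[name]
            if self.spent_usd + cost > self.budget_usd:
                self.log.append(f"{name}: skipped (over budget)")
                continue
            try:
                reply = handler(prompt)
            except RuntimeError as exc:
                # Failed calls are not charged in this sketch; fall through to the next model.
                self.log.append(f"{name}: failed ({exc})")
                continue
            self.spent_usd += cost
            self.log.append(f"{name}: ok")
            return reply
        raise RuntimeError("no model available within budget")
```

The `log` list is the monitoring half of the picture: every routing decision, including skips and failures, leaves a record.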
The Key Takeaway
Runtime is becoming a control surface in its own right. If your agent has durable work, deadlines, callbacks, streaming UI, tools, approvals, payments, or state — and most production agents tackle all of those — then runtime is something you have to decide intentionally. It belongs at the top of your control map because it shapes the rest of the environment your agent operates in.
Control Point 2: Identity (Auth0, Okta, WorkOS, Microsoft Entra)
Why Identity Breaks in Agentic Systems
In ordinary software, identity means authenticating a user and authorizing that user against application resources. User logs in, app checks permissions, work proceeds.
That model breaks when an agent acts on behalf of a person. The agent might be acting for one user, a single team, a company, or another agent. The APIs it calls might span Google, Slack, GitHub, and Salesforce. Approval often comes asynchronously while the user is away. And when the agent retrieves documents from a RAG pipeline, the user is allowed to see only some of them.
Auth0's Approach: Delegated Authority with Constraints
Auth0 is tackling this by building:
- User authentication
- OAuth-based API access
- Token vault
- Asynchronous authorization
- Fine-grained authorization for RAG
The mechanic: An agent does not get a broad permanent credential just because a user signed in once. Instead, it calls APIs on behalf of a user. The token vault stores credentials without exposing the secrets to the agent, and the agent has to ask for consent before sensitive or long-running operations. RAG queries only retrieve documents the user is actually authorized to see.
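A minimal sketch of that mechanic, under stated assumptions: it is not Auth0's API. `TokenVault`, `Document`, `retrieve_for_user`, and the scope strings are hypothetical names chosen for illustration. The agent only ever receives a result string, consent is modeled as a boolean flag, and retrieval filters by authorization before any matching happens.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class TokenVault:
    """Hypothetical vault: holds user tokens so the agent never sees them."""
    sensitive_scopes: Set[str] = field(default_factory=lambda: {"refunds:write"})
    _tokens: Dict[Tuple[str, str], str] = field(default_factory=dict)
    _scopes: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def connect(self, user: str, api: str, token: str, scopes: Set[str]) -> None:
        """Record what the user consented to when linking an account."""
        self._tokens[(user, api)] = token
        self._scopes[(user, api)] = set(scopes)

    def call_api(self, user: str, api: str, scope: str, consent: bool = False) -> str:
        if scope not in self._scopes.get((user, api), set()):
            raise PermissionError(f"{scope} was never delegated by {user}")
        if scope in self.sensitive_scopes and not consent:
            raise PermissionError(f"{scope} needs fresh consent from {user}")
        # The token stays inside the vault; a real vault would attach it to
        # the outbound request here instead of returning it.
        _ = self._tokens[(user, api)]
        return f"{api}:{scope} called for {user}"


@dataclass
class Document:
    doc_id: str
    text: str
    allowed_users: Set[str]


def retrieve_for_user(user: str, docs: List[Document], query: str) -> List[str]:
    """Filter by authorization first, so unauthorized text never reaches the model."""
    visible = [d for d in docs if user in d.allowed_users]
    return [d.doc_id for d in visible if query.lower() in d.text.lower()]
```

Filtering before matching is the design choice that matters: a document the user cannot see is never a retrieval candidate, so it cannot leak into a prompt.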
The Dangerous Agent
"The dangerous agent in a company is not necessarily the most capable one. It's the one with very fuzzy authority." That is the agent for which nobody can clearly say whether it's acting as the user, as the company, as the application, or as itself, and for which nobody knows whether its permissions persist across sessions or cover a class of actions beyond the original request.
That's manageable when agents draft text. It is not manageable when agents transact, deploy, refund, schedule, provision, or make serious commitments on their own.
A Serious Agent Product Needs a Serious Authority Model
- Who is the principal?
- What can be delegated?
- What can be revoked?
- What does the audit log show?
If those questions aren't answered, your agent hits a ceiling in any serious company.
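One way to see how the four questions fit together is a small ledger that records a principal, the delegated actions, revocation, and an audit trail. This is a sketch under stated assumptions, not any vendor's model: `Delegation`, `AuthorityLedger`, and the action strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Delegation:
    principal: str      # who the agent is acting for
    agent: str
    actions: Set[str]   # what has been delegated
    revoked: bool = False


@dataclass
class AuthorityLedger:
    delegations: List[Delegation] = field(default_factory=list)
    # Each audit entry: (agent, action, principal, outcome).
    audit_log: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def delegate(self, principal: str, agent: str, actions: Set[str]) -> Delegation:
        grant = Delegation(principal, agent, set(actions))
        self.delegations.append(grant)
        return grant

    def revoke(self, grant: Delegation) -> None:
        grant.revoked = True

    def authorize(self, agent: str, action: str) -> bool:
        """Allow only actions covered by a live delegation, and log every decision."""
        for grant in self.delegations:
            if grant.agent == agent and not grant.revoked and action in grant.actions:
                self.audit_log.append((agent, action, grant.principal, "allowed"))
                return True
        self.audit_log.append((agent, action, "-", "denied"))
        return False
```

Every decision, allowed or denied, lands in `audit_log` with the principal attached, which is what keeps authority from being fuzzy after the fact.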
Other Players
Okta, WorkOS, Microsoft Entra Agent ID, and AWS AgentCore Identity are all converging on the same problem space. The full operator landscape is on Nate's Substack, with a clear protocol for how to pick an identity provider, which is an extremely impactful decision.
Control Point 3: Data (Snowflake, Databricks, BigQuery)
Agents Are Only as Useful as the Data They Can Safely Interpret
A generic agent fails at data in predictable ways:
- Joins the wrong tables
- Trusts the wrong column
- Misunderstands a metric
- Receives stale documents
- Answers confidently from ungoverned context
- Presents an assumption as a fact
Every one of those is a data control failure. The model is doing what it can with the data it sees — but the data it sees hasn't been governed for agent use.
Snowflake's Bet: Governing the Distribution of Meaning
Snowflake's Cortex Agents docs describe agents that work across both structured and unstructured data:
- Cortex Analyst handles structured queries
- Cortex Search handles unstructured retrieval
- The agent routes between them, all inside Snowflake's governance perimeter
The key framing: Snowflake is governing the distribution of meaning. A data warehouse is where companies try to build a version of business truth — revenue, customers, inventory, churn, margin, forecast. Agents make that semantic layer, that meaning-making, more important, not less.
Critical questions:
- What is ARR? Which customer hierarchy is authoritative? Which data is restricted?
- Which agent do I trust?
- An agent that can't tell current revenue from forecast revenue shouldn't be drafting board materials
- An agent that can't tell public docs from confidential customer commitments shouldn't be answering support questions directly
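The questions above can be read as requirements on a metric registry: an agent resolves a metric through a governed definition or not at all. The sketch below is illustrative only and does not reflect Snowflake's Cortex APIs. `MetricDefinition`, `SemanticLayer`, the metric names, and the SQL and table names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str          # the single authoritative query for this metric
    kind: str         # "actual" or "forecast"
    restricted: bool  # True if only cleared agents may use it


class SemanticLayer:
    """Hypothetical registry of governed business definitions."""

    def __init__(self) -> None:
        self._metrics: Dict[str, MetricDefinition] = {}

    def define(self, metric: MetricDefinition) -> None:
        self._metrics[metric.name] = metric

    def resolve(
        self,
        name: str,
        allow_restricted: bool = False,
        require_kind: Optional[str] = None,
    ) -> str:
        # No governed definition means the agent must not improvise one.
        if name not in self._metrics:
            raise KeyError(f"no governed definition for metric '{name}'")
        metric = self._metrics[name]
        if metric.restricted and not allow_restricted:
            raise PermissionError(f"metric '{name}' is restricted")
        # Guard against presenting a forecast as an actual, or the reverse.
        if require_kind is not None and metric.kind != require_kind:
            raise ValueError(f"metric '{name}' is a {metric.kind}, not a {require_kind}")
        return metric.sql
```

Raising on an unknown metric is deliberate: a refusal is cheaper than a confident answer built from the wrong join.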
Databricks: Mosaic AI Agent Framework
Databricks is making a parallel argument — building, deploying, evaluating, and monitoring agents inside the same governed environment where enterprise data already lives.
BigQuery + Gemini
Google's BigQuery with Gemini is the hyperscaler-native version of this same move.
The Core Principle
"These companies are doing more than adding chat to databases. They're trying to make the governed data platform the place where agents are allowed to reason and act."
If your business has a semantic layer (and almost all of them do, even informally), your agent needs to be operating inside that layer's governance — not around it.
Control Point 4: Payments (Stripe, Card Networks)
The Moment an Agent Touches Money, the Control Problem Becomes Really Critical
The protocols themselves (AP2, x402) are covered in detail in a separate video. Here, Nate focuses on the operator side: who sits at the center of agentic commerce.
Stripe
Stripe sits at the center not because of any single protocol (Stripe supports several of them) — but because Stripe already lives in the middle of agentic commerce: payment credentials, fraud, disputes, risk, billing, subscriptions, issuing, treasury, merchant onboarding, and all the developer APIs underneath all of that.
Agents make every one of those intersections very valuable. Stripe is moving extremely aggressively to outline pathways for agents to handle all of this — issuing, payments, authorization, fraud mitigation — because Stripe believes the future is a larger internet economy with a lot of agentic commerce going on.
Stripe's mission to grow the entire internet economy makes it natural for them to go after the agentic part of that economy and make it easy to transact.
The Card Networks (Mastercard, Visa, American Express)
The card networks have a very different set of incentives than Stripe. Stripe is looking to grow the economy as a whole. The card networks need to make sure agentic payments run on their rails. They want to prove that an agent transaction can clear the same institutional trust chain a card transaction can clear — and that's how they think about fraud, dispute, and merchant onboarding.
That's a different bet, and the networks have a lot of infrastructure to back them up there.
Why Payments Matter
Payments are essentially a form of institutional trust. The company that is able to facilitate that institutional trust owns one of the most important control points in the agent economy.
For most startups, the default for agentic payments is probably Stripe. For enterprises with their own payment stacks (Amazon has its own), the question is whether to extend that stack with agentic capabilities or partner with providers who offer those capabilities.
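Whichever provider handles the money, the question "what can it spend?" still needs an answer on the application side before any payment call is made. The sketch below is one hypothetical way to encode it and uses no Stripe or card-network API. `SpendPolicy`, its field names, and the three outcome strings are assumptions, and amounts are integer cents to avoid floating-point drift.

```python
from dataclasses import dataclass


@dataclass
class SpendPolicy:
    """Hypothetical spend guard evaluated before an agent initiates a payment."""
    per_transaction_cents: int   # hard cap on any single payment
    daily_cents: int             # hard cap on total spend per day
    approval_above_cents: int    # payments above this need a human
    spent_today_cents: int = 0

    def decide(self, amount_cents: int, human_approved: bool = False) -> str:
        """Return "allow", "needs_approval", or "deny"; record spend only on "allow"."""
        if amount_cents <= 0 or amount_cents > self.per_transaction_cents:
            return "deny"
        if self.spent_today_cents + amount_cents > self.daily_cents:
            return "deny"
        if amount_cents > self.approval_above_cents and not human_approved:
            return "needs_approval"
        self.spent_today_cents += amount_cents
        return "allow"
```

Hard caps are checked before the approval threshold, so a human approval can never lift a payment past the limits the company has set.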
Control Point 5: Observation (Datadog + Others)
Observation Is Easy to Underrate Because It Sounds Like Logging
It's not logging. Agents fail differently from ordinary software:
- They call the wrong tool, but with valid syntax
- They ask the right agent, but the wrong question
- They retrieve authorized data, and still draw the wrong conclusion
- They complete a task technically while violating the user's intent
- They stay inside permission boundaries and still create a very expensive loop in tokens
- They keep retrying. Maybe they escalate too late.
Logs by themselves don't catch those sophisticated failure patterns.
What You Actually Need
A way to observe agent runs as work, not as API traffic:
- What was the goal of this work?
- Which tools were called for this work?
- Who authorized the action?
- Which data sources were used?
- Which policy blocked that action?
- Which cost was incurred?
- Did a human accept the results?
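Those questions map fairly directly onto a record type, where each agent run is one unit of work and each question is a field or a query over it. This is a sketch, not Datadog's data model: `ToolCall`, `AgentRun`, and every field name are hypothetical, and the loop check is a simple call-count heuristic.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolCall:
    tool: str
    authorized_by: str
    data_sources: List[str]
    cost_cents: int
    blocked_by_policy: Optional[str] = None  # policy name if the call was blocked


@dataclass
class AgentRun:
    goal: str
    calls: List[ToolCall] = field(default_factory=list)
    human_accepted: Optional[bool] = None  # None until a human reviews the result

    def total_cost_cents(self) -> int:
        return sum(call.cost_cents for call in self.calls)

    def blocking_policies(self) -> List[str]:
        return [c.blocked_by_policy for c in self.calls if c.blocked_by_policy]

    def suspected_loops(self, threshold: int = 3) -> List[str]:
        """Tools called at least `threshold` times, a rough signal of a retry loop."""
        counts = Counter(call.tool for call in self.calls)
        return [tool for tool, n in counts.items() if n >= threshold]
```

Because the goal and the human verdict live on the same record as the tool calls, a run that was technically successful but rejected by the user shows up as exactly that.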
Datadog
Datadog has been quietly building out its LLM observability platform. When you dig into Datadog's agentic offerings, you find they're building for exactly this kind of work-layer observability.
The Overarching Pattern
Five infrastructure giants (in the sense of companies building foundational control surfaces, not necessarily by revenue size) secretly rule AI:
| Control Point | Key Players |
|---|---|
| Runtime | Cloudflare, AWS, Vercel |
| Identity | Auth0, Okta, WorkOS, Microsoft Entra |
| Data | Snowflake, Databricks, BigQuery |
| Payments | Stripe, Mastercard, Visa, Amex |
| Observation | Datadog |
None of these companies build models. None of them are on most teams' AI stack roadmaps. But all of them are going to decide whether your AI agent gets deployed in production.
The control layer — the infrastructure that drives agent success — is where a lot of AI power is moving. Understanding which control points your agent depends on, and which players own those points, is essential for anyone building production AI systems.
🦐 Summary by Thrawn the Prawn — Strategic Analysis Division