Date: 2026_05_13
Source: https://www.youtube.com/watch?v=lqiwQiDglGk
Duration: 1208
Platform: YouTube
Creator: AI News & Strategy Daily | Nate B Jones


Pinecone Just Demoted Vector Search. Here's the Knowledge Layer.

Overview

The video examines a shift in AI infrastructure: major vendors are moving beyond plain vector search for AI agents and building what the industry calls a "knowledge layer." The host, Nate B Jones, argues that classic RAG (Retrieval-Augmented Generation), built for chatbots, is insufficient for production agent workloads, and that infrastructure vendors are racing to solve an agent memory problem. The video covers four recent moves (Pinecone's Nexus, Page Index, SAP's acquisitions of Dremino and Prior Labs, and Microsoft's Graph RAG) and concludes with three actionable steps for teams building agents today.


The Agent Memory Problem

Why Classic RAG Fails Agents

Classic RAG, the dominant pattern of 2024-2025, was designed for chatbot-era workloads: a user asks a question, the system finds three semantically similar text chunks, and the model writes a paragraph. This works for FAQs but breaks down for agents that:

  • Run multi-step tasks (open ticket → retrieve customer record → check policy → draft response)
  • Cross-reference definitions across 40-page contracts
  • Need accurate, text-perfect answers, not approximate matches
  • Spend up to 85% of their compute on rediscovery: re-reading documents already summarized, re-fetching context already retrieved, and re-asking questions already answered

The Rediscovery Problem

Pinecone states that rediscovery can consume up to 85% of agent compute. In practice, agents:

  1. Refetch the same context every run
  2. Re-summarize documents from prior runs (correctly or not)
  3. Ask the user for information the system already has
  4. Blow the token budget before useful work starts
  5. Lose consistency across runs

Context Windows Don't Fix This

Larger model context windows (e.g., 1M tokens) help but don't solve the core problem. Chroma research shows model performance degrades as context grows more cluttered. The issue isn't whether the right answer exists somewhere in the context—it's whether the right answer is:

  • Presented in a usable form
  • Marked as authoritative vs. inferred
  • Distinguishable from stale information
  • Governed by access controls

The goal is not "maximum context" but "appropriate context."


The Four Major Infrastructure Moves

1. Pinecone — Nexus with NoQL

What they shipped: A product called Nexus with a query language called NoQL.

The pitch: Agents need a different retrieval contract than chatbots do. A chatbot needs "related text." An agent needs operating context—the customer record, entitlement, controlling policy, and prior history assembled into a usable bundle.

Failure mode illustrated: When an agent prepares a customer escalation, it shouldn't search five different systems from scratch every time. When doing financial analysis, it shouldn't answer from whichever paragraph sits closest to the query vector—it needs to know whether the source of truth is the filing, the governed table, the metric definition, the prior forecast, or the live dashboard. Those are five different answers.

NoQL's bet: Retrieval should carry intent, filters, access policy, provenance, response shape, confidence, and budget—not just similarity scores. A vector database can power part of that but doesn't define the whole job anymore.
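To make the idea of a richer retrieval contract concrete, here is a minimal Python sketch of a request that carries more than a query vector. It is not NoQL syntax, which the video does not show. Every name here (RetrievalRequest, Item, retrieve) and all sample data are hypothetical, and response shape is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalRequest:
    intent: str                                  # what the agent is trying to do (carried, not scored here)
    filters: dict = field(default_factory=dict)  # exact-match attribute filters
    caller_role: str = "agent"                   # input to the access policy check
    min_confidence: float = 0.0
    token_budget: int = 2000


@dataclass(frozen=True)
class Item:
    content: str
    source: str               # provenance: where the content came from
    confidence: float
    tokens: int
    attributes: dict
    allowed_roles: frozenset


def retrieve(request: RetrievalRequest, corpus: list) -> list:
    """Return items that pass access policy, filters, confidence, and budget."""
    selected, used = [], 0
    for item in sorted(corpus, key=lambda i: i.confidence, reverse=True):
        if request.caller_role not in item.allowed_roles:
            continue  # access policy
        if item.confidence < request.min_confidence:
            continue  # confidence floor
        if any(item.attributes.get(k) != v for k, v in request.filters.items()):
            continue  # attribute filters
        if used + item.tokens > request.token_budget:
            continue  # token budget
        selected.append(item)
        used += item.tokens
    return selected


# Hypothetical sample data for illustration only.
corpus = [
    Item("Refund policy v3", "policy_repo", 0.95, 300, {"region": "EU"}, frozenset({"support"})),
    Item("Forum post on refunds", "community", 0.40, 200, {"region": "EU"}, frozenset({"support"})),
    Item("Internal margin table", "warehouse", 0.99, 100, {"region": "EU"}, frozenset({"finance"})),
]
request = RetrievalRequest(
    intent="prepare refund decision",
    filters={"region": "EU"},
    caller_role="support",
    min_confidence=0.5,
    token_budget=500,
)
results = retrieve(request, corpus)
print([(r.content, r.source) for r in results])
```

The highest-scoring item is excluded by access policy and the forum post by the confidence floor, which is the kind of decision a similarity score alone cannot express.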

Key insight: Even Pinecone—whose entire business is built on vector search—is publicly acknowledging that vector search alone is insufficient for agents.


2. Page Index — Structure Over Semantics

The sharper claim: Many documents should never be chunked into vector embeddings because the document structure carries meaning that semantic search destroys.

Example (financial filings):

  • Risk factors section ≠ Management discussion ≠ Notes to financial statements ≠ Narrative summary
  • A table is not interchangeable with a paragraph
  • Retrieving three "semantically similar" chunks from a 10-K can miss the clause that actually controls the answer, because that clause's meaning lies in its position within the document hierarchy, not in its semantic content

Example (contracts):

  • A clause can look semantically relevant to a query
  • But the definition section can completely change what that clause means
  • A schedule can override a general term
  • An exception can sit 40 pages from the triggering paragraph
  • Chunk retrieval finds text that "looks and sounds right" while losing the legal structure that makes it correct

Page Index's approach:

  • Builds a hierarchical tree of the document (like a table of contents with summaries at every node)
  • Model reasons through the tree to find the right section
  • No embeddings on the document. No vector similarity.
  • Claims 98.7% accuracy on FinanceBench (a finance evaluation benchmark)
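The tree-navigation idea can be sketched in a few lines of Python. This is not Page Index's implementation. The names Node, word_overlap, and navigate are hypothetical, and a simple word-overlap score stands in for the model's reasoning at each node. The `choose` parameter marks the place where a model call would go.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def word_overlap(query: str, node: Node) -> int:
    """Stand-in for model judgement: shared words between query and node."""
    query_words = set(query.lower().split())
    node_words = set(f"{node.title} {node.summary}".lower().split())
    return len(query_words & node_words)


def navigate(root: Node, query: str, choose=word_overlap):
    """Walk from the root to a leaf, picking the best child at each level."""
    node, path = root, [root.title]
    while node.children:
        node = max(node.children, key=lambda child: choose(query, child))
        path.append(node.title)
    return node, path


# Hypothetical, heavily simplified 10-K outline.
filing = Node("10-K", "annual report", [
    Node("Risk Factors", "risks related to competition and regulation"),
    Node("Notes to Financial Statements", "accounting policies revenue recognition leases", [
        Node("Note 2 Revenue", "revenue recognition policy"),
        Node("Note 7 Leases", "lease obligations"),
    ]),
])

leaf, path = navigate(filing, "revenue recognition policy")
print(" > ".join(path))
```

The returned path records which sections were chosen, so the answer can be traced to its position in the document hierarchy rather than to a nearest-neighbor score.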

Durable principle from Page Index: The retrieval unit needs to match the work you're doing:

  • Chunk works for simple FAQs
  • Section works for filings
  • Table works for financial analysis
  • Customer record works for support
  • Graph neighborhood works for dependency reasoning
  • Compiled brief works for repeated workflows

Better embeddings in classic RAG approaches don't fix this—they only find more relevant text.


3. SAP — Dremino + Prior Labs (€1B+ in AI infrastructure bets)

What SAP acquired:

Dremino:

  • Lakehouse architecture
  • Semantic layer
  • Query federation across SAP and non-SAP systems
  • Access controls
  • Lineage

Prior Labs:

  • Tabular foundation models (lead model: TabPFN, published in Nature)

SAP put more than €1 billion behind the two bets combined.

Why SAP did this: Most enterprise knowledge doesn't live in the PDFs that vector search is designed for. It lives in:

  • ERP systems
  • CRM records
  • Customer records
  • Governed tables

A huge slice of enterprise knowledge is tabular and structured, and the chatbot RAG playbook of "index a PDF and answer from a paragraph" is the wrong abstraction for that world.

The enterprise agent failure mode: If a procurement agent needs a revenue number, the source of truth is the governed table in the warehouse with a specific metric definition, not an indirect knowledge source. If it needs supplier risk, the source is the supplier record plus the risk model. Getting these from approximate text retrieval is unacceptable in production operations, where a wrong answer means real money going out the door.

SAP's bet: Own the enterprise data layer. Dremino gives them governed access to business data across systems with permissions and lineage baked in. When a procurement agent answers, it knows:

  • It's allowed to see the data
  • Where it came from
  • How the metric was defined
  • Whether the answer is fit for the action it's about to take
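The four things the agent "knows" can be pictured as fields returned alongside the number itself. The following Python sketch is illustrative only and does not reflect any SAP or Dremino API. GovernedMetric, read_metric, the catalog, and all values are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernedMetric:
    name: str
    value: float
    definition: str              # how the metric is computed
    source_table: str            # lineage: where the number comes from
    allowed_roles: frozenset     # who may read it
    approved_actions: frozenset  # what the number may be used for


def read_metric(catalog: dict, name: str, role: str, action: str) -> dict:
    """Return the value together with its definition, lineage, and fitness."""
    metric = catalog[name]
    if role not in metric.allowed_roles:
        raise PermissionError(f"{role} may not read {name}")
    return {
        "value": metric.value,
        "definition": metric.definition,
        "source": metric.source_table,
        "fit_for_action": action in metric.approved_actions,
    }


# Invented catalog entry for illustration.
catalog = {
    "q1_revenue": GovernedMetric(
        name="q1_revenue",
        value=1_250_000.0,
        definition="recognized revenue, net of refunds",
        source_table="warehouse.finance.revenue_q1",
        allowed_roles=frozenset({"procurement_agent", "finance"}),
        approved_actions=frozenset({"report"}),
    )
}

answer = read_metric(catalog, "q1_revenue", "procurement_agent", "approve_payment")
print(answer["source"], answer["fit_for_action"])
```

Here the agent is allowed to read the metric, but the response tells it the number is not approved for the action it is about to take, so it can stop or escalate instead of acting.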

Prior Labs bet — TabPFN: Tabular foundation models exist because turning a spreadsheet into text and asking a language model to reason over it is the wrong abstraction. You can't reliably understand churn risk, supplier risk, or renewal forecasting from text derived from spreadsheets. Agents need to reason over tables as tables.

Combined thesis: Agents need knowledge in the shape the business uses—sometimes a document, sometimes a table, sometimes a metric definition, sometimes a workflow state. A serious knowledge layer respects those shapes as core rather than flattening everything into prose.


4. Microsoft + Graph RAG

The relational case: Some agent work is fundamentally relational:

  • Which suppliers connect to which shipments
  • Which customers share a particular failure pattern
  • Which incidents trace back to the same root cause

These are graph questions mathematically. Chunks don't carry relationships, and tables don't either.

Microsoft's Graph RAG is the most prominent attempt to handle this, though it has real weaknesses:

  • Expensive to build and maintain
  • Entity extraction isn't perfect yet
  • Graphs can go stale
  • Entity relationships can encode bad patterns if underlying data is dirty

But the reason it keeps coming back is that some knowledge is naturally relational, and no other approach captures it.
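As a small illustration of why a question like "which incidents trace back to the same root cause" is a graph question, the sketch below follows cause edges in a plain dictionary. It is not Graph RAG, which also extracts entities and relationships from text. The edge data and the function names root_cause and group_by_root are hypothetical.

```python
from collections import defaultdict

# Hypothetical "caused by" edges: each key points at its immediate cause.
caused_by = {
    "INC-1": "deploy-42",
    "INC-2": "deploy-42",
    "deploy-42": "config-change-7",
    "INC-3": "disk-failure-3",
}


def root_cause(node: str, edges: dict) -> str:
    """Follow cause edges until a node has no recorded cause."""
    seen = set()
    while node in edges and node not in seen:  # the seen set guards against cycles
        seen.add(node)
        node = edges[node]
    return node


def group_by_root(incidents: list, edges: dict) -> dict:
    """Group incidents that share the same root cause."""
    groups = defaultdict(list)
    for incident in incidents:
        groups[root_cause(incident, edges)].append(incident)
    return dict(groups)


groups = group_by_root(["INC-1", "INC-2", "INC-3"], caused_by)
print(groups)
```

INC-1 and INC-2 never mention config-change-7 directly. The link only appears by traversing two edges, which is information that neither a text chunk nor a single table row carries.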


The Four Shapes of Knowledge

The industry is racing to support at least four distinct shapes:

| Shape | Example | Solution |
|---|---|---|
| Fuzzy prose | Help center docs, articles | Vector search + document trees |
| Long structured documents | Contracts, filings | Page Index (document hierarchy) |
| Business data in tables | ERP, CRM, governed tables | Dremino semantic layer + Prior Labs tabular models |
| Relationships | Supply chain, incident correlation | Graph RAG |

The real choice isn't "database X vs. Y"—it's which of these shapes your agent needs and how you assemble them effectively.


Context Windows Don't Solve This

The video directly addresses the "just use a bigger context window" objection:

What larger context helps with:

  • More room to work
  • More information within reach

What it doesn't help with:

  • Deciding what belongs in the context
  • Marking which source is authoritative
  • Enforcing permissions
  • Preserving document hierarchy
  • Distinguishing memory the user confirmed from memory the model inferred

This is the context rot problem. Chroma's published research on full-context performance shows that model performance degrades as context gets larger and more cluttered. The problem is never only "is the right answer somewhere in there?" It is also "is the right answer presented in a form the model can actually use reliably?"


Three Steps for Building Agents Today

Step 1: Define the Contract Before Picking the Database

Don't start with a vendor (Pinecone, Weaviate, Neo4j, Chroma, etc.) and then figure out what to store. That's backwards.

Instead, start by asking: what does this agent need to receive, and in what form, to do its job reliably? The answer to that question is the contract.

The database determines the shape of what you can retrieve. If you pick the database before you know what your agent needs, you constrain the agent to whatever that database is good at.


Step 2: Write Down the Bundle, Not "Relevant Context"

"Relevant context" is too vague. Write down the specific bundle the agent needs for its task.

Example (customer support refund agent):

  • Customer record
  • Plan / tier
  • Region
  • Product version
  • Purchase history
  • Applicable refund policy
  • Refund threshold
  • Prior exceptions for this customer
  • Current ticket
  • Approved response language
  • Whether the agent can issue the refund or only draft a recommendation

Every field in that bundle represents explicit choices:

  • Where does this come from?
  • Who's allowed to see it?
  • Is the source authoritative or just relevant?
  • How fresh does it need to be?
  • What happens if it's missing?

When you write the bundle down, three things happen:

  1. You realize most fields don't live in one system
  2. You realize some need to be governed, not just retrieved
  3. You realize the agent's actual work is assembling and reasoning over the bundle, not just searching for docs
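One way to "write the bundle down" is as a small spec in code, where each field answers the questions above. The Python sketch below is an assumption about how that might look, not something shown in the video. FieldSpec, check_bundle, the source system names, and the freshness limits are all hypothetical, and only three of the refund agent's fields are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    source: str                    # where does this come from?
    authoritative: bool            # source of truth, or merely relevant?
    max_age_hours: Optional[int]   # how fresh must it be? None means no limit
    on_missing: str                # "block", "ask_user", or "proceed"


# Hypothetical subset of the refund agent's bundle.
REFUND_BUNDLE = [
    FieldSpec("customer_record", "crm", True, 24, "block"),
    FieldSpec("refund_policy", "policy_repo", True, None, "block"),
    FieldSpec("prior_exceptions", "ticket_history", False, 168, "proceed"),
]


def check_bundle(specs: list, fetched: dict) -> dict:
    """Map each problem field to an action.

    `fetched` maps field name to a (value, age_in_hours) tuple.
    """
    problems = {}
    for spec in specs:
        entry = fetched.get(spec.name)
        if entry is None:
            problems[spec.name] = spec.on_missing
        elif spec.max_age_hours is not None and entry[1] > spec.max_age_hours:
            problems[spec.name] = "refresh"
    return problems


fetched = {
    "customer_record": ({"id": 1}, 30),        # older than the 24-hour limit
    "refund_policy": ("30-day refunds", 1000), # no freshness limit
}
print(check_bundle(REFUND_BUNDLE, fetched))
print(sorted({spec.source for spec in REFUND_BUNDLE}))
```

Even with three fields, the spec already names three source systems, which is the first realization in the list above.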


Step 3: Choose Primitives That Deliver the Bundle

Now you can go shopping, intentionally:

  • If the bundle is mostly prose → vector search + document trees
  • If it's mostly governed business data → semantic layer + tabular reasoning
  • If it's relational → graph

Most real agents need a mix, and that's fine.
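The mapping from bundle to primitives can be written as a lookup once each field has been labeled with a shape. This Python sketch assumes a hand-labeled bundle. The shape labels, the field names, and the function primitives_needed are illustrative choices, not part of any vendor's product.

```python
# Shape-to-primitive mapping taken from the list above.
PRIMITIVE_FOR_SHAPE = {
    "prose": "vector search + document trees",
    "governed_data": "semantic layer + tabular reasoning",
    "relational": "graph",
}


def primitives_needed(bundle_shapes: dict) -> dict:
    """Group bundle fields by the primitive that would deliver them."""
    needed = {}
    for field_name, shape in bundle_shapes.items():
        needed.setdefault(PRIMITIVE_FOR_SHAPE[shape], []).append(field_name)
    return needed


# Hypothetical labeling of a few refund-agent fields.
bundle_shapes = {
    "refund_policy": "prose",
    "approved_response_language": "prose",
    "purchase_history": "governed_data",
    "customer_record": "governed_data",
}

plan = primitives_needed(bundle_shapes)
for primitive, fields in plan.items():
    print(f"{primitive}: {', '.join(fields)}")
```

For this bundle no field is relational, so the plan contains no graph component, which is the kind of conclusion that is hard to reach when starting from a vendor.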

The point: Choose primitives because they deliver your bundle, not because they trended on LinkedIn last week.

Pinecone, Page Index, Dremino, Graph RAG—they're not competing for the same slot. They're each solving for one of the underlying shapes. Once you know the contract your agent needs, the choice stops being a debate and starts feeling like a thoughtful engineering decision.


Failure Modes to Watch

  • Compiled bundles go stale: pre-computed context can become outdated
  • Graphs encode bad relationships: dirty underlying data corrupts the graph
  • Document parsers miss tables: structured data is lost when documents are processed as unstructured text
  • Semantic layers get politically contested: in most companies, "source of truth" is an organizational fight
  • Agents store inference as fact: agents can accumulate bad conclusions by treating their own earlier inferences as confirmed facts across runs
  • Overbuilding: a simple help center assistant doesn't need Graph RAG + document tree + semantic layer + memory system. Pick the smallest set of layers your agent needs and no more.
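The "inference stored as fact" failure can be guarded against by tagging every memory entry with its provenance. The Python sketch below shows one possible rule, under the assumption that an inference must never overwrite a confirmed fact. Provenance, MemoryEntry, and AgentMemory are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    CONFIRMED = "confirmed"  # stated by the user or read from a system of record
    INFERRED = "inferred"    # concluded by the model


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: str
    provenance: Provenance


class AgentMemory:
    def __init__(self):
        self._entries = {}

    def remember(self, key: str, value: str, provenance: Provenance) -> bool:
        """Store an entry; refuse to replace a confirmed fact with an inference."""
        existing = self._entries.get(key)
        if (
            existing is not None
            and existing.provenance is Provenance.CONFIRMED
            and provenance is Provenance.INFERRED
        ):
            return False
        self._entries[key] = MemoryEntry(key, value, provenance)
        return True

    def facts(self) -> dict:
        """Only confirmed entries, safe to present as authoritative."""
        return {
            k: e.value
            for k, e in self._entries.items()
            if e.provenance is Provenance.CONFIRMED
        }

    def hypotheses(self) -> dict:
        """Inferred entries, to be re-verified before the agent acts on them."""
        return {
            k: e.value
            for k, e in self._entries.items()
            if e.provenance is Provenance.INFERRED
        }


memory = AgentMemory()
memory.remember("plan", "enterprise", Provenance.CONFIRMED)
memory.remember("plan", "starter", Provenance.INFERRED)       # rejected
memory.remember("churn_risk", "high", Provenance.INFERRED)
print(memory.facts(), memory.hypotheses())
```

Keeping the two views separate lets a later run treat hypotheses as things to check rather than things it already knows.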

Key Metrics to Track

The cheapest place to learn what your agent needs is your own work logs:

  • How many retrieval calls happen before useful work starts?
  • How often does the agent open the same sources repeatedly?
  • How much of your token budget is consumed by raw context that wasn't actually needed?
  • How often does the agent ask the user for something the system already has?
  • How often does the next run rediscover what the prior run learned?
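Three of these questions can be answered with a short script over existing logs. The sketch below assumes a hypothetical log format in which each event is a dictionary with `run`, `type` ("retrieve" or "work"), and `source` keys. Real agent logs will differ, and the function name rediscovery_metrics is invented for the example.

```python
from collections import Counter


def rediscovery_metrics(events: list) -> dict:
    """Count retrievals before work, repeated opens, and cross-run refetches."""
    by_run = {}
    for event in events:
        by_run.setdefault(event["run"], []).append(event)

    calls_before_work = {}
    repeat_opens = 0
    cross_run_refetches = 0
    seen_in_prior_runs = set()

    for run in sorted(by_run):
        run_events = by_run[run]

        # Retrieval calls that happen before the first "work" event.
        count = 0
        for event in run_events:
            if event["type"] == "work":
                break
            if event["type"] == "retrieve":
                count += 1
        calls_before_work[run] = count

        # Sources opened more than once within the same run.
        opened = Counter(e["source"] for e in run_events if e["type"] == "retrieve")
        repeat_opens += sum(n - 1 for n in opened.values())

        # Sources this run fetched that an earlier run had already fetched.
        cross_run_refetches += len(set(opened) & seen_in_prior_runs)
        seen_in_prior_runs |= set(opened)

    return {
        "calls_before_work": calls_before_work,
        "repeat_opens": repeat_opens,
        "cross_run_refetches": cross_run_refetches,
    }


# Hypothetical log events for two runs.
events = [
    {"run": 1, "type": "retrieve", "source": "crm"},
    {"run": 1, "type": "retrieve", "source": "policy"},
    {"run": 1, "type": "retrieve", "source": "crm"},
    {"run": 1, "type": "work", "source": None},
    {"run": 2, "type": "retrieve", "source": "crm"},
    {"run": 2, "type": "work", "source": None},
    {"run": 2, "type": "retrieve", "source": "faq"},
]
metrics = rediscovery_metrics(events)
print(metrics)
```

Token waste and repeated user questions would need extra fields in the log (token counts, user prompts), so they are left out of this sketch.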

The pattern is in your existing agent runs. If you look, you'll find the rediscovery problem hiding there.


Summary

The memory era represents a fundamental shift in AI infrastructure:

  • Classic RAG was built for chatbots (Q&A, semantic search, ~3 chunks)
  • Agent workloads need knowledge in the shape of work (bundles, not context windows)
  • Every serious vendor is racing to solve this: Pinecone (Nexus/NoQL), Page Index (document trees), SAP (Dremino + Prior Labs), Google (Cloud Next knowledge architecture), Cloudflare (memory for agents), Microsoft (Graph RAG)
  • The teams that win won't be the ones chasing the trendiest retrieval; they'll be the ones who took time to define the contract before picking the database

The memory wars are not about which vector database wins. They're about which infrastructure approach lets agents do reliable work in production.


🦐 Summary by Thrawn the Prawn — Strategic Analysis Division