Data Is Not Software
Anthropic's data agent makes the case: govern the metrics, and the SQL gets trivial.
Swarnim Shrey
Founder, MindPalace
Anthropic published a look at how they give their own employees self-service analytics with Claude. Read it next to OpenAI's data agent post and something gets hard to ignore. Two of the labs with the best models in the world both built the same thing, and it is not a chatbot. It is a governed semantic layer. Neither one pointed a model at a warehouse and let it answer.
The interesting part is not what they let Claude do. It is what they refuse to let it do.
The one sentence worth reading twice
Here is the line from Anthropic's post that matters:
"If a question maps cleanly to a defined metric, the agent calls a function and gets one number, the same number every other surface in the company produces."
Read it slowly. The agent does not compute the number. It calls a function that already knows the answer. The model's only job is to figure out which function. The number itself comes from a governed definition that a human wrote and Finance signed off on.
That is the argument we made in why LLMs should never calculate your churn rate, now coming from the company that builds the model.
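A minimal sketch of that division of labor, assuming a small in-process registry. The metric names, the placeholder return values, and the `answer` helper are all hypothetical, not Anthropic's or MindPalace's actual interface. The point is only that the model supplies a name and the governed function supplies the number.

```python
from typing import Callable, Dict

# Hypothetical governed metric functions. In a real system each would run a
# reviewed query; here they return fixed placeholder values for illustration.
def net_revenue_last_quarter() -> float:
    return 1_250_000.0  # placeholder, not real data

def active_customers() -> float:
    return 4_200.0  # placeholder, not real data

# The registry is the only surface the agent can reach.
METRICS: Dict[str, Callable[[], float]] = {
    "net_revenue_last_quarter": net_revenue_last_quarter,
    "active_customers": active_customers,
}

def answer(metric_name: str) -> float:
    """Return the governed number for a metric name chosen by the model.

    The model picks metric_name; it never computes the value itself.
    """
    if metric_name not in METRICS:
        # No governed definition means no answer, rather than an improvised one.
        raise KeyError(f"no governed metric named {metric_name!r}")
    return METRICS[metric_name]()
```

If the model picks a name that is not in the registry, the call fails loudly instead of falling back to generated SQL.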
They named the hard problem, and it is not the SQL
Anthropic is blunt about where the difficulty actually lives:
"The central problem comes down to our ability to map a user's question to specific and up-to-date entities in our data model and know the correct way of working with them. If we can do that, then the resulting execution and SQL becomes trivial."
The SQL is trivial. The mapping is the work.
This is the bet MindPalace made before we shipped a single AI feature. Cartographer reads your warehouse and builds the map: which tables are real, which columns mean what, how they join. Grounding binds canonical KPIs to your specific columns. By the time a question turns into SQL, every hard decision has already been made, deterministically. The math was never the hard part. The mapping was.
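To make "the mapping is the work" concrete, here is a rough sketch of binding a canonical KPI to warehouse columns. It is an illustration under assumptions, not the actual Cartographer or Grounding implementation: the `Binding` class, the `CANONICAL_KPIS` template, and the table and column names are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Binding:
    """Hypothetical binding of one canonical KPI to one customer's columns."""
    table: str
    amount_column: str
    date_column: str

# Hypothetical canonical definition, written once and independent of any schema.
CANONICAL_KPIS = {
    "total_revenue": (
        "SELECT SUM({amount_column}) FROM {table} "
        "WHERE {date_column} >= :start AND {date_column} < :end"
    ),
}

def compile_kpi(kpi: str, binding: Binding) -> str:
    """Fill the canonical template with the bound names. No model is involved."""
    template = CANONICAL_KPIS[kpi]
    return template.format(
        table=binding.table,
        amount_column=binding.amount_column,
        date_column=binding.date_column,
    )

# Example binding with made-up warehouse names.
sql = compile_kpi(
    "total_revenue",
    Binding(table="billing.invoices", amount_column="amount_usd", date_column="paid_at"),
)
```

Once the binding exists, producing the SQL is plain string substitution. Every judgment call lives in the binding and the canonical definition.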
The mistake they warn you about, we made first
One line in Anthropic's post reads like a scar. They tried bootstrapping the semantic layer by having an LLM auto-generate the metric definitions, and it produced plausible-looking definitions that encoded the very ambiguities they were trying to eliminate.
We learned the same thing the same way. An LLM asked to define churn will hand you a definition. It will look right. It will quietly bake in a dozen choices about trials, involuntary churn, and billing periods that nobody agreed to. Plausible and wrong is worse than blank, because plausible and wrong survives the meeting.
So MindPalace does not ask a model to invent what your metrics mean. Grounding starts from canonical, industry-standard definitions and binds them to your columns. A person reviews every binding before it goes live. The model proposes. A human signs off. The definition is never the model's opinion.
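The propose-then-approve rule can be sketched as a small state check. This is an assumed shape, not MindPalace's real data model: `ProposedBinding`, `approve`, and `live_bindings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProposedBinding:
    """Hypothetical record of a KPI-to-column binding awaiting review."""
    kpi: str
    column: str
    proposed_by: str = "model"
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        # The proposer cannot approve its own proposal.
        if not reviewer or reviewer == self.proposed_by:
            raise ValueError("a binding needs a distinct human reviewer")
        self.approved_by = reviewer

    @property
    def is_live(self) -> bool:
        return self.approved_by is not None

def live_bindings(bindings: List[ProposedBinding]) -> List[ProposedBinding]:
    """Only approved bindings are ever visible to the query engine."""
    return [b for b in bindings if b.is_live]
```

A binding that nobody has reviewed simply does not exist as far as the engine is concerned.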
Data is not software
Anthropic's framing for why analytics is different from coding is four words: data is not software. In code, a model exploring a wide solution space is an asset. In analytics, there is one correct answer to "what was revenue last quarter," and creativity is the bug.
That is why our engine is deterministic from end to end. The same question compiles to the same query plan, content-hashed, and returns the same number on Monday and on Friday. Not within five percent. The same number. A language model sits at the very end, turning that number into a sentence a person can read. It never touches the number.
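One common way to content-hash a query plan is to serialize it canonically and hash the bytes. The sketch below assumes the plan is a JSON-serializable dict. The plan fields and the `plan_hash` helper are illustrative, not a description of the engine's internal format.

```python
import hashlib
import json

def plan_hash(plan: dict) -> str:
    """Hash a query plan so that equal plans always produce equal digests."""
    # sort_keys and fixed separators remove differences in key order and
    # whitespace, so the serialized bytes depend only on the plan's content.
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two plans with the same content but different key order (made-up fields).
monday = {"metric": "total_revenue", "filters": {"region": "EU", "quarter": "Q1"}}
friday = {"filters": {"quarter": "Q1", "region": "EU"}, "metric": "total_revenue"}
same = plan_hash(monday) == plan_hash(friday)
```

Here `same` is `True`, because both dicts serialize to identical bytes. Changing any field changes the digest.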
The uncomfortable number
Anthropic runs about 95 percent of their business analytics queries through Claude, at roughly 95 percent accuracy in aggregate. The accuracy does not come from a smarter model. When they tested an LLM-generated semantic layer against a smaller, human-curated one, the human-curated layer won on their evals. The governance is the accuracy.
That should stop every team currently wiring an LLM to their warehouse and hoping the next model fixes the wrong answers. The next model will not fix them. Anthropic has the model and still built the governance layer. So did OpenAI. The model is not the part you are missing.
Where the semantic layer stops
OpenAI and Anthropic both solved for one requirement: answer the question accurately. There is a layer past that. Not just what happened, but why it moved, who owns it, and what to do next. Answering those requires knowing how metrics connect, not just how tables join. That is the difference between a Decision Context Graph and a semantic layer, and it is what we are building on the foundation both labs just validated.
Two labs arrived at the same architecture, each on its own. Govern the metrics. Keep the model out of the math. If you want the determinism argument in full, read why LLMs cannot do math. If you want the context argument, read what OpenAI's data agent teaches. If you want to see it running on real data, come talk to us.
Read this next
What is a Decision Context Graph? An Architectural Guide
A Decision Context Graph is the missing layer between your warehouse and your decisions. Here is what it is, how we build one in four hours, and why it matters now.
What OpenAI's Data Agent Teaches About AI-Native BI
OpenAI's internal data agent looks like a chatbot. Inside, it is a context graph. Here is what AI-native BI actually requires, and what most teams will miss.
Why LLMs Should Never Calculate Your Churn Rate
Most AI-native BI tools let a language model calculate your business metrics. That is a category error. Here is the architecture we built instead, and why.