Architectural Guide

Data Is Not Software

Anthropic's data agent makes the case: govern the metrics, and the SQL gets trivial.

Swarnim Shrey

Founder, MindPalace

June 8, 2026 · 4 min read

Anthropic published a look at how they give their own employees self-service analytics with Claude. Read it next to OpenAI's data agent post and something gets hard to ignore. Two of the labs with the best models in the world both built the same thing, and it is not a chatbot. It is a governed semantic layer. Neither one pointed a model at a warehouse and let it answer.

The interesting part is not what they let Claude do. It is what they refuse to let it do.

The one sentence worth reading twice

Here is the line from Anthropic's post that matters:

"If a question maps cleanly to a defined metric, the agent calls a function and gets one number, the same number every other surface in the company produces."

Read it slowly. The agent does not compute the number. It calls a function that already knows the answer. The model's only job is to figure out which function. The number itself comes from a governed definition that a human wrote and Finance signed off on.
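To make the pattern concrete, here is a minimal sketch of what "the agent calls a function" could look like. It is not Anthropic's implementation or ours. The registry, the metric names, and the toy order data are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Metric:
    """A governed metric: a name, an accountable owner, and one way to compute it."""
    name: str
    owner: str  # hypothetical: the team that signed off on the definition
    compute: Callable[[], float]


# Toy stand-in for warehouse data (illustrative values only).
_ORDERS: List[float] = [120.0, 80.0, 50.0]

# Hypothetical registry of human-written definitions.
REGISTRY: Dict[str, Metric] = {
    "total_revenue": Metric("total_revenue", "finance", lambda: sum(_ORDERS)),
    "order_count": Metric("order_count", "finance", lambda: float(len(_ORDERS))),
}


def answer(metric_name: str) -> float:
    """The model's only contribution is metric_name; it never computes the value."""
    if metric_name not in REGISTRY:
        # Refuse instead of improvising a definition.
        raise KeyError(f"no governed metric named {metric_name!r}")
    return REGISTRY[metric_name].compute()
```

The design choice worth noticing is the failure mode. An unknown metric raises an error. It does not fall back to the model writing its own query.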

That is the argument we made in "why LLMs should never calculate your churn rate," now coming from the company that builds the model.

They named the hard problem, and it is not the SQL

Anthropic is blunt about where the difficulty actually lives:

"The central problem comes down to our ability to map a user's question to specific and up-to-date entities in our data model and know the correct way of working with them. If we can do that, then the resulting execution and SQL becomes trivial."

The SQL is trivial. The mapping is the work.

This is the bet MindPalace made before we shipped a single AI feature. Cartographer reads your warehouse and builds the map: which tables are real, which columns mean what, how they join. Grounding binds canonical KPIs to your specific columns. By the time a question turns into SQL, every hard decision has already been made, deterministically. The math was never the hard part. The mapping was.
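A rough sketch of why the SQL gets trivial once the mapping exists. This is not Cartographer or Grounding, just the shape of the idea. The `Binding` type, the `compile_sum` function, and the table and column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Binding:
    """A canonical KPI bound to specific warehouse columns (all names hypothetical)."""
    kpi: str
    table: str
    amount_column: str
    date_column: str


def compile_sum(binding: Binding, start: str, end: str) -> Tuple[str, Tuple[str, str]]:
    """Turn a reviewed binding into parameterized SQL with no model in the loop.

    Identifiers come from the reviewed binding, never from user text.
    Dates are passed as query parameters, not spliced into the string.
    """
    sql = (
        f"SELECT SUM({binding.amount_column}) AS {binding.kpi} "
        f"FROM {binding.table} "
        f"WHERE {binding.date_column} >= ? AND {binding.date_column} < ?"
    )
    return sql, (start, end)
```

Every decision that could go wrong (which table, which column, which date field) lives in the binding. The compile step is a template fill.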

The mistake they warn you about, we made first

One line in Anthropic's post reads like a scar. They tried bootstrapping the semantic layer by having an LLM auto-generate the metric definitions, and it produced plausible-looking definitions that encoded the very ambiguities they were trying to eliminate.

We learned the same thing the same way. An LLM asked to define churn will hand you a definition. It will look right. It will quietly bake in a dozen choices about trials, involuntary churn, and billing periods that nobody agreed to. Plausible and wrong is worse than blank, because plausible and wrong survives the meeting.

So MindPalace does not ask a model to invent what your metrics mean. Grounding starts from canonical, industry-standard definitions and binds them to your columns. A person reviews every binding before it goes live. The model proposes. A human signs off. The definition is never the model's opinion.
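The "model proposes, human signs off" rule can be enforced as a gate in code. The sketch below is an assumption about how such a gate might look, not a description of our review tooling. `ProposedBinding`, `approve`, and `publish` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class ProposedBinding:
    """A binding suggested by a model; not usable until a person approves it."""
    kpi: str
    column: str
    approved_by: Optional[str] = None  # None means nobody has reviewed it yet


def approve(binding: ProposedBinding, reviewer: str) -> ProposedBinding:
    """Record the human sign-off and return a new, approved binding."""
    if not reviewer:
        raise ValueError("an approval needs a named reviewer")
    return replace(binding, approved_by=reviewer)


def publish(binding: ProposedBinding, live: Dict[str, ProposedBinding]) -> None:
    """Only approved bindings can go live."""
    if binding.approved_by is None:
        raise PermissionError(f"binding for {binding.kpi!r} has no human sign-off")
    live[binding.kpi] = binding
```

An unreviewed proposal cannot reach the live set by accident, because `publish` rejects it outright.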

Data is not software

Anthropic's framing for why analytics is different from coding is three words: data is not software. In code, a model exploring a wide solution space is an asset. In analytics, there is one correct answer to "what was revenue last quarter," and creativity is the bug.

That is why our engine is deterministic from end to end. The same question compiles to the same query plan, content-hashed, and returns the same number on Monday and on Friday. Not within five percent. The same number. A language model sits at the very end, turning that number into a sentence a person can read. It never touches the number.
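One way to picture "content-hashed" is to serialize the plan in a canonical form and hash the bytes. This is a sketch under that assumption. The plan structure and the `plan_hash` helper are hypothetical, and our engine's actual plan format is not shown here.

```python
import hashlib
import json
from typing import Any, Dict


def plan_hash(plan: Dict[str, Any]) -> str:
    """Hash a query plan so that equal plans always produce the same digest.

    sort_keys and fixed separators give one canonical JSON string per plan,
    regardless of the order in which the keys were inserted.
    """
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two plans with the same content hash identically no matter how they were assembled, so the digest can serve as a cache key or an audit record. Any change to the plan changes the hash.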

The uncomfortable number

Anthropic runs about 95 percent of their business analytics queries through Claude, at roughly 95 percent accuracy in aggregate. The accuracy does not come from a smarter model. When they tested an LLM-generated semantic layer against a smaller, human-curated one, the human-curated layer won on their evals. The governance is the accuracy.

That should stop every team currently wiring an LLM to their warehouse and hoping the next model fixes the wrong answers. The next model will not fix them. Anthropic has the model and still built the governance layer. So did OpenAI. The model is not the part you are missing.

Where the semantic layer stops

OpenAI and Anthropic both solved for the question: answer this accurately. There is a layer past that. Not just what happened, but why it moved, who owns it, and what to do next. That needs to know how metrics connect, not just how tables join. It is the difference between a Decision Context Graph and a semantic layer, and it is what we are building on the foundation both labs just validated.

Two labs arrived at the same architecture independently. Govern the metrics. Keep the model out of the math. If you want the determinism argument in full, read "why LLMs cannot do math." If you want the context argument, read "what OpenAI's data agent teaches." If you want to see it running on real data, come talk to us.
