Case study · Nimbus SaaS (Postgres) · May 10, 2026 · 4 min read

When Growth Drops 93%, Dashboards Point at Pricing. The Data Pointed at the Product.

A statistical driver analysis on Net New MRR, and why the first instinct is usually wrong.

Headline finding

92.6% drop in Net New MRR. Dashboards point at pricing. The data pointed at the product.

Net New MRR for the current period was $1,770. The baseline mean across the prior eight periods was $23,823. The drop against that baseline was 92.6 percent.
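The arithmetic is just one minus the ratio of current to baseline:

```python
current = 1_770     # Net New MRR, current period ($)
baseline = 23_823   # mean Net New MRR across the prior eight periods ($)

drop = 1 - current / baseline
print(f"{drop:.1%}")  # 92.6%
```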

That is the kind of number that triggers a meeting at the end of the day, not the end of the week.

What the team did next determined where the next two weeks of work went. The default instinct in most companies is to ask "which customer segment broke?" The analyst opens the dashboard. They slice by plan tier. They slice by region. They slice by industry. They look for the customer subgroup that defected. They build a story around what they find.

The data MindPalace surfaced told a different story. Deep Analysis screened every available dimension in parallel and ranked them by effect size. What landed at the top was not what the team would have looked at first.

What the analysis found

Deep Analysis ran every available dimension through a statistical screen. Thirty candidate dimensions were available; fourteen cleared the screening threshold and are worth investigating. The top seven, sorted by how strongly the metric varied across each dimension's groups:

Dimension      Effect size   Signal
platform       0.502         product engagement
events_count   0.500         product engagement
source         0.456         product engagement
csat_score     0.412         product engagement
reason         0.362
plan           0.321         pricing
type           0.301         pricing
An effect size above 0.5 counts as a large effect.
Top-ranked dimensions by effect size. Product engagement signals cluster at the top; pricing signals sit at the bottom of the seven shown.

Anything above 0.5 in this measure is considered a large effect. The two strongest signals, platform and events_count, sit right at that line. Both are product-usage signals.

Plan tier ranked sixth. Subscription type ranked seventh. Pricing was not where the variance lived.

Four dimensions the team would probably have suspected first did not even make the cut: amount, completed status, category, and priority. The statistical screen flagged them as not meaningfully different between groups. If a human had built that dashboard manually, those four would have been on it. The data said do not bother.
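The engine's exact statistic is not published here, but the shape of the screen is easy to sketch. A minimal version, assuming a pandas DataFrame with one row per record, candidate dimensions as categorical columns, and the metric as a numeric column; the correlation ratio (eta) used below is one common effect-size measure on which 0.5 is conventionally read as a large effect, not necessarily the measure MindPalace uses.

```python
import pandas as pd

def correlation_ratio(df: pd.DataFrame, dimension: str, metric: str) -> float:
    """Eta: sqrt(between-group sum of squares / total sum of squares) of
    `metric` across the groups defined by `dimension`."""
    overall_mean = df[metric].mean()
    ss_total = ((df[metric] - overall_mean) ** 2).sum()
    if ss_total == 0:
        return 0.0
    groups = df.groupby(dimension)[metric]
    ss_between = (groups.size() * (groups.mean() - overall_mean) ** 2).sum()
    return (ss_between / ss_total) ** 0.5

def rank_dimensions(df: pd.DataFrame, dimensions: list[str], metric: str) -> pd.Series:
    """Score every candidate dimension and return them strongest-first."""
    scores = {dim: correlation_ratio(df, dim, metric) for dim in dimensions}
    return pd.Series(scores).sort_values(ascending=False)
```

Dimensions whose score clears the screening threshold move on; the ones that do not, like amount or priority here, never reach a human.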

What changes for the team

The question being asked at the end of the day changed.

The original question was "which customer segment churned?" That question implies a customer-success investigation. The team would have spent the next two weeks reviewing CSM notes, building a renewal-risk model, segmenting account health by tier.

The data pointed at a different investigation. The signal lived in product telemetry. The next two weeks of work belong to the product analytics team, not customer success.

The first question that team would ask: which platform values had the lowest events_count, and did those low-engagement cohorts overlap with the accounts that contracted? That is a query the analytics team can write before lunch. It is also not a question the customer-success investigation was ever going to produce.
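For illustration only, a pandas sketch of that first query, with entirely hypothetical column names (platform, events_count, and mrr_delta on a per-account table; none of these identifiers are taken from the Nimbus schema):

```python
import pandas as pd

def low_engagement_contraction(accounts: pd.DataFrame) -> pd.DataFrame:
    """Per platform value: median events_count and the share of accounts whose
    MRR contracted, to check whether the low-engagement cohorts are also the
    shrinking ones. Column names are illustrative, not the real schema."""
    flagged = accounts.assign(contracted=accounts["mrr_delta"] < 0)
    return (
        flagged.groupby("platform")
        .agg(
            median_events=("events_count", "median"),
            contraction_rate=("contracted", "mean"),
            n_accounts=("platform", "size"),
        )
        .sort_values("median_events")
    )
```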

That shift is a substantive operational change driven by a roughly one-minute statistical analysis instead of a two-week debate. Decision intelligence is the gap between those two outcomes.

How the system got there

The Living Map identified Net New MRR as the metric in question and its baseline window. Cartographer enumerated every available dimension across the related tables: accounts, subscriptions, sessions, product events, CSAT responses, and others. Thirty candidate dimensions came out of that enumeration.
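How the enumeration works internally is not documented; one plausible sketch, assuming each related table arrives as a pandas DataFrame, is to treat every low-cardinality column as a candidate dimension:

```python
import pandas as pd

def candidate_dimensions(tables: dict[str, pd.DataFrame],
                         max_cardinality: int = 50) -> list[str]:
    """Collect every column that behaves like a grouping dimension: few enough
    distinct values to slice the metric by. The cardinality cutoff and the
    dict-of-DataFrames shape are assumptions for this sketch."""
    candidates = []
    for name, table in tables.items():
        for column in table.columns:
            if table[column].nunique(dropna=True) <= max_cardinality:
                candidates.append(f"{name}.{column}")
    return candidates
```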

Deep Analysis ran a two-stage statistical pass. Stage one tested each dimension on its own, asking whether the metric behaved differently across that dimension's groups. Fourteen dimensions passed. Stage two tested pairs of those fourteen, looking for combinations where the metric responded most strongly. Forty-five of the 149 pair tests reached significance.
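The specific tests are not disclosed, so the statistic below is an assumption; a Kruskal-Wallis test over the crossed groups of each surviving pair is one reasonable way to run the second stage:

```python
from itertools import combinations

import pandas as pd
from scipy.stats import kruskal

def pair_stage(df: pd.DataFrame, survivors: list[str], metric: str,
               alpha: float = 0.05) -> list[tuple[str, str, float]]:
    """For each pair of surviving dimensions, test whether the metric differs
    across the crossed groups (value of dim A x value of dim B) and keep the
    pairs whose p-value clears the threshold."""
    significant = []
    for dim_a, dim_b in combinations(survivors, 2):
        groups = [g[metric].to_numpy()
                  for _, g in df.groupby([dim_a, dim_b])
                  if len(g) >= 2]  # skip near-empty cells
        if len(groups) < 2:
            continue
        _, p_value = kruskal(*groups)
        if p_value < alpha:
            significant.append((dim_a, dim_b, p_value))
    return sorted(significant, key=lambda item: item[2])
```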

The whole thing ran in 70 seconds. No human picked the dimensions. No human wrote any of the SQL. There was no language model in the SQL path either, which is why the analysis is reproducible.

What the analysis honestly did not find

Cell-level drivers, the specific combinations of dimension values that produce the largest gap, did not hold up after correcting for multiple comparisons. The more tests you run, the more random false positives slip through. The correction accounts for that.
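The correction itself is standard machinery. A sketch using a Benjamini-Hochberg false-discovery-rate adjustment from statsmodels; the procedure the engine actually applies is not stated, so the method choice here is illustrative:

```python
from statsmodels.stats.multitest import multipletests

def correct_cell_pvalues(raw_p_values: list[float], alpha: float = 0.05):
    """Adjust raw per-cell p-values for the number of tests run; return which
    cells still look significant afterwards, plus the adjusted p-values."""
    reject, p_adjusted, _, _ = multipletests(raw_p_values, alpha=alpha, method="fdr_bh")
    return reject, p_adjusted
```

Under an adjustment like this, a cell that looked extreme on its raw p-value can drop back below the threshold, which is exactly what happened to the cell-level drivers here.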

The dimension ranking is reliable. The exact "platform value X combined with events_count value Y is the worst combination" story is not. The engine surfaced this honestly. Confidence on cell-level drivers was assessed as medium. The output says "investigate these seven dimensions in order, ranked by effect size." It does not say "we found the answer."

That is the difference decision intelligence makes versus a dashboard with a single AI-generated insight line. The chatbot output would have read like an answer. The reader would have stopped looking. The honest output makes the team keep looking, just in the right place.

A note on the data

Nimbus is a representative B2B SaaS dataset built for product demos. The figures above are real outputs from the live engine running against that dataset on 2026-05-10. The numbers are not from a customer.

If you want to see what dimensions actually drive your metrics, request a demo. Both stages of the analysis run in about a minute.