Scott Stevenson, cofounder of legal AI startup Spellbook, has accused hundreds of AI startups of inflating their annual recurring revenue (ARR) by conflating it with contracted ARR (CARR), a metric that includes uncertain future payments from pilots, opt-out clauses, or features not yet built, in a bid to attract venture capital amid a fiercely competitive funding landscape where perception often outweighs substance.
The ARR Mirage: How AI Startups Are Gaming the Metric That Matters Most to VCs
At the core of the controversy is a semantic sleight of hand: startups reporting CARR as ARR to inflate their top-line numbers. ARR, by definition, should reflect only recurring revenue that is contractually obligated and invoicable within the next 12 months—think live subscriptions with predictable renewal rates. CARR, however, extends that horizon to include multi-year deals where revenue recognition is contingent on deployment milestones, customer success, or even optional feature add-ons. Stevenson’s claim that the gap between reported ARR and actual invoiced revenue can reach 3x to 5x isn’t hyperbole; in early-stage AI verticals like legal tech or healthcare automation, where sales cycles stretch beyond six months and pilots are rampant, the distinction becomes a canyon. A startup might sign a 3-year, $1.8M CARR deal with a law firm but only invoice $150K in the first year due to phased rollout and optional AI modules—yet still report $600K ARR by annualizing the full contract value, ignoring cancellation clauses or build-out risks.
What This Means for Enterprise IT Buyers
For CTOs evaluating AI vendors, this metric manipulation creates a dangerous illusion of traction. A startup claiming $5M ARR might actually have less than $1M in live, renewable subscriptions—enough to mislead procurement teams into overcommitting to unproven platforms. Worse, when these inflated numbers permeate industry benchmarks, they distort pricing expectations and fuel a race to the bottom on contract terms, as startups rush to match phantom growth curves. As one enterprise architect put it during a recent OWASP AppSec panel:
We’ve seen startups win bake-offs based on ARR decks that look like SaaS unicorns, only to discover post-purchase that 70% of that revenue was contingent on features still in Jira backlogs. Due diligence now means asking for invoiced MRR breakdowns, not just slideware.

Ecosystem Fallout: How ARR Inflation Fuels Platform Lock-In and Distorts Open Source Incentives
The ripple effects extend beyond fundraising theatrics. When startups prioritize CARR-friendly accounting to appease VCs, they often architect their products around long-term, sticky enterprise contracts—favoring proprietary APIs, custom SLAs, and data silos that inhibit interoperability. This runs counter to the ethos of open-source AI tooling, where projects like Hugging Face’s Transformers or LangChain thrive on modularity and community-driven extensibility. A maintainer of an open-source LLM orchestration framework warned in a private GitHub discussion:
If every startup is incentivized to lock customers into 3-year CARR deals with bespoke feature flags, we’ll witness fewer contributions to shared standards. Why build a plug-in for a platform that might sunset its API in 18 months when the real revenue is tied to a custom model fine-tuning pipeline only they control?
This dynamic exacerbates platform fragmentation. Cloud providers like AWS and Azure benefit from the chaos, offering managed AI services that abstract away vendor-specific complexity—yet deepen dependency on their ecosystems. Meanwhile, truly open alternatives struggle to gain traction when startups’ inflated ARR narratives craft them appear more viable than they are, skewing investor attention away from sustainable, community-backed models.
The Silent Pact: Why VCs and Journalists Are Complicit in the Illusion
Stevenson’s allegation of a “silent pact” between founders and VCs finds echoes in private conversations with late-stage investors. One partner at a top-tier VC firm, speaking on condition of anonymity, admitted:
We know the CARR/ARR blur is widespread. But if we call it out, we risk damaging portfolio companies’ ability to raise the next round. So we look at cash flow and burn rate instead—and let the PR machine run with the headline number.
Journalists, lacking access to raw contract data, often rely on startup-provided metrics—a vulnerability Stevenson urges them to close. He recommends asking for three specific data points: monthly invoiced revenue over the last quarter, customer churn rate on active subscriptions, and the percentage of ARR derived from contracts with opt-out clauses under 90 days. Without this triangulation, media coverage inadvertently amplifies the very fiction it seeks to report.
Technical Underpinnings: Why AI Startups Are Especially Vulnerable to Metric Gaming
Unlike traditional SaaS, AI startups face unique pressures that make ARR inflation tempting. Training and deploying large language models require massive upfront compute spend—often optimized on NVIDIA H100 GPUs or TPU v5e pods—creating a cash flow gap between investment and revenue recognition. To bridge it, some startups pilot models in customer environments using spot instances or reserved capacity, then count the full annualized value of those pilots as ARR, even if the customer retains the right to terminate with 30 days’ notice. Others annotate CARR with revenue from foundation model API usage (e.g., OpenAI or Anthropic) that they merely resell, adding layers of indirection that obscure true margin.
This isn’t merely accounting creativity—it’s a symptom of a deeper misalignment between VC expectations and the realities of AI product-market fit. Founders are pressured to show hockey-stick growth in markets where enterprise adoption hinges on accuracy, compliance, and change management—not just API calls. Until investors adjust their benchmarks to reflect invoiced, renewable revenue rather than contracted potential, the cycle of inflation will persist, eroding trust in both AI startups and the metrics meant to measure them.
The takeaway is clear: in an era where AI hype outpaces hard revenue, the most valuable metric isn’t the biggest number on the slide—it’s the one you can actually invoice.