Token Budget Wars: Enterprise AI Enters the "Era of Cost Accounting"

BlockBeats

Guest Columnist

2026-05-28 12:00

AI Costs, ROI, and Internal Resource Allocation in Enterprises

AI Summary

Core Insight: Enterprise AI is shifting from "whether to adopt" to "how to account for the cost." The core tension is that token costs are hard to link directly to business value. The next critical phase is less about model capability than about attributing token consumption to specific business outcomes, which in turn determines how AI resources are allocated.
Key Factors:
1. AI inference costs have moved from experimental budgets to recurring operating expenses, and CEOs and CFOs now want to know the value generated by each dollar spent on tokens.
2. Token consumption does not equal value: because of differences in prompts, context length, model selection, and retries, the cost of the same workflow can vary by 5 to 10 times.
3. Marginal token utility, the business value created by each additional dollar of inference cost, is the core metric, but most companies currently cannot track it.
4. AI budget requests fundamentally compete with labor costs. Replacing BPO (business process outsourcing) is easier to benchmark quantitatively than replacing internal employees.
5. The long tail of retries, context bloat, and poor routing are the three main drivers of runaway token costs, and they materially change the economics.
6. Because attribution from tokens to outcomes is missing, enterprises need to capture agents' decision traces to explain why a specific workflow succeeded or failed.
7. Companies that master attribution can make the allocation decisions (e.g., workflow optimization, model switching) and ultimately control the flow of AI resources inside the enterprise.

Original title: Token Budget Wars

Original author: Jaya Gupta

Translation by: Peggy

Editor's note: Enterprise AI is transitioning from "whether to adopt" to "how to calculate the budget."

Over the past two years, many companies pushed employees to use AI mainly to keep up with technology trends and competitive pressure. But as AI inference costs shift from experimental budgets to recurring operating expenses, CEOs and CFOs are starting to ask a more practical question: How much value does AI actually create? What tangible results does each dollar of token spend produce?

This is the core of the "Token Budget Wars." The so-called token budget war isn't just about companies wanting to lower their AI bills; it's about reassessing which business areas deserve more compute power, which tasks should be switched to cheaper models, which processes can replace outsourcing or human labor, and which are simply ineffective consumption.

The most noteworthy point of the article is that AI usage does not equal value. In the SaaS era, usage typically meant the software was being adopted. But in the AI era, token consumption only indicates "the meter is running." The same workflow can result in vastly different costs due to variations in prompts, context, model selection, and retry attempts. A higher bill could mean AI is genuinely working, or it could mean the system is spinning its wheels fruitlessly.

Therefore, the next phase of enterprise AI isn't just about model capability; it's about correlating token costs with business outcomes. The first phase proved AI can do the job; the second phase must answer: Is this work worth paying for?

Below is the original text:

Enterprise AI has shifted from "whether to adopt" to "how to allocate."

In the C-suite, the new "currency" is your ability to quantify the ROI of AI investments. Every functional department is asked the same question: What did you produce? What was the cost? Over the past two years, CEOs woke up watching Jim Cramer on CNBC (#bearish), saw competitors announcing productivity gains, and demanded everyone across the company use AI. What's truly applying pressure now is the follow-up question: Show me the value.

Claude was released in November 2025, when most companies' 2026 annual budgets were already locked. By Q1, actual usage had far exceeded original plans. Inference costs were no longer just a budget line item for experimentation; they had become a recurring operational expense. This brought a new question: Where exactly is AI creating real value?

This question is hard to answer because the utility of tokens hasn't been quantified. The bill doesn't tell you whether the spending replaced labor, generated revenue, reduced risk, accelerated processes, or was just a group of engineers burning tokens for leaderboard rankings (#metamates). When spending is in the low hundreds of thousands of dollars, it still looks like an experiment. Beyond a certain threshold, say seven figures, it becomes infrastructure, and technical differences start to materially affect the P&L: the same workflow with the same inputs can differ by 5 to 10 times in token cost between two runs, with no visible failure in either. At experimental scale this volatility is merely expensive; at infrastructure scale it becomes a number the CFO must explain to the CEO.

Call it "marginal token utility": the business value created by each additional dollar spent on inference costs. This is the truly important metric during the scaling phase, and it's the metric most companies currently can't see.
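One way to make the definition concrete is as a ratio of changes between two budget periods. The sketch below is only an illustration: the function name `marginal_token_utility` and all dollar figures are hypothetical, and how "value" is measured is an assumption each business has to supply.

```python
def marginal_token_utility(spend_before, value_before, spend_after, value_after):
    """Business value created per additional dollar of inference spend.

    All arguments are in dollars. What counts as "value" (labor avoided,
    revenue, risk reduced) is an assumption supplied by the business.
    """
    extra_spend = spend_after - spend_before
    if extra_spend <= 0:
        raise ValueError("spend must increase to measure a marginal effect")
    return (value_after - value_before) / extra_spend


# Hypothetical quarter-over-quarter figures for one workflow.
utility = marginal_token_utility(
    spend_before=100_000, value_before=250_000,
    spend_after=150_000, value_after=310_000,
)
print(f"{utility:.2f} dollars of value per extra dollar of inference")
```

A result below 1.0 would mean the last increment of spend cost more than the value it produced, which is exactly the signal an invoice alone cannot show.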

The question in the boardroom is shifting from "Is AI useful?" to "Where exactly does AI create leverage?" This is why the token budget war is essentially a fight over who gets to allocate tokens.

The fight over token ownership is heating up because it collides with a three-decade-old executive instinct: big teams mean a big position, a big scope of responsibility, and more power. In the past, the visible mark of a senior manager's success was the size of the team they managed – direct reports, skip-level reports, and the headcount in the organizational chart.

But when intelligence becomes a scarce resource, the new symbol becomes: how much intelligence can you marshal?

AI spending is essentially competing with labor costs.

Most AI budget requests boil down to one of three claims: replacing outsourced labor, replacing internal labor, or creating new revenue.

An employee has a salary. A BPO outsourcing contract has a price per ticket, claim, invoice, or audit. Humans understand these units of measurement. But inference costs are more complex because the final cost to complete a task depends on how the system runs during execution. A claim task requiring three retries, human corrections, and a frontier model might cost more than the outsourced labor it was intended to replace. This is why the discussion is shifting towards: What is the cost to achieve a result? For example, cost per resolved ticket, per processed claim, per reviewed contract, per completed invoice, per new role avoided, per customer retained, or per dollar of revenue generated.
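The comparison described above can be sketched as a fully loaded cost per completed unit, set against a BPO contract price. The function name and every figure here are hypothetical and chosen only to show the shape of the calculation.

```python
def cost_per_resolved(inference_cost, human_correction_cost, resolved_count):
    """Fully loaded cost per completed unit of work (ticket, claim, invoice)."""
    if resolved_count == 0:
        raise ValueError("no resolved units to divide by")
    return (inference_cost + human_correction_cost) / resolved_count


# Hypothetical month of claims processing.
agent_cost = cost_per_resolved(
    inference_cost=18_000.0,        # all tokens, including failed attempts
    human_correction_cost=6_000.0,  # reviewer time spent fixing agent output
    resolved_count=4_000,
)
bpo_price_per_claim = 7.50          # hypothetical contract price per claim

print(f"agent: ${agent_cost:.2f} per claim, BPO: ${bpo_price_per_claim:.2f}")
print("agent is cheaper" if agent_cost < bpo_price_per_claim else "BPO is cheaper")
```

The numerator deliberately includes failed attempts and human corrections; leaving them out is how an agent can look cheaper on paper than the outsourced work it replaces.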

Executives have realized BPO is the easiest place to establish a baseline because this work is already priced per "completion unit." In contrast, comparing AI to internal employees is much harder because employees do many things daily, including browsing TikTok during lunch breaks; productivity gains often manifest as avoided hiring or released capacity; and managers resist cutting teams based solely on partial automation. BPO provides a quantifiable baseline for business teams.

This differs from SaaS logic. SaaS trained companies to treat usage as a proxy for value.

But AI breaks this. The amount of inference resources a workflow consumes can vary dramatically based on prompts, retrieved context, chosen model, called tools, retry counts, and whether the agent gets stuck. The unit on the bill – the token – is stable, but the work it represents is not.

More precisely: signal and noise use the same unit of measurement. A rising token bill might mean real work is being done; but it might also mean compute power is being wasted on bad prompts, irrelevant context, unnecessary tool calls, repeated reasoning, and overpowered models. Two companies could have identical token bills, but vastly different underlying operations: one is converting inference into results, the other is paying for ineffective cycles, and both look identical on the invoice line items.

SaaS usage tells you: the software has been adopted. AI usage only tells you: the meter is running. It doesn't tell you if the company is actually moving forward.

Why is marginal token utility so hard to see?

There are three main reasons.

First is the long tail of retries. If the probability that an agent correctly completes a workflow on the first try is p, and failed runs are retried, the expected token spend per resolved workflow is roughly T/p, where T is the base cost of one attempt. If the completion rate drops from 90% to 70%, the effective cost per resolved workflow rises from T/0.9 to T/0.7, an increase of about 29%, not 20%, because every failure is paid for again. In enterprise workflows, inputs are messy and edge cases matter. Failures don't just lower accuracy; they change the economics.
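The retry arithmetic can be checked with a few lines of Python. This sketch assumes every attempt costs the same and succeeds independently with probability p, so the number of attempts follows a geometric distribution with mean 1/p; real workflows may violate both assumptions. The function name and the base cost are hypothetical.

```python
def expected_cost_per_resolved(base_cost, success_rate):
    """Expected spend per resolved workflow when failed runs are retried.

    Assumes each attempt costs base_cost and succeeds independently with
    probability success_rate, so the mean number of attempts is
    1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return base_cost / success_rate


T = 1.00  # hypothetical base cost of one attempt, in dollars
at_90 = expected_cost_per_resolved(T, 0.90)
at_70 = expected_cost_per_resolved(T, 0.70)
increase = at_70 / at_90 - 1
print(f"{at_90:.3f} -> {at_70:.3f} per resolved workflow (+{increase:.1%})")
```

The increase depends only on the ratio of the two success rates (0.9 / 0.7), not on T, so the same penalty applies to cheap and expensive workflows alike.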

Second is context bloat. For operations heavily reliant on attention mechanisms, inference costs roughly grow as O(n²) with context length. So doubling the context length roughly quadruples inference costs. Everyone wants the model to have enough information, so systems tend to over-supply: retrieving fifty documents when five would do; connectors dumping entire email threads; agents carrying stale conversation histories.
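To see how quickly the quadratic term grows, the sketch below compares context sizes under a pure O(n²) model. It is a simplification that covers attention compute only and ignores everything that scales linearly; the function name and token counts are hypothetical.

```python
def relative_attention_cost(context_tokens, baseline_tokens):
    """Cost relative to a baseline under a pure O(n^2) attention model.

    Simplified on purpose: components that scale linearly with context
    length are ignored.
    """
    return (context_tokens / baseline_tokens) ** 2


baseline = 4_000  # hypothetical context size with five retrieved documents
for tokens in (4_000, 8_000, 40_000):
    ratio = relative_attention_cost(tokens, baseline)
    print(f"{tokens:>6} tokens -> {ratio:.0f}x baseline attention cost")
```

Under this model, retrieving ten times as much context (fifty documents instead of five, if documents are of similar length) costs a hundred times as much attention compute.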

Third is routing. When teams don't know which model is "good enough," they default to using the strongest one. A basic classification task might run on the same model designed for complex reasoning. At multi-million call volumes, the difference between routing simple tasks to a small model versus routing everything to a frontier model is often the difference between a controllable bill and a board-level problem.
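A rough sense of the routing gap can be had from a back-of-the-envelope comparison. The per-call prices, call volume, and share of simple tasks below are all hypothetical; actual prices depend on the provider, the model, and the token counts per call.

```python
# Hypothetical per-call prices in dollars; not taken from any provider.
PRICE_PER_CALL = {"small": 0.0004, "frontier": 0.0150}


def monthly_cost(calls, simple_share, route_simple_to_small):
    """Monthly bill, with simple tasks sent to the small model or not."""
    simple_calls = calls * simple_share
    complex_calls = calls - simple_calls
    simple_model = "small" if route_simple_to_small else "frontier"
    return (simple_calls * PRICE_PER_CALL[simple_model]
            + complex_calls * PRICE_PER_CALL["frontier"])


calls = 5_000_000  # hypothetical monthly call volume
everything_frontier = monthly_cost(calls, 0.8, route_simple_to_small=False)
routed = monthly_cost(calls, 0.8, route_simple_to_small=True)
print(f"all frontier: ${everything_frontier:,.0f}, routed: ${routed:,.0f}")
```

The sketch leaves out the hard part: knowing which tasks are simple enough for the small model. It also ignores any quality loss, which would feed back into the retry cost.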

Non-software industries will feel this pain as a "transformation."

Software companies will see the problem first because the work being optimized is already well-instrumented. Engineering teams have metrics like PRs, commits, deploys, incidents, cycle time, and MTTR, and these connect to the product. Though imperfect, such work is easier to measure.

Non-software companies will feel this problem more acutely because their work is operational: claims, underwriting, customer service tickets, compliance reviews, supply chain exceptions, payment disputes. Companies with real-world assets face the same issues. These workflows have traditionally been measured by human effort, cycle time, SLA achievement rates, and error rates, and they often carry a higher bar: they must be defensible in an audit, not just correct on average. The unit of work and the unit of cost don't speak the same language, and they don't sit in the same organization. The tech team sees token consumption and the business department sees workflow changes, but connecting the two requires multiple teams to first agree on "what exactly are we measuring."

I believe software companies will experience the token budget war as a productivity measurement problem, mirroring the many "AI layoffs" we've seen; non-software companies will experience it as a transformation problem.

The missing layer is attribution from tokens to outcomes.

Enterprises need a conversion layer that links inference spending to the work completed and the business results generated. This layer must answer three questions: What is the real cost of this workflow, including retries and corrections? Which parts of the agent's execution trace are truly important, and which are just wasted cycles? Has this work changed the operating model, for example through fewer tickets per agent, shorter claims cycles, smaller BPO budgets, or delayed hiring?

The next layer is outcome attribution in business language. The goal is not just to say "this workflow cost $2.13," but rather: "This type of claim is cheaper when handled by an agent than by BPO, but if the policy requires additional exception documents, the long tail of retries destroys the economics."
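A per-segment rollup is one way to get from a raw bill to a statement like the claims example above. The sketch below is a minimal illustration, not a description of any existing product; the `Run` record, its fields, and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    segment: str            # business-language label, e.g. a claim type
    token_cost: float       # dollars of inference, including retries
    correction_cost: float  # dollars of human rework
    resolved: bool          # did this run produce a completed outcome?


def cost_per_outcome(runs):
    """Return the fully loaded cost per resolved item for each segment."""
    spend = defaultdict(float)
    resolved = defaultdict(int)
    for run in runs:
        # Failed runs still add to spend, so they raise the cost per outcome.
        spend[run.segment] += run.token_cost + run.correction_cost
        resolved[run.segment] += run.resolved
    return {seg: spend[seg] / resolved[seg] for seg in spend if resolved[seg]}


runs = [
    Run("standard claim", 1.50, 0.00, True),
    Run("standard claim", 2.50, 0.00, True),
    Run("exception claim", 4.00, 0.00, False),
    Run("exception claim", 6.00, 10.00, True),
]
report = cost_per_outcome(runs)
for segment, cost in report.items():
    print(f"{segment}: ${cost:.2f} per resolved claim")
```

Grouping by a business label rather than by model or API key is the design choice that matters here: it lets the same spend data be read by the team that owns the workflow, not only by the team that owns the bill.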

Measurement will become memory.

To link a token to an outcome, a company must capture everything that happens in between: what the agent saw, what it retrieved, what tools it called, what it overlooked, where it retried, where it was overridden by a human, which exception rule was applied, which precedent was relevant, and why one path succeeded while another failed. The measurement layer must record the decision trace, and this is something companies have almost never truly had before. Systems of record capture *what* happened, but rarely *why*. For instance, a CRM can tell you a deal was delayed, but not the undocumented judgments behind the sales forecast.
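As a rough picture of what capturing a decision trace could mean in practice, the sketch below logs each step of one workflow with its cost and serializes the result. The class names, event kinds, and sample events are hypothetical and do not come from any particular agent framework.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    kind: str              # e.g. "retrieval", "tool_call", "retry", "human_override"
    detail: str            # what happened, in plain language
    token_cost: float = 0.0


@dataclass
class DecisionTrace:
    workflow_id: str
    events: list = field(default_factory=list)
    outcome: str = "pending"

    def record(self, kind, detail, token_cost=0.0):
        """Append one step of the agent's path to the trace."""
        self.events.append(TraceEvent(kind, detail, token_cost))

    def total_cost(self):
        """Sum the inference cost of every recorded step."""
        return sum(event.token_cost for event in self.events)


trace = DecisionTrace("claim-001")  # hypothetical workflow identifier
trace.record("retrieval", "fetched policy document", 0.02)
trace.record("tool_call", "looked up claim history", 0.05)
trace.record("retry", "re-ran extraction after a parse failure", 0.05)
trace.record("human_override", "adjuster applied exception rule")
trace.outcome = "approved"

print(json.dumps(asdict(trace), indent=2))
print(f"total inference cost: ${trace.total_cost():.2f}")
```

The cost field answers the budget question, while the `detail` and `kind` fields are what later explain why one path succeeded and another failed.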

Decision rationale is one of the most perishable assets in a company because it lives in Slack threads, email chains, escalation meetings, and people's heads. And people leave, and processes change.

AI changes this because agents generate traces. Every retrieval, tool call, retry, escalation, human correction, and final decision becomes part of the path from context to action to outcome. Initially, companies will capture these traces to justify spending. But once captured, they become more valuable than the cost reports themselves, as they become a persistent record of how the organization actually makes decisions. (Cough, context graph, though I'm honestly tired of hearing that term lately.)

The allocation layer is the real prize.

If inference becomes a metered resource in a customer operations model, every dollar must prove its worth. Which vendors can tell you when a token has translated into a result, and when it hasn't, and why?

Enterprises won't figure this out entirely on their own. They will buy it as a transformation. Fortune 500 companies have played this script before: buckle up, hire McKinsey, recruit every former Palantir employee available, and have the CEO drive change top-down. Token-to-outcome attribution will arrive like ERP, BI, and digital transformation before it: as a high-profile "project" with executive sponsorship, a supporting infrastructure layer, and eventually becoming a new source of truth. Founders who can deliver this will build different types of founding teams, and they themselves will differ from the traditional startup archetype.

Whoever masters token-to-outcome attribution makes the allocation decisions: which workflows deserve more compute power, which should be capped, which should switch to cheaper models, which should stay with humans, and which can replace BPO. And once you can make those decisions, you control the flow of enterprise AI spending and have the trust required to allocate that resource.

The first phase of enterprise AI proved: models can do the work. The next phase will determine: how much of this work is actually worth paying for. As Charlie Munger said: Show me the incentives, and I will show you the outcome.
