Token Budget Wars: Enterprise AI Enters the 'Cost Accounting Era'

区块律动BlockBeats

Guest Columnist

2026-05-28 12:00


AI Costs, ROI, and Internal Corporate Resource Allocation

AI Summary

Core Insight: Enterprise AI is shifting from "whether to adopt" to "how to calculate the costs." The central conflict is the difficulty of directly linking token costs to business value. The key in the next phase is not model capability, but the ability to accurately attribute token consumption to specific business outcomes to determine AI resource allocation.
Key Elements:
1. AI inference costs have shifted from experimental budgets to ongoing operational expenses. CEOs and CFOs demand quantification of the actual value generated by every dollar spent on tokens.
2. Token consumption does not equal value: For the same workflow, costs can vary by 5-10 times due to factors like prompt engineering, context length, model selection, and retry attempts.
3. Marginal token utility is the core metric, referring to the business value created by each additional dollar of inference cost. However, most companies cannot currently track this.
4. AI budget requests essentially compete with labor costs. Replacing Business Process Outsourcing (BPO) is easier to benchmark quantitatively than replacing internal employees.
5. Long-tail retries, context bloat, and improper routing are the three main causes of runaway token costs, significantly altering the economic equation.
6. Due to a lack of attribution from tokens to outcomes, companies need to capture an agent's decision-making trajectory to explain *why* a specific workflow succeeded or failed.
7. Companies that master this attribution capability can make allocation decisions (e.g., workflow optimization, model switching) and ultimately control the flow of internal AI resources.

Original Title: Token Budget Wars

Original Author: Jaya Gupta

Original Translation & Compilation: Peggy

Editor's Note: Enterprise AI is transitioning from "whether to adopt" to "how to track the costs."

Over the past two years, many companies pushed employees to use AI, primarily to keep up with technological trends and competitive pressures. But now that AI inference costs have shifted from experimental budgets to ongoing operational expenditures, CEOs and CFOs are asking a more realistic question: How much value did AI actually create? What real results did each dollar of token cost yield?

This is the core of the "Token Budget Wars." The so-called token budget war is not just about companies wanting to lower their AI bills, but about reassessing which business operations deserve more computing power, which tasks should be switched to cheaper models, which processes can replace outsourcing or manual labor, and which are simply wasteful consumption.

The most important point in the article is that AI usage does not equal value. In the SaaS era, usage typically meant the software was being adopted; but in the AI era, token consumption only indicates "the meter is running." The same workflow can have vastly different costs depending on the prompt, context, model selection, and number of retries. A higher bill could mean AI is genuinely doing work, or it could mean the system is spinning its wheels inefficiently.

Therefore, the next phase of enterprise AI isn't just about model capability, but about linking token costs with business outcomes. The first phase proved AI *can* do the work; the second phase will answer: is this work worth paying for?

Here is the original text:

Enterprise AI Has Moved from "Whether to Adopt" to "How to Allocate."

In the C-suite, the new "currency" is your ability to quantify the return on AI investment. Every functional department is being asked the same questions: What did you produce? What did it cost? Over the past two years, CEOs who woke up to Jim Cramer on CNBC (#bearish) while watching competitors announce productivity gains demanded that everyone in the company use AI. The real pressure now comes from the follow-up question: show me the value.

Claude was released in November 2025, by which time most companies' 2026 annual budgets were already locked. By the first quarter, actual enterprise usage had far exceeded original plans. Inference costs were no longer just a budget line item for experiments but had become a recurring operational expense. This brought a new question: Where exactly is AI creating real value?

This question is hard to answer because the utility of tokens hasn't been quantified. The bill can't tell you if the spending replaced manual labor, generated revenue, reduced risk, accelerated processes, or was just a group of engineers burning tokens for a leaderboard (#metamates). When spending is a few hundred thousand USD, it still looks like an experiment. But beyond a certain threshold – say, reaching seven figures – it becomes infrastructure. Technical differences start materially impacting the P&L: the same workflow, with the same inputs, run twice, can have token costs differing by 5 to 10 times, with no apparent error. At an experimental scale, this volatility is expensive; but at an infrastructure scale, it becomes a number the CFO must explain to the CEO.

Think of it as "marginal token utility": the business value created by each additional dollar of inference cost. This is the truly important number in the scaling phase, and it's the number most companies can't see right now.
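As a rough illustration of the definition above, marginal token utility can be computed as the change in business value divided by the change in inference spend between two periods. This is a minimal sketch: the function name and the dollar figures are hypothetical, and measuring "business value" in the first place is the hard part the article describes.

```python
def marginal_token_utility(spend_before, value_before, spend_after, value_after):
    """Business value gained per additional dollar of inference spend."""
    extra_spend = spend_after - spend_before
    if extra_spend <= 0:
        raise ValueError("spend_after must exceed spend_before")
    return (value_after - value_before) / extra_spend


# Hypothetical quarter-over-quarter figures, in USD.
mtu = marginal_token_utility(
    spend_before=100_000, value_before=250_000,
    spend_after=150_000, value_after=310_000,
)
print(f"{mtu:.2f} dollars of value per extra dollar of inference")
```

A value below 1.0 would mean the last dollars of inference spend returned less than they cost.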

The question in the boardroom is shifting from "Is AI useful?" to "Where exactly does AI create leverage?" This is why the so-called "token budget war" is essentially a fight for the right to allocate tokens.

And this fight over token ownership is heating up because it clashes with a three-decade-old executive instinct: a big team means a big role, a big scope of responsibility, and bigger power. Previously, a visible sign of a senior manager's success was the size of their team – direct reports, indirect reports, total headcount in their org chart.

But when intelligence becomes the scarce resource, the new status symbol becomes: how much intelligence can you command?

AI spending is fundamentally competing with the cost of human labor.

Most AI budget requests essentially boil down to one of three claims: replacing outsourced labor, replacing internal labor, or creating new revenue.

An employee has a salary. A BPO contract has a price per ticket, claim, invoice, or review. Humans understand these units of measure. But inference costs are more complex because the final cost of completing a task depends on how the system runs during execution. A claim process that requires three retries, manual correction, and uses a frontier model might end up being more expensive than the outsourced labor it was intended to replace. This is why the conversation is shifting to: What is the cost per *result*? For example: cost per resolved ticket, per processed claim, per reviewed contract, per completed invoice, per avoided new hire, per retained customer, or per dollar of revenue generated.
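The "cost per result" comparison can be sketched as follows. All names and prices here are hypothetical: each run carries its inference spend plus any manual correction cost, and the total is divided by the number of claims actually resolved, then compared with an assumed BPO price per claim.

```python
from dataclasses import dataclass


@dataclass
class WorkflowRun:
    token_cost_usd: float      # inference spend across all attempts
    human_fix_cost_usd: float  # cost of any manual correction
    resolved: bool             # did the claim end up resolved?


def cost_per_resolved(runs):
    """Total spend (including failed runs) divided by resolved results."""
    total = sum(r.token_cost_usd + r.human_fix_cost_usd for r in runs)
    resolved = sum(1 for r in runs if r.resolved)
    return total / resolved if resolved else float("inf")


# Hypothetical runs: one clean, one needing a human fix, one failure.
runs = [
    WorkflowRun(0.50, 0.0, True),
    WorkflowRun(1.50, 4.0, True),
    WorkflowRun(2.00, 0.0, False),
]
BPO_PRICE_PER_CLAIM = 3.50  # assumed outsourcing price, USD

agent_cost = cost_per_resolved(runs)
print(agent_cost, agent_cost > BPO_PRICE_PER_CLAIM)
```

In this made-up sample the agent costs more per resolved claim than the outsourcer, even though the cheapest run costs a fraction of the BPO price; the failed run and the manual fix are what move the number.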

Executives have realized BPO is the easiest place to establish a baseline because this work is already priced per "unit of completion." Comparing AI to internal employees is much harder because employees do many things in a day (including scrolling TikTok on their lunch break); productivity gains often manifest as avoided hiring or dispersed capacity release; and managers resist cutting teams based solely on partial automation. BPO provides business teams with a quantifiable baseline.

This is different from the logic of SaaS. SaaS trained enterprises to see usage as a proxy for value.

But AI breaks this. How much inference resource a single workflow consumes can vary dramatically based on the prompt, retrieved context, chosen model, tools called, number of retries, and whether the agent gets stuck. The unit on the bill – the token – is stable, but the amount of work it represents is not.

More precisely: signal and noise use the same unit of measurement. A rising token bill could mean real work is being done; but it could also mean computing power is being wasted on poor prompts, irrelevant context, unnecessary tool calls, redundant inference, and overpowered models. Two companies could have identical token bills but run fundamentally different operations underneath: one is translating inference into results, the other is paying for inefficient churn, yet both look the same on the invoice line items.

SaaS usage tells you: the software has been adopted. AI usage only tells you: the meter is running. It doesn't tell you if the company is actually moving forward.

Why is Marginal Token Utility Hard to See?

There are three main reasons.

First is the retry long tail. If the probability of an agent completing a workflow correctly on the first try is p, then the expected token consumption per resolved workflow is roughly T/p, where T is the base cost of one attempt. If the completion rate drops from 90% to 70%, the effective cost per resolved issue increases by about 29% (0.9/0.7 ≈ 1.29), not 20%, because failures have a compounding effect. In enterprise workflows, inputs are often messy, and edge cases matter. Failures don't just lower accuracy; they change the economics.
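The T/p relationship can be checked numerically. This sketch assumes attempts are independent, each costs the same T, and the workflow is retried until it succeeds, so the expected number of attempts is 1/p; real retries often cost more than the first attempt, which would make the picture worse.

```python
def expected_cost_per_resolved(base_cost, p_success):
    """Expected spend per resolved workflow when retrying until success."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return base_cost / p_success


T = 1.00  # hypothetical base cost of one attempt, USD
at_90 = expected_cost_per_resolved(T, 0.90)
at_70 = expected_cost_per_resolved(T, 0.70)
increase = at_70 / at_90 - 1

print(f"cost at 90%: {at_90:.3f}, cost at 70%: {at_70:.3f}")
print(f"relative increase: {increase:.1%}")
```

The relative increase works out to 0.9/0.7 − 1, roughly 28.6%, which is larger than the 20-point drop in completion rate.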

Second is context bloat. For operations heavily dependent on attention mechanisms, inference costs roughly scale as O(n²) with context length. Doubling the context length roughly quadruples the inference cost. Everyone wants the model to have enough information, so systems tend to over-supply: retrieving fifty documents when five would suffice; connectors dumping entire email threads; agents carrying stale conversation histories.
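To make the scaling concrete, here is a small sketch that applies a pure O(n²) model to context length. The pure quadratic form is the simplification used in the paragraph above; real systems mix linear and quadratic terms, and the 800-tokens-per-document figure is a hypothetical.

```python
def relative_attention_cost(context_tokens, baseline_tokens):
    """Cost relative to a baseline context under a pure O(n^2) assumption."""
    return (context_tokens / baseline_tokens) ** 2


# Doubling the context quadruples the modeled cost.
doubled = relative_attention_cost(8_000, 4_000)

# Retrieving fifty documents when five would suffice.
TOKENS_PER_DOC = 800  # hypothetical average document length
over_retrieval = relative_attention_cost(50 * TOKENS_PER_DOC, 5 * TOKENS_PER_DOC)

print(doubled, over_retrieval)
```

Under this model, ten times the context costs a hundred times as much for the attention-bound portion of the work.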

Third is routing. When teams don't know which model is "good enough," they default to the most powerful one. A basic classification task might run on the same model designed for complex reasoning. When call volumes reach millions, routing simple tasks to a smaller model versus feeding everything to a frontier model is often the difference between a manageable bill and a board-level problem.
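A routing rule of the kind described can be sketched in a few lines. The model tiers, per-token prices, task labels, and call volume below are all hypothetical placeholders, not any vendor's actual pricing; the point is only how quickly the gap grows at millions of calls.

```python
# Hypothetical price per 1,000 tokens for two model tiers, USD.
PRICE_PER_1K_TOKENS = {"small": 0.0005, "frontier": 0.0150}

# Task types assumed to be "good enough" on the small model.
SIMPLE_TASKS = {"classification", "extraction"}


def route(task_type):
    """Send simple tasks to the small model, everything else to frontier."""
    return "small" if task_type in SIMPLE_TASKS else "frontier"


def monthly_cost(calls, tokens_per_call, model):
    return calls * tokens_per_call / 1000 * PRICE_PER_1K_TOKENS[model]


calls, tokens_per_call = 5_000_000, 1_000
all_frontier = monthly_cost(calls, tokens_per_call, "frontier")
routed = monthly_cost(calls, tokens_per_call, route("classification"))

print(f"everything on frontier: ${all_frontier:,.0f}")
print(f"classification routed to small: ${routed:,.0f}")
```

A production router would also need a quality check to confirm the smaller model really is good enough for each task type, which this sketch takes as given.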

Non-software industries will feel this pain as a "transformation."

Software companies will see this problem first because the work being optimized is already heavily instrumented. Engineering teams have metrics like PRs, commits, deploys, incidents, cycle time, and MTTR, all connected to the product. Imperfect as it is, this work is easier to measure.

Non-software enterprises will feel this problem more acutely because their work is operational: claims, underwriting, customer service tickets, compliance reviews, supply chain exceptions, payment disputes. Companies with real-world assets face the same issue. These workflows were historically measured by labor, cycle time, SLA achievement, and error rates, and they often require greater rigor: they need to hold up to audits rather than just be correct on average. The unit of work and the unit of cost don't speak the same language and reside in different organizations. Tech teams see token consumption, business units see workflow changes, but connecting the two requires multiple teams to first agree on "what exactly are we measuring."

I believe software companies will experience the token budget war as a productivity measurement problem, mirroring the "AI layoffs" we've seen; non-software enterprises will experience it as a transformation problem.

The missing layer is the attribution from token to outcome.

Enterprises need a translation layer to connect inference spending with the work completed and the business results generated. This layer must answer three questions: What is the real cost of this workflow, including retries and corrections? In the agent's execution trace, which parts were truly important, and which were just wasteful churn? Did this work change the operating model – e.g., fewer tickets per agent, shorter claims cycles, smaller BPO budgets, delayed hiring? The next step is outcome attribution in the language of business. It's not just saying "this workflow cost $2.13," but rather: "this type of claim is cheaper handled by an agent than BPO, but if the policy requires extra exception files, the retry long tail destroys the economics."
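The first two questions, the real cost of a workflow and how much of it was churn, can be sketched as a simple roll-up over the steps of one run. The `Step` structure and the `contributed` flag are assumptions made for illustration; deciding which steps actually contributed to the result is the hard labeling problem, and the costs are made up.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str          # e.g. "retrieval", "tool_call", "retry", "human_correction"
    cost_usd: float
    contributed: bool  # did this step feed the final result? (assumed label)


def attribute(steps):
    """Roll up total cost and the share spent on steps that did not contribute."""
    total = sum(s.cost_usd for s in steps)
    churn = sum(s.cost_usd for s in steps if not s.contributed)
    return {
        "total_usd": total,
        "churn_usd": churn,
        "churn_share": churn / total if total else 0.0,
    }


# Hypothetical trace of one claim-handling run.
steps = [
    Step("retrieval", 0.25, True),
    Step("tool_call", 0.50, True),
    Step("retry", 1.00, False),
    Step("human_correction", 0.25, True),
]
report = attribute(steps)
print(report)
```

The third question, whether the operating model changed, cannot be answered from a single run; it needs the same roll-up joined to business metrics such as cycle time or BPO spend.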

Measurement becomes memory.

To connect a single token to an outcome, the enterprise must capture everything that happened in between: what the agent saw, what it retrieved, which tools it called, what it ignored, where it retried, where it was overridden by a human, which exception rule applied, which precedent mattered, and why one path succeeded while another failed. The measurement layer must record the decision trace, which is something enterprises have rarely truly possessed before. A system of record captures *what* happened, but rarely *why*. A CRM can tell you a deal was delayed, but not the unreported judgment behind the sales forecast.
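One way to picture a decision trace is as an append-only list of events that stores the reason alongside each action. This is a minimal sketch with hypothetical field names and example values, not a description of any existing product's schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TraceEvent:
    step: int
    action: str       # e.g. "retrieve", "tool_call", "retry", "human_override"
    detail: str       # what happened
    reason: str = ""  # why it happened, when known


@dataclass
class DecisionTrace:
    workflow_id: str
    outcome: str = "pending"
    events: list = field(default_factory=list)

    def record(self, action, detail, reason=""):
        """Append the next event, numbering steps from 1."""
        self.events.append(TraceEvent(len(self.events) + 1, action, detail, reason))

    def to_json(self):
        return json.dumps(asdict(self), indent=2)


# Hypothetical trace for a single claim.
trace = DecisionTrace("claim-001")
trace.record("retrieve", "policy document", reason="needed coverage terms")
trace.record("human_override", "payout adjusted", reason="exception rule applied")
trace.outcome = "resolved"
print(trace.to_json())
```

Keeping the `reason` field next to each action is what separates this from a conventional system of record, which would store only the final outcome.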

Decision rationale is one of the most perishable and corruptible assets in a company, residing in Slack threads, email chains, escalation meetings, and people's brains. The problem is, people leave, and processes change.

AI changes this because agents generate traces. Every retrieval, tool call, retry, escalation, human correction, and final decision becomes part of the path from context to action to outcome. Initially, companies will capture these traces to justify spending. But once captured, these traces become more valuable than the cost reports themselves, serving as a persistent record of how the organization *actually makes decisions.* (Cough, context graph, though I'm really tired of hearing that term lately.)

The allocation layer is the real prize.

If inference becomes a metered resource in a customer operations model, every dollar must prove its worth. Which vendors can articulate when a token translated into a result, when it didn't, and why?

Enterprises won't figure this out entirely on their own. They will buy it as a transformation. Fortune 500 companies have run this playbook before: fasten seatbelts, hire McKinsey, recruit every former Palantir employee on the market, and have the CEO drive change top-down. Token-to-outcome attribution will arrive like ERP, BI, and digital transformation did: as a "project" with executive sponsorship, backed by underlying infrastructure, eventually becoming the new source of truth. Founders who can achieve this will build different types of founding teams, and they themselves will differ from the traditional startup archetype.

Whoever masters token-to-outcome attribution can make allocation decisions: which workflows deserve more compute, which should be capped, which should switch to cheaper models, which stay with humans, which can replace BPO. And once you can make these decisions, you control the flow of AI spending within the enterprise and earn the trust needed to allocate this resource.

The first phase of enterprise AI proved: models *can* do the work. The next phase will determine: how much of that work is actually worth paying for. As Charlie Munger said: Show me the incentives, and I'll show you the outcome.
