Token Budget Wars: Enterprise AI Enters the "Accounting Era"

区块律动BlockBeats

Guest Columnist

2026-05-28 12:00

AI costs, ROI, and internal enterprise resource allocation

AI Summary

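Key view: Enterprise AI is shifting from "whether to adopt" to "how to account for it," and the core tension is that token costs are hard to link directly to business value. What matters in the next phase is not model performance but whether token consumption can be accurately attributed to specific business outcomes, so that AI resource allocation can be decided.
Key points:
1. AI inference costs have shifted from experimental budgets to ongoing operating expenses, and CEOs and CFOs are demanding that the real value produced by every dollar of token spending be quantified.
2. Token consumption is not the same as value. Even the same workflow can vary in cost by 5 to 10 times depending on factors such as prompts, context length, model selection, and number of retries.
3. Marginal token utility is the key metric: the business value created by each additional dollar of inference spending. Most companies currently cannot track it.
4. AI budget requests essentially compete with labor costs, and replacing outsourcing (BPO) is easier to baseline quantitatively than replacing internal employees.
5. The retry long tail, context bloat, and poor routing are the three main causes of runaway token costs, and they can change the economics significantly.
6. Because token-to-outcome attribution is missing, enterprises must capture the agent's decision trace to explain why a given workflow succeeded or failed.
7. Enterprises with attribution capability can make allocation decisions (for example, optimizing workflows or switching models) and ultimately control the flow of AI resources inside the company.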

Original Title: Token Budget Wars

Original Author: Jaya Gupta

Original Translation: Peggy

Editor's Note: Enterprise AI is moving from the stage of "whether to adopt" to "how to calculate the costs."

Over the past two years, many companies pushed employees to use AI, largely to keep up with technology trends and competitive pressure. But when AI inference costs shift from experimental budgets to ongoing operational expenses, CEOs and CFOs start asking a more practical question: How much value has AI actually created? What tangible results did each dollar of token cost yield?

This is the core of the "Token Budget Wars." The so-called token budget war isn't just about companies wanting to lower their AI bills; it's about reassessing which business areas warrant more computational power, which tasks should be switched to cheaper models, which processes can replace outsourced or manual labor, and which are merely wasteful consumption.

The most noteworthy point of the article is that AI usage does not equate to value. In the SaaS era, usage typically meant the software was being adopted. But in the AI era, token consumption only indicates that "the meter is running." The same workflow can cost several times more or less depending on prompts, context, model selection, and retry attempts. A higher bill could mean AI is genuinely working, or it could mean the system is flailing ineffectively.

Therefore, the next phase for enterprise AI isn't just about model capability, but about aligning token costs with business outcomes. The first phase proved AI can do the work; the second phase must answer: is this work worth paying for?

Below is the original text:

Enterprise AI has moved from "whether to adopt" to "how to allocate."

In the C-suite, the new "currency" is your ability to quantify the ROI of AI investments. Every functional department is asked the same question: What did you produce, and what did it cost? Over the past two years, CEOs, waking up to watch Jim Cramer on CNBC (#bearish) while seeing competitors announce productivity gains, have mandated AI usage across their organizations. The real pressure now comes from the follow-up question: Show me the value.

Claude was released in November 2025, by which time most companies' annual budgets for 2026 were already locked in. By Q1, actual enterprise usage had far exceeded original plans. Inference costs were no longer just a budget line item for experimentation; they had become an ongoing operational expense. Consequently, a new question arose: Where exactly is AI creating value?

This question is difficult to answer because the utility of tokens has not been quantified. The bill doesn't tell you whether the expenditure replaced human labor, generated revenue, reduced risk, accelerated processes, or if it was just a group of engineers frantically burning tokens for the leaderboard (#metamates). When spending is a few hundred thousand dollars, it still looks like an experiment. But beyond a certain threshold, say reaching seven figures, it becomes infrastructure. Technical differences begin to have a material impact on the P&L: the same workflow with the same inputs can vary 5 to 10 times in token cost between two runs, with no apparent surface-level issue. At an experimental scale, this volatility is already expensive; but once it reaches infrastructure scale, it becomes a number the CFO must explain to the CEO.

You could call it "marginal token utility": the business value created by each additional dollar spent on inference costs. This is the truly important metric in the scaling phase, and it's the one most companies are currently blind to.
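As a rough illustration of the metric, the sketch below computes marginal token utility as the change in business value divided by the change in inference spend between two periods. The function name and all dollar figures are hypothetical, and obtaining a trustworthy "value" input is exactly the part the article says most companies cannot yet do.

```python
def marginal_token_utility(value_before, value_after, spend_before, spend_after):
    """Business value created per additional dollar of inference spend."""
    extra_spend = spend_after - spend_before
    if extra_spend <= 0:
        raise ValueError("spend_after must exceed spend_before")
    return (value_after - value_before) / extra_spend


# Hypothetical quarter-over-quarter figures, in dollars.
utility = marginal_token_utility(
    value_before=400_000, value_after=520_000,
    spend_before=100_000, spend_after=150_000,
)
print(utility)  # 2.4 dollars of value per extra dollar of inference
```

A value above 1 suggests the extra spend paid for itself under these assumptions; a value below 1 suggests the marginal tokens cost more than they returned.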

The question in the boardroom is shifting from "Is AI useful?" to "Where exactly does AI create leverage?" And that's precisely why the token budget battle is fundamentally a battle over the allocation rights of tokens.

The battle over token ownership is heating up rapidly because it directly clashes with a three-decade-old executive instinct: big teams mean big positions, big scopes of responsibility, and greater power. In the past, a visible marker of a senior manager's success was the size of the team they managed—direct reports, indirect reports, the headcount within their organizational chart.

But when intelligence becomes the scarce resource, the new marker becomes: How much intelligence can you command?

AI expenditure is essentially competing with labor costs.

Most requests for AI budgets are fundamentally one of three propositions: replacing outsourced labor, replacing internal labor, or creating new revenue.

An employee has a salary. An outsourced BPO contract has a price per ticket, claim, invoice, or audit. Humans understand these units of measurement. But inference costs are more complex because the final cost of completing a task depends on how the system executes the process. A claim processing task that requires three retries, manual corrections, and uses a frontier model might end up being more expensive than the outsourced labor it was intended to replace. This is why the discussion is shifting towards: What is the cost of achieving a result? For example, the cost per resolved ticket, per processed claim, per reviewed contract, per completed invoice, per avoided hire, per retained customer, or per dollar of revenue generated.
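The comparison described above can be sketched as a cost-per-resolved-claim calculation. This is a minimal sketch with made-up numbers: `ClaimRun`, the per-attempt inference cost, the manual correction cost, and the BPO price per claim are all hypothetical and not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class ClaimRun:
    attempts: int            # total model attempts, including retries
    cost_per_attempt: float  # inference dollars per attempt (hypothetical)
    manual_fix_cost: float   # dollars of human correction time, 0 if none
    resolved: bool           # whether the claim was ultimately processed


def cost_per_resolved(runs):
    """Total spend (inference plus corrections) divided by resolved claims."""
    total = sum(r.attempts * r.cost_per_attempt + r.manual_fix_cost for r in runs)
    resolved = sum(1 for r in runs if r.resolved)
    if resolved == 0:
        raise ValueError("no resolved claims to divide by")
    return total / resolved


runs = [
    ClaimRun(attempts=1, cost_per_attempt=0.50, manual_fix_cost=0.0, resolved=True),
    # One first attempt plus three retries, then a manual correction.
    ClaimRun(attempts=4, cost_per_attempt=0.50, manual_fix_cost=6.0, resolved=True),
    # Spend that never produced a resolved claim still counts.
    ClaimRun(attempts=4, cost_per_attempt=0.50, manual_fix_cost=0.0, resolved=False),
]

BPO_PRICE_PER_CLAIM = 4.00  # hypothetical outsourced price per claim
agent_cost = cost_per_resolved(runs)
print(agent_cost, agent_cost > BPO_PRICE_PER_CLAIM)  # 5.25 True
```

With these invented figures the agent comes out more expensive per resolved claim than the outsourced baseline, even though a single clean attempt costs only $0.50.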

Executives have realized that BPO is the easiest place to establish a baseline because this work is already priced per "unit of completion." Comparing AI to internal employees is much harder because employees do many things daily, including scrolling through TikTok during lunch; productivity gains often manifest as avoided hiring or diffuse capacity release; and managers resist reducing team sizes based solely on partial automation. BPO provides a quantifiable baseline for business teams.

This logic differs from SaaS. SaaS trained companies to see usage as a proxy for value.

But AI breaks this. The amount of inference resources consumed by the same workflow can vary drastically based on prompts, retrieved context, chosen model, tools called, number of retries, and whether the agent gets stuck. The unit on the bill—the token—is stable, but the amount of work it represents is not.

More precisely: Signal and noise use the same unit of measurement. A rising token bill might mean real work is being done; but it might also mean computational power is being wasted on poor prompts, irrelevant context, unnecessary tool calls, repeated inference, and overkill models. Two companies could have identical token bills, but the underlying operations could be vastly different: one is translating inference into results, the other is paying for ineffective flailing, and both look identical on the bill line items.

SaaS usage tells you: software has been adopted. AI usage only tells you: the meter is running. It doesn't tell you whether the company is actually moving forward.

Why is marginal token utility so hard to see?

There are three main reasons.

First is the retry tail. If the probability that an agent completes a workflow correctly on a given attempt is p, then the expected token cost per resolved workflow is roughly T/p, where T is the base cost of one attempt. If the completion rate drops from 90% to 70%, the effective cost per resolved workflow rises by about 29% (0.9/0.7 ≈ 1.29), not 20%, because cost scales with 1/p rather than linearly with the failure rate. In enterprise workflows, inputs are often messy, and edge cases matter. Failures don't just reduce accuracy; they change the economics.
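The T/p relationship can be checked with a few lines of arithmetic. The sketch assumes every attempt costs the same T and succeeds independently with probability p, so the expected number of attempts is 1/p. Real retries may cost more or less than a first attempt, and the function names are illustrative only.

```python
def expected_cost_per_resolved(base_cost, p_success):
    """Expected spend per resolved workflow when each attempt costs
    base_cost and succeeds independently with probability p_success."""
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return base_cost / p_success


def relative_increase(p_before, p_after, base_cost=1.0):
    """Fractional rise in cost per resolved workflow when the success
    rate moves from p_before to p_after."""
    before = expected_cost_per_resolved(base_cost, p_before)
    after = expected_cost_per_resolved(base_cost, p_after)
    return after / before - 1.0


# 90% -> 70% completion: the cost rises by 0.9 / 0.7 - 1, a little under 29%.
print(f"{relative_increase(0.9, 0.7):.1%}")
```

The increase is larger than the 20-point drop in completion rate because the cost depends on 1/p, which grows faster as p falls.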

Second is context bloat. For operations heavily dependent on attention mechanisms, inference costs roughly scale as O(n²) with context length. So, doubling the context length roughly quadruples the inference cost. Everyone wants the model to have enough information, so systems tend to over-supply: retrieving fifty documents when five would do; connectors dumping entire email threads; agents carrying stale conversation histories.
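To make the scaling concrete, the sketch below applies the quadratic assumption stated above to a change in context length. It models only the attention-dominated part of the computation, not how a provider bills per token, and the document counts are taken from the example in the text.

```python
def relative_attention_cost(new_context, old_context):
    """Cost multiplier for attention-dominated work, assuming cost grows
    with the square of context length."""
    if new_context <= 0 or old_context <= 0:
        raise ValueError("context lengths must be positive")
    return (new_context / old_context) ** 2


# Doubling the context roughly quadruples the attention cost.
print(relative_attention_cost(2, 1))   # 4.0

# Retrieving fifty similar-sized documents instead of five means ten times
# the context, or about a hundredfold attention cost under this assumption.
print(relative_attention_cost(50, 5))  # 100.0
```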

Third is routing. When teams don't know which model is "good enough," they default to using the most powerful one. A basic classification task might end up running on the same model intended for complex reasoning. When call volumes reach millions, routing simple tasks to a smaller model versus routing everything to a frontier model often makes the difference between a manageable bill and a board-level problem.
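A back-of-the-envelope comparison shows how routing changes the bill at volume. Every number here is hypothetical: the call volume, the share of simple tasks, and both per-call prices are invented for illustration and do not reflect any real model's pricing.

```python
def monthly_bill(calls, simple_share, frontier_price, small_price,
                 route_simple_to_small):
    """Total spend when simple calls either go to the small model or stay
    on the frontier model; complex calls always use the frontier model."""
    simple_calls = calls * simple_share
    complex_calls = calls - simple_calls
    simple_price = small_price if route_simple_to_small else frontier_price
    return simple_calls * simple_price + complex_calls * frontier_price


CALLS = 10_000_000      # hypothetical monthly call volume
SIMPLE_SHARE = 0.8      # hypothetical share of basic classification tasks
FRONTIER_PRICE = 0.02   # hypothetical dollars per call
SMALL_PRICE = 0.002     # hypothetical dollars per call

everything_frontier = monthly_bill(CALLS, SIMPLE_SHARE, FRONTIER_PRICE,
                                   SMALL_PRICE, route_simple_to_small=False)
routed = monthly_bill(CALLS, SIMPLE_SHARE, FRONTIER_PRICE,
                      SMALL_PRICE, route_simple_to_small=True)
print(f"all frontier: ${everything_frontier:,.0f}")
print(f"routed:       ${routed:,.0f}")
```

Under these assumptions the bill falls from roughly $200,000 to roughly $56,000 a month. The calculation only holds if the smaller model really is good enough for the simple tasks, which is the knowledge the text says teams often lack.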

Non-software industries will feel this pain as a "transformation."

Software companies will see this problem first because the workflows being optimized are already highly instrumented. Engineering teams have metrics like PRs, commits, deployments, incidents, cycle time, and MTTR, and these are linked to the product. While not perfect, this type of work is easier to measure.

Non-software enterprises will feel this problem more acutely because their work is operational: claims, underwriting, customer service tickets, compliance reviews, supply chain exceptions, payment disputes. Companies with real-world assets face the same issue. These workflows have traditionally been measured by headcount, cycle time, SLA achievement rates, and error rates, and they often carry higher requirements for audit defensibility, not just average correctness. The units of work and units of cost don't speak the same language and don't reside in the same organization. The tech team sees token consumption, the business unit sees workflow changes, but connecting the two requires multiple teams to first agree on "what exactly is being measured."

I believe software companies will experience the token budget war as a productivity measurement issue, corresponding to the many "AI layoffs" we've seen; non-software enterprises will experience it as a transformation issue.

The missing layer is attribution from tokens to outcomes.

Enterprises need a translation layer that connects inference spending with the work done and the resulting business outcomes. This layer must answer three questions: What is the true cost of this workflow, including retries and corrections? Which parts of the agent's execution trace were truly important, and which were just ineffective flailing? Has this work changed the operating model, for example through fewer tickets per agent, shorter claims cycles, a smaller BPO budget, or delayed hiring? The next layer is outcome attribution in business language: not simply "this workflow cost $2.13," but "this type of claim is cheaper handled by an agent than by BPO, but if the policy requires additional exception documentation, the retry tail destroys the economics."

Measurement becomes memory.

To connect a token to an outcome, the enterprise must capture everything that happens in between: what the agent saw, what it retrieved, which tools it called, what it ignored, where it retried, when it was overridden by a human, which exception rule applied, which precedent was used, and why one path succeeded while another failed. The measurement layer must record the decision trace, something enterprises have rarely truly possessed. Systems of record capture *what* happened, but rarely *why*. A CRM can tell you a deal was delayed, but not the unwritten judgments behind the sales forecast.
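One way to picture such a decision trace is as a list of typed events attached to a workflow, each carrying its token cost and a plain-language reason. This is only a sketch: `TraceEvent`, `WorkflowTrace`, the event kinds, and the sample claim are hypothetical and do not describe any particular product's schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    kind: str       # e.g. "retrieval", "tool_call", "retry", "human_override"
    tokens: int     # tokens consumed by this step
    note: str = ""  # why the step happened, in plain language


@dataclass
class WorkflowTrace:
    workflow_id: str
    events: list = field(default_factory=list)
    outcome: str = "unknown"

    def record(self, kind, tokens, note=""):
        """Append one step of the agent's path to the trace."""
        self.events.append(TraceEvent(kind, tokens, note))

    def tokens_by_kind(self):
        """Attribute token spend to the kind of step that consumed it."""
        totals = defaultdict(int)
        for event in self.events:
            totals[event.kind] += event.tokens
        return dict(totals)


trace = WorkflowTrace("claim-001")
trace.record("retrieval", 1200, "pulled policy documents")
trace.record("tool_call", 300, "looked up claimant record")
trace.record("retry", 1500, "exception documentation missing")
trace.record("human_override", 0, "adjuster approved exception")
trace.outcome = "resolved"

print(trace.tokens_by_kind())
```

Here half of the tokens went to a retry, and the note explains why. That is the kind of "why" a cost report alone does not carry.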

Decision rationale is one of the most perishable and easily corrupted assets in a company because it lives in Slack threads, email chains, escalation meetings, and people's heads. The problem is that people leave and processes change.

AI changes this because agents generate traces. Every retrieval, tool call, retry, escalation, manual correction, and final decision becomes part of a path from context to action to outcome. Initially, companies will capture these traces to justify spending. But once captured, these traces become more valuable than the cost reports themselves because they become a persistent record of how the organization actually makes decisions. (Ahem, context graph, even though I'm really tired of hearing that term lately.)

The allocation layer is the real prize.

If inference becomes a metered resource in the customer operations model, every dollar must justify itself. Which vendors can articulate when tokens converted into results, when they didn't, and why?

Enterprises won't figure this out entirely on their own. They will buy it as a transformation. Fortune 500 companies have run this playbook before: buckle up, hire McKinsey, bring in every former Palantir employee on the market, and have the CEO drive change top-down. Token-to-outcome attribution will arrive like ERP, BI, and digital transformation: as a "project" with executive sponsorship, supported by underlying infrastructure, ultimately becoming a new source of truth. Founders who can accomplish this will build founding teams that differ from the traditional entrepreneurial archetype.

Whoever masters token-to-outcome attribution can make allocation decisions: which workflows deserve more compute, which should be limited, which should switch to cheaper models, which should stay with humans, and which can replace BPO. And once you can make these decisions, you control the flow of enterprise AI spending and gain the trust needed to allocate this resource.

The first phase of enterprise AI proved: models can do the work. The next phase will determine: how much of this work is actually worth paying for. As Charlie Munger said: Show me the incentives and I'll show you the outcome.
