Meta sells compute, Palantir lashes out, Zhipu becomes the talk of Silicon Valley: the AI Capex story is about to be told differently
- Core thesis: The AI industry's capital expenditure (Capex) story is shifting from an aggregate "supply shortage" to a structural "utilization mismatch." Enterprises are starting to assess the real return on their compute investments, and Meta's consideration of leasing out surplus compute is a signal of this shift.
- Key points:
- The possibility that Meta may lease out AI compute has raised concerns about the sustainability of capital spending. But revenue at the three major cloud providers (AWS, Google Cloud, Azure) is still growing strongly, and capex guidance has been raised rather than cut, suggesting overall demand has not collapsed.
- The compute market is stratifying: top cloud providers can keep raising prices because they sell certainty (for example, AWS raised the price of its GPU reservation service by about 20%), while model companies and the middle tier face more selective demand for their compute, and enterprises are starting to split procurement by the value of each task.
- Enterprise AI adoption is entering a "cost accounting" phase. A UBS survey found that about 60% of respondents are curbing token spending. Palantir's CEO called token-based pricing a "wealth tax" and argued for shifting from consumption-based to outcome-based payment.
- Open-source models (such as Zhipu's GLM-5.2) are becoming a tool for enterprises to cut AI spending. Coinbase's case shows that by switching its default model and optimizing usage, it cut AI spending by nearly half while token usage kept growing exponentially.
- Compute misallocation has become the central issue: Meta is considering selling compute while struggling to buy enough frontier-model capacity from Google, highlighting a structural mismatch between assets and demand.
The AI market is experiencing another violent pullback, this time because Meta hinted it might sell off its surplus AI computing power.
If this news had come out three years ago, probably no one would have thought it strange. Cloud computing has always been a business of slicing up servers and selling them to others. Amazon, Microsoft, and Google have been doing it for years. New cloud providers like CoreWeave and Nebius also follow this path, turning Nvidia chips into financing collateral, and then using that financing to acquire even more chips.
But when it's Meta's turn, things feel different.
Meta hasn't traditionally understood computing power this way. It buys chips, builds data centers, and secures electricity and land, all for its own models, its ad systems, its recommendation feeds, and the superintelligence Zuckerberg keeps saying is drawing closer. It's not a cloud provider. It didn't originally make money by renting out its machines to others.
A company that once said, "I need as many machines as possible because the future will consume them," is now saying, "If we can't use all these machines right now, we can sell access to them."
This isn't a direct confirmation of an oversupply of compute, but it's not something to be dismissed lightly either.
On the day the stock market plunged, Palantir CEO Alex Karp spent nearly twenty minutes on CNBC, venting on camera.
He was initially there to discuss Palantir's new partnership with Nvidia, but quickly shifted his focus to the token-based pricing models of OpenAI and Anthropic. He claimed that CEOs are privately complaining to him that enterprise AI adoption right now means "paying for tokens that create no value, while also handing over your data." He even referred to the increasingly expensive AI model bills as a "wealth tax" imposed on enterprises.

For the past two years, the discussion has been about who dares to spend, who spends fastest, and who can pile up data centers first. Now the question is slowly changing. After the machines are bought, who can keep them running at full capacity?
Meta's proposal hasn't materialized into a formal business yet. According to public reports, it has an internal initiative called Meta Compute, which could involve selling raw computing power or, similar to Amazon Bedrock, hosting different models on its infrastructure for developers. Zuckerberg previously mentioned at a shareholder meeting that outside companies ask almost every week whether they can buy API services or some compute capacity from Meta, often offering to pay more than Meta's cost.
He added a caveat at the time: Meta hadn't done it yet because it believed it could still use that computing power itself.
If they can use it, leasing is a choice. If they can't, leasing becomes a painkiller for the balance sheet.
This is where judgment becomes most difficult. Meta might simply be selling resources that sit temporarily idle during a gap in its build-out schedule. Alternatively, it might be signaling to investors that hundreds of billions of dollars in AI spending cannot be justified indefinitely by the promise of a distant superintelligence, and that a nearer-term revenue stream must be found first.
Both interpretations are plausible.
Demand hasn't disappeared; it's just becoming selective
Capex is the absolute core of the AI narrative. It plays the role liquidity played in 2021: the market expects it to keep growing, because only while that money keeps flowing can every branch of the market rise together. Seeing Meta prepare to sell computing power, many people's first reaction was that AI Capex was about to collapse: big companies had finally admitted they bought too much, and the semiconductor party was over.
That's too simplistic.
Public data doesn't yet support such a definitive conclusion. AWS revenue grew 28% in Q1 to $37.6 billion, a pace it has rarely reached in recent years. Google Cloud grew even faster in Q1, reaching $20 billion in revenue. Microsoft Azure is still growing at around 40%.
Amazon is still saying its capital expenditure could reach $200 billion this year. Alphabet raised its 2026 capital expenditure guidance to $180-$190 billion. Meta itself raised its full-year capital expenditure to $125-$145 billion.

These numbers don't look like a collapse in demand.
It looks more like a divergence.
The situation for cloud providers is different from that of model companies. Cloud providers sell the roads. As long as there's traffic on the road, regardless of who built the cars, they collect the tolls. OpenAI, Anthropic, enterprise clients, government clients, startups—they all ultimately need to land on some data center, some chip, some network, and some power contract.
So the three major clouds can continue to be resilient.
AWS even raised the price of one of its AI cloud services at the end of June. The service lets customers reserve GPUs in advance, and AWS raised its price by about 20% starting in July, on top of an increase of about 15% in January. That is not how a seller behaves when demand is weakening.
When things are scarce, sellers raise prices.
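The two AWS increases compound rather than add. A minimal Python sketch, assuming the roughly 15% and 20% increases apply in sequence to the same reservation price (the base price index of 100 is hypothetical, not an AWS rate), shows a cumulative rise of about 38% rather than 35%:

```python
# Illustrative sketch: how successive price increases compound.
# Assumption: the ~15% (January) and ~20% (July) increases apply one after
# the other to the same reserved-GPU price. The base index is hypothetical.


def compound_increase(*pct_increases: float) -> float:
    """Return the cumulative fractional increase from successive % increases."""
    factor = 1.0
    for pct in pct_increases:
        factor *= 1 + pct / 100  # each increase applies to the already-raised price
    return factor - 1


base_price_index = 100.0  # hypothetical starting price index
cumulative = compound_increase(15, 20)
print(f"Cumulative increase: {cumulative:.0%}")
print(f"Price index after both increases: {base_price_index * (1 + cumulative):.1f}")
```

The same function works for any sequence of percentage changes, which makes it easy to check how quickly modest-sounding increments add up for a customer locked into reserved capacity.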
But not all model companies will necessarily be so comfortable.
Model companies have more demanding assets. Compute power doesn't generate revenue just by sitting there. It needs to be continuously filled by smarter models, higher-frequency users, and more expensive enterprise workflows. Only when the model is good enough will users tolerate queues, limits, price hikes, and increasingly complex subscription tiers.
This is also why Anthropic is viewed by the market as a different kind of company. It's not because it's cheap, but because users entrust it with expensive tasks. Writing code, modifying systems, running long tasks, integrating with enterprise workflows—once these tasks enter a production environment, they consume far more tokens than casual conversation.
The problem for strong models is not having enough machines.
The problem for weak models is that no one cares if the machines are idle.
Both problems are about compute power, but they are not the same thing.
Something similar can be seen in xAI's trajectory. Grok hasn't won the clear enterprise mindshare that the strongest models enjoy, yet some of the computing power within the Musk ecosystem can flow to Anthropic. That is a colder, more rational move than any slogan. Machines don't care about founders; they only care about who can keep them fully utilized.
The relationship between Google and Meta also shows things aren't so simple. In June, news emerged that Google had restricted Meta's use of Gemini because the amount of computing power Meta wanted to buy exceeded what Google could provide, which even affected some of Meta's internal AI projects. The same company is considering selling compute while struggling to buy enough top-tier model capacity for certain tasks.
This isn't a traditional surplus.
This is a misallocation fueled by increasingly eye-watering bills.
Cloud providers can keep raising prices because they sell certainty. What clients want is guaranteed access to GPUs for a specific period, a stable data center, and a reliable infrastructure that won't crash in the middle of the night.
But the problem for enterprise clients doesn't end once they secure the compute power.
They still have to present that bill to the CFO. The CFO won't ask how many tokens you used. He'll ask how much money those tokens saved the company, how much extra revenue they generated, and how many mistakes they prevented.
For enterprises, tokens become a utility meter
This brings us back to Karp's interview at the beginning.
He argued that much of what AI companies sell to enterprises is oversold. The day before the show, Palantir posted a nine-point statement on X about so-called AI sovereignty, specifically targeting the "tokenmaxxing" model. The logic it attacks is simple: treat token consumption as progress, burning money as usage, and the bill as productivity.
Karp put frontier labs like OpenAI and Anthropic on the spot. His point isn't that enterprises shouldn't use the best models. It's that enterprises shouldn't hand over all their data, processes, and business judgment, and then pay an increasingly large bill based on consumption.
Palantir wants to sell something different. Not a universal chat interface, not a single API, but an integration of data, approvals, permissions, operational rules, and AI into the same business system. What the client pays for isn't "how many times AI was used," but whether a specific production line, a certain risk control process, or a particular government task has been genuinely transformed.
The people who actually manage the money in enterprises are starting to wake up.
UBS recently surveyed enterprise IT executives, and one trend is clear. Many enterprises aren't stopping their use of AI, but they are putting the brakes on AI spending. About 60% of surveyed companies are curbing token expenses and adding usage guardrails, especially those that are past the trial phase and starting to integrate AI into daily workflows.

This is a fascinating reversal.
Once AI moves from toy to tool, spending gets harder, not easier. In the toy phase, bosses were willing to allocate budget because everyone feared missing out. In the tool phase, the CFO asks whose labor hours it saved, whose sales it increased, and whose risk it reduced.
On this balance sheet, tokens don't look like revenue.
They look more like a utility meter.
Of course, you could say a fast-spinning meter means the factory is running. But you could also say that if the meter spins too fast while output hasn't increased, it indicates a problem with the machine.
AI agents amplify this issue. A study of Codex usage by OpenAI and several universities includes some startling data. In the first half of 2026, active Codex users grew more than fivefold, and output tokens for some internal OpenAI roles exploded: median monthly output tokens for legal roles were 13 times their November 2025 level, and more than 50 times for research roles.
Another study puts it more starkly: agentic coding tasks can consume up to 1,000 times more tokens than standard code chat and code reasoning, and token consumption for the same task can vary by a factor of 30 between runs.
This is the core reason for today's compute shortage.
It's not that people are asking chatbots a few more questions.
It's that software is beginning to transform into a swarm of tiny workers that repeatedly read files, execute commands, modify code, fail, retry, fail again, and retry again. They don't take lunch breaks, but they consume tokens with every step.
When tokens become a utility meter, whoever owns the power plant holds the power. But whoever wastes the electricity will be the first to face scrutiny.
As bills get thicker, cheaper models find their place
Once the CFO starts monitoring this meter, the next step is almost instinctive.
He will ask: which tasks absolutely require the most powerful model, and which tasks just need a model that's good enough?
At this point, open-source models like GLM, Kimi, DeepSeek, and Qwen stop being just tech news. They become powerful bargaining chips on the enterprise procurement table.
Even Marc Andreessen of the top Silicon Valley venture firm a16z said that many AI practitioners now consider Zhipu AI's GLM-5.2 one of the first Chinese models capable of matching or even surpassing top US open models on most tasks. This assessment may not be the final word, but it gives enterprises one more option.
Coinbase provides an even more concrete example. Brian Armstrong said the company switched its default AI models to open-source models such as GLM-5.2 and Kimi 2.7. Combined with model routing, caching, and streamlined context, this has cut AI spending by nearly half even as token usage continues to grow exponentially.
What makes this significant is that, for the first time, enterprises can disaggregate model procurement.
The hardest tasks continue to be assigned to the most expensive models. Common tasks like summarization, customer service, information extraction, templated code, and internal knowledge base Q&A are handled by cheaper models or local deployments.
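A minimal Python sketch of this kind of tiered routing follows. It is in the spirit of the routing Coinbase describes, not its actual system: the tier names, task categories, prices, and token volumes are all hypothetical. The point is only that sending routine work to a cheaper tier changes the bill far more than it changes the workload.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical price per million tokens for each tier; real prices vary by vendor.
PRICE_PER_M_TOKENS = {
    "frontier": 15.00,    # most capable, most expensive model
    "open_weight": 1.00,  # cheaper open-source model, hosted or deployed locally
}

# Hypothetical routing rule: only these task types justify the frontier tier.
FRONTIER_TASKS = {"complex_coding", "system_migration", "long_agent_run"}


@dataclass
class Task:
    kind: str
    tokens: int


def route(task: Task) -> str:
    """Send high-value task types to the frontier tier, everything else to the cheap tier."""
    return "frontier" if task.kind in FRONTIER_TASKS else "open_weight"


def cost(tasks: list[Task], router: Callable[[Task], str] = route) -> float:
    """Total spend in dollars for a workload under a given routing function."""
    return sum(t.tokens / 1_000_000 * PRICE_PER_M_TOKENS[router(t)] for t in tasks)


# Hypothetical monthly workload.
workload = [
    Task("complex_coding", 40_000_000),
    Task("summarization", 120_000_000),
    Task("customer_service", 200_000_000),
    Task("knowledge_base_qa", 80_000_000),
]

all_frontier = cost(workload, router=lambda t: "frontier")
routed = cost(workload)
print(f"All frontier: ${all_frontier:,.0f}")
print(f"Routed:       ${routed:,.0f}")
```

With these made-up numbers the routed bill is a fraction of the all-frontier bill while the token volume is unchanged; real savings depend on actual prices and on how much of the workload genuinely needs the top model.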
Open-source models don't necessarily need to win the entire battlefield.
They just need to convince the procurement department that not every kilowatt-hour needs to be paid for at a luxury mansion's electricity rate.
Seen in this light, Meta selling computing power is no longer an isolated story.
It's part of the same narrative as Palantir criticizing tokens and Coinbase embracing open-source models: the AI spending chain is being dissected. The upstream sells certainty, the midstream sells results, and the downstream pressures unit prices. Every layer is still growing, but every layer is also being asked: is this money well spent?
The hardest part isn't buying machines, it's keeping them busy
For the past two years, the easiest story to tell in the AI industry has been one of scarcity.
Not enough GPUs, not enough electricity, not enough data centers, not enough engineers, not enough cloud capacity to run the models. This story was too smooth. When there's not enough of something, everyone instinctively rushes forward. Stake your claim first, sign the power contract first, buy the chips first, get the machines set up first.
In a resource grab, people don't tend to do the detailed math.
Because the cost of being slow seems greater.
But Meta's news pushes another problem to the forefront. Once bought, machines don't automatically become a good business just because they were expensive. They need work every day. They need customers willing to pay. They need models to keep them busy. They need applications to convert costs into revenue.
This is utilization.
The term "utilization" sounds dry, but in practice it is brutal. It doesn't ask whether you have a future. It asks whether your machines are running today. It doesn't care what you say at a press conference or whether you bought the most expensive GPU. It only cares about one thing: has this money turned into a continuous stream of cash flow?
Cloud providers have a relatively easier time answering this question. They have always sold infrastructure. AWS, Google Cloud, and Azure sell the roads, electricity, and server rooms. Whether clients are training models, running inference, or hosting applications, they ultimately need to land on some cloud platform.
So they can remain resilient.
Strong model companies also have their own answer. If the model is powerful enough, users will queue up, enterprises will integrate it, and developers will adapt their workflows around it. For them, compute power isn't inventory; it's a bottleneck. More machines mean they can operate more freely.
The hardest position is the middle layer.
They have the machines, the narrative, the model teams, and large budgets. But their models aren't in the lead, their products haven't become daily habits, and developers aren't keen to adapt their workflows for them. For companies like this, a single failed model launch or one wave of user migration can turn computing power from a weapon into inventory overnight.
Inventory isn't necessarily useless.
But inventory has to be discounted, rented out, or put to new uses.
This is what makes Meta's possible compute sale so striking. It doesn't prove that Meta has failed, nor does it prove that AI demand has disappeared. It simply lets the market see, for the first time, that AI infrastructure can face the same problems as an ordinary factory.
The factory is built. Where are the orders?
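The utilization argument can be made concrete with simple arithmetic. A minimal Python sketch, assuming each GPU carries a fixed hourly cost that accrues whether or not it is busy (the $2.00 figure and the busy-hour counts are hypothetical), shows how the same machine gets more expensive per unit of useful work as it sits idle:

```python
# Hypothetical sketch: how utilization changes the effective cost of compute.
# Assumption: each GPU carries a fixed hourly cost (depreciation, power,
# facilities) that is paid whether or not it does useful work.

HOURS_PER_MONTH = 720
FIXED_COST_PER_GPU_HOUR = 2.00  # hypothetical all-in cost, in dollars


def utilization(busy_hours: float, available_hours: float) -> float:
    """Fraction of available GPU-hours spent on productive or paying work."""
    if available_hours <= 0:
        raise ValueError("available_hours must be positive")
    return busy_hours / available_hours


def cost_per_useful_hour(fixed_cost_per_hour: float, util: float) -> float:
    """Spread the fixed cost over only the hours that actually do work."""
    if not 0 < util <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return fixed_cost_per_hour / util


for busy_hours in (684, 432, 216):  # 95%, 60% and 30% of a 720-hour month
    u = utilization(busy_hours, HOURS_PER_MONTH)
    effective = cost_per_useful_hour(FIXED_COST_PER_GPU_HOUR, u)
    print(f"utilization {u:.0%}: ${effective:.2f} per useful GPU-hour")
```

Under these assumptions, a GPU that is busy only 30% of the time costs more than three times as much per useful hour as one busy 95% of the time. That gap is the arithmetic behind renting out idle machines even at a discount: rental income above the marginal cost of running them offsets a fixed cost that is paid regardless.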
Compute power hasn't disappeared; it's starting to tier
So the best way to understand this isn't "compute glut."
That term is too crude.
A more accurate description is that compute power is starting to stratify.
The top tier remains tight. The best models, the best clouds, the most stable GPU clusters are still in high demand. AWS can raise prices because certainty itself has a price. Clients aren't just buying GPUs; they're buying the guaranteed availability of a specific set of machines on a specific day and hour.
The middle tier is becoming awkward. It might not be bad, but it's not scarce enough. It can run models, do inference, and be sold externally. But customers will compare, negotiate, and question why they shouldn't use a cheaper model, someone else's cloud, or why this batch of machines commands this specific price.
The bottom tier will be gradually squeezed by open-source models and cost optimization. Enterprises won't always call the most expensive model for routine tasks. They will use routing, caching, and context compression, and slot models into different price tiers.
Demand has grown up.
A child spends without looking at the bill; an adult does. As AI enters the enterprise, it goes through this process too. In the pilot phase, everyone fears missing out. In the scaling phase, everyone starts doing the math.
Once the math is done, the industry chain will no longer move in lockstep like it did in the early days.
Some will continue to raise prices because they sell irreplaceable certainty. Some will switch to selling results because clients don't want to pay for consumption itself. Some will be forced to lower prices because acceptable alternatives have appeared. Some will rent out their machines because idle machines look worse on the books than machines rented at a low price.
When these things happen simultaneously, the industry looks contradictory.
On one hand, compute is scarce.
On the other hand, compute is being rented out.
On one hand, token consumption is skyrocketing.
On the other hand, enterprises are cutting AI spending.
On one hand, top-tier models are getting stronger.
On the other hand


