Computing power is recentralizing: After DeepSeek's price cuts, who will control AI infrastructure?

Gonka_ai
Guest Columnist
@gonka_ai
2026-04-29 08:50
This article is about 3,504 words; reading time is approximately 6 minutes.
Starting from Gonka’s talk at LA Hacks 2026.
AI Summary
  • Core thesis: Aggressive price cuts from model providers such as DeepSeek are democratizing AI applications, but they are paradoxically accelerating the concentration of computing power among a few cloud giants (the four major cloud providers' capex is projected to reach $570.8 billion in 2026). The decentralized computing network Gonka attempts to integrate the world's idle GPUs through a PoW incentive mechanism, offering a structural alternative before the centralization of the compute layer is complete.
  • Key elements:
    1. Model price cuts depend on abundant computing power, but global compute capacity is converging on a few nodes. Optical communications leader Lumentum's production capacity is nearly sold out through 2028.
    2. The Bitcoin network’s hashrate already exceeds the combined total of Google, Microsoft, and Amazon’s cloud data centers, yet it is solely used for hash puzzles. Massive amounts of idle GPUs lack the coordination mechanisms necessary for AI inference.
    3. Gonka redirects proof-of-work from hash computation to AI inference, enabling nearly 100% of network computing contributions to correspond to real tasks, with token value anchored to physical computing costs.
    4. By 2026, AI inference is expected to account for two-thirds of global computing consumption. Price cuts will lead to exponential growth in usage, thereby strengthening the structural lock-in of players with vast compute resources.
    5. Within less than a year of mainnet launch, Gonka’s aggregated computing power has expanded from 60 H100s to over 10,000 H100 equivalents, driven by the spontaneous integration of hundreds of independent nodes worldwide.

On April 26, DeepSeek released the new pricing for its V4 series APIs: the cost for cache-hit inputs across all tiers was reduced to one-tenth of the launch price, and with a limited-time discount on the Pro version, the cost per million tokens dropped to as low as 0.025 RMB — nearly a hundred times cheaper than a year ago. A-share computing power stocks surged to their daily limit, and market sentiment was electrified.

But behind the applause, a question remains undiscussed: As models become increasingly affordable, the computing power needed to run them is becoming increasingly concentrated.

The data doesn't lie. In the fourth quarter of 2025, the combined capital expenditures of Microsoft, Amazon, Meta, and Google reached $118.6 billion, a 64% year-over-year increase. It is projected that in 2026, their total annual CapEx will further rise by 53% to $570.8 billion. Google simultaneously raised its TPU chip shipment target for 2026 by 50% to 6 million units. The delivery lead time for NVIDIA's H100 series has extended to several months in some markets.

Pricing power at the model layer is tilting towards developers, but control over the computing power layer is consolidating towards a handful of giants at an even faster pace. This is a subtle yet profound contradiction in the AI era.

Against this backdrop, on April 24, 2026, Gonka protocol co-founders Daniil and David Liberman took the keynote stage at LA Hacks 2026, the premier annual collegiate hackathon at UCLA, addressing hundreds of top engineers about to enter the industry. The question they posed was stark: is decentralized computing power still possible in time?

1. The Other Side of the Price Reduction Wave

The logic behind DeepSeek V4's price cuts appears to be efficiency gains from technological progress — a new attention mechanism compresses the token dimension, combined with DSA sparse attention, significantly reducing computation and memory requirements. However, continued price reductions depend on the premise that computing power somewhere is sufficiently abundant and cheap.

The reality is that these "sufficiently abundant" sources of computing power globally are rapidly converging towards a few nodes. Optical communications leader Lumentum's CEO, Michael Hurlston, recently stated that based on current trends, the company's production capacity is nearly fully booked through 2028. This is not an isolated company's struggle but a collective tension across the entire AI infrastructure supply chain facing explosive demand.

In his LA Hacks speech, Daniil used a simple yet powerful comparison: The computing power of the Bitcoin network has already surpassed the combined total of Google, Microsoft, and Amazon's cloud data centers — but what is all this power doing? Solving a hash puzzle that no one needs an answer to. The same applies to globally scattered GPU power: graphics cards in gamers' machines, servers in university computer labs, spare capacity from small and medium cloud service providers. The total scale is immense, yet due to a lack of coordination, it cannot be harnessed for AI inference.

This is precisely the coordination problem Gonka aims to solve: using a proof-of-work incentive mechanism to organize idle GPUs scattered across the globe into a network capable of handling real AI inference tasks.

2. Inference is the New Battleground

DeepSeek's price cuts triggered widespread discussion about "AI democratization" in the Chinese internet. But a crucial detail is often overlooked: the price cut is for "invocation cost," not "computing power cost." As AI applications scale, the growth in inference demand is exponential. According to industry forecasts, by 2026, inference will account for roughly two-thirds of global AI computing consumption.

What does this imply? For every order of magnitude reduction in invocation cost, the total computing power required actually increases, not decreases. The "democratization" of large models, paradoxically, accelerates the centralization of the computing layer — because only players with massive computing power can sustain inference service operations at ultra-low margins.
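The dynamic is a Jevons-style effect: cheaper invocations unlock demand that grows faster than prices fall. A rough back-of-the-envelope sketch makes the direction of the effect concrete (the figures below are purely hypothetical and do not come from DeepSeek or Gonka):

```python
# Hypothetical illustration: per-call prices fall 10x, but cheaper calls
# unlock new use cases and demand grows 30x, so total compute consumed rises.
price_old = 2.5        # RMB per million tokens before the cut (illustrative)
price_new = 0.25       # 10x cheaper after the cut (illustrative)
tokens_old = 1e12      # tokens served per month before the cut (illustrative)
tokens_new = 30e12     # tokens served per month after the cut (illustrative)

spend_old = price_old * tokens_old / 1e6
spend_new = price_new * tokens_new / 1e6

print(f"Developer spend: {spend_old:,.0f} -> {spend_new:,.0f} RMB (x{spend_new / spend_old:.1f})")
print(f"Tokens served (and GPU-hours needed to serve them): x{tokens_new / tokens_old:.0f}")
```

Under these assumed numbers, invocation cost drops by an order of magnitude, yet the tokens served, and therefore the GPUs required to serve them, grow thirtyfold; only operators with massive compute can absorb that volume at razor-thin margins.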

This is a structural lock-in taking shape: whoever controls the physical computing power for inference controls the true infrastructure gateway of the AI era. From this perspective, the significance of decentralized computing networks extends beyond "50% cheaper" cost optimization; it offers a structural alternative pathway before centralized lock-in solidifies.

3. A Real Test for Young Builders

The participants of LA Hacks, engineers and product managers from California's top universities, will soon face a decidedly unromantic engineering choice: on which layer of computing power will you build your product?

Whose servers will your AI product use for inference?

If that platform adjusts its pricing strategy or access policies, do you have the ability to migrate?

Is the user scale you help build creating value for yourself, or providing leverage for the platform?

These questions were already experienced by developers in the Web2 era: when an application's fate is deeply intertwined with a platform's algorithms or distribution rules, "independence" becomes a word that needs constant redefinition. The computing power dependency in the AI era will replicate the same logic at the infrastructure layer, and because switching costs are higher, the lock-in effect will only be stronger.

Hackathons, as a format, carry an inherent irony: building something functional with minimal resources in 36 hours is precisely the state that decentralized network incentive mechanisms pursue. When Daniil took the stage at LA Hacks, it wasn't just to talk about Gonka; it was closer to asking the audience: will the things you build next accelerate this trend toward centralization, or create new possibilities?

4. PoW 2.0: An Engineering Proposition

Gonka redirects the incentive structure of proof-of-work from hash computation towards AI inference, ensuring that nearly 100% of the network's computational contributions directly correspond to real tasks. This mechanism has a key engineering requirement: AI inference tasks must be verifiable and reproducible — given the same model weights, random seed, and input, any node can replicate the computation and verify its validity. This is the core engineering challenge for Gonka to transition from an academic prototype to a functional network.
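To make the requirement concrete, here is a minimal sketch of what verifiable, reproducible inference can look like, assuming a deterministic execution environment. This is not Gonka's actual protocol; the model, the commitment format, and the hashing scheme below are illustrative placeholders.

```python
# Minimal sketch of reproducible-inference verification. Not Gonka's actual
# protocol: the model, seed handling, and digest format are illustrative only.
import hashlib
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder checkpoint; any fixed, shared weights would do

def run_inference(prompt: str, seed: int) -> str:
    torch.manual_seed(seed)                       # same seed -> same sampling path
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=True, max_new_tokens=32)
    return tok.decode(out[0], skip_special_tokens=True)

def commitment(prompt: str, seed: int, output: str) -> str:
    # Anyone holding the same weights, seed, and prompt can recompute this digest.
    return hashlib.sha256(f"{MODEL_ID}|{seed}|{prompt}|{output}".encode()).hexdigest()

# Worker side: perform the task and publish (prompt, seed, digest).
prompt, seed = "Explain proof of work in one sentence.", 42
digest = commitment(prompt, seed, run_inference(prompt, seed))

# Verifier side: replay the computation and check that the digest matches.
assert commitment(prompt, seed, run_inference(prompt, seed)) == digest
```

In practice, bit-exact reproducibility across heterogeneous GPUs is itself non-trivial (floating-point kernels, batching, and library versions all introduce drift), which is exactly the kind of engineering challenge described above.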

Economically, the significance of this mechanism lies in the token's value being naturally pegged to the physical cost of computing power, rather than market sentiment. Miners contributing power are rewarded, developers using compute pay fees, and the entire incentive loop operates without reliance on any intermediary's goodwill.

Of course, technical feasibility is only one part. The harder question is: In an era where computing demand is skyrocketing and major players' CapEx is in the hundreds of billions of dollars, can a distributed computing network organized by community contributions truly compete on a meaningful scale?

Gonka's early data provides a reference point: less than a year after mainnet launch, the network's aggregate computing power expanded from the equivalent of 60 H100s to over 10,000. This growth came from the spontaneous participation of hundreds of independent nodes worldwide, not centralized allocation. This doesn't prove that the scaling problem is solved, but it demonstrates that the incentive mechanism is effectively driving early-stage growth.

5. The Question of the Window of Opportunity

Historically, infrastructure dominance tends to converge rapidly in its early stages — this was true for the railway era, the internet era, and the mobile internet era. Each time, some found their niche before standards solidified, while others realized only after centralization was complete that their participation rights had significantly narrowed.

What stage is AI computing infrastructure in? Given the projected $570.8 billion CapEx from the four major cloud providers in 2026, centralization is accelerating. However, looking at developers' actual usage patterns, the supply side still contains vast amounts of inefficiently integrated resources. This gap represents the structural space where decentralized networks can exist.

In his speech, Daniil referenced a comparison: After the dot-com bubble burst in 2000, what remained wasn't rubble, but a global network of fiber optics that underpinned the digital economy for the next two decades. After the AI infrastructure investment boom subsides, the underlying compute protocols and incentive mechanisms will become the infrastructure for the next cycle. The question is simply: which protocols have a sufficiently robust fundamental logic to remain operational under pressure?

This isn't about a specific project, but a question the entire decentralized AI track must confront: Can governance design truly resist the erosion of single-point control? Will incentive mechanisms remain effective as scale increases? Is the decentralization of the compute network valid across three dimensions: the technical execution layer, the token issuance layer, and the upgrade decision layer?

Conclusion

DeepSeek's price cuts have re-ignited the "AI democratization" narrative. However, democratized inference access and democratized computing infrastructure are two different things. The former is happening; whether the latter can happen depends on how many people, in the coming years, treat it as a genuine engineering problem worth solving, rather than just a compelling narrative.

About David and Daniil Liberman

David and Daniil Liberman are the co-founders of Gonka, a decentralized AI computing network. The brothers previously served as product directors at Snap and founded Product Science Inc., an AI development company, and have deep expertise in the AI field.

Gonka is currently the decentralized AI network with the largest number of GPUs, providing permissionless access to computing resources for developers and researchers while rewarding all participants with its native token, GNK. The project raised $18 million in 2023 and an additional $51 million in 2025. Investors include Coatue Management (an investor in OpenAI), Slow Ventures (an investor in Solana), Bitfury, K5, and partners at Insight and Benchmark, among others. Early contributors include 6 Blocks, Hard Yaka, Gcore, and other leading Web2 and Web3 enterprises.

Website | Github | X | Discord | Telegram | Whitepaper | Tokenomics | User Guide
