Computing power is concentrating again: After DeepSeek's price cuts, who will control AI's infrastructure?
- Core Thesis: Significant price cuts by models like DeepSeek are democratizing AI applications, but paradoxically accelerating the concentration of computing power among a few cloud giants (the four major cloud providers are expected to spend $570.8 billion on capital expenditures in 2026). The decentralized computing network Gonka aims to integrate global idle GPUs through a Proof-of-Work incentive mechanism, offering a structural alternative before the centralization of the computing layer is complete.
- Key Elements:
- Model price cuts depend on abundant computing power, but global compute is converging on a few nodes; optical communications leader Lumentum's production capacity is nearly sold out through 2028.
- The Bitcoin network's hash rate already exceeds the combined total of Google, Microsoft, and Amazon cloud data centers, yet it is used solely to solve hash puzzles. Meanwhile, the world's vast pools of idle GPUs lack a coordination mechanism that would put them to work on AI inference.
- Gonka redirects Proof-of-Work from hash computations to AI inference, ensuring nearly 100% of the network's computing contribution corresponds to real tasks, with the token's value anchored to the physical cost of compute.
- In 2026, AI inference is expected to account for two-thirds of global compute consumption. Price cuts drive exponential growth in call volume, thereby strengthening the structural lock-in for players with massive compute resources.
- Within less than a year of its mainnet launch, Gonka’s aggregated computing power has expanded from 60 H100s to over 10,000 H100 equivalents, driven by the spontaneous participation of hundreds of independent nodes worldwide.
On April 26, DeepSeek released new pricing for its V4 series API: across all models, the cost of cache-hit inputs was cut to one-tenth of the launch price, and with a limited-time discount on the Pro version, the processing cost per million tokens fell as low as 0.025 RMB, nearly a hundred times cheaper than a year ago. China's A-share computing power sector hit the daily limit-up across the board, and market sentiment ran hot.
But behind the cheers, there is a question no one is directly addressing: As models become cheaper, the computing power required to run them is becoming increasingly concentrated.
The data doesn't lie. In the fourth quarter of 2025, the combined capital expenditures of Microsoft, Amazon, Meta, and Google increased by 64% year-over-year to $118.6 billion; it is projected that their total capital expenditures for the full year 2026 will further increase by 53% year-over-year, reaching $570.8 billion. During the same period, Google raised its target for TPU chip shipments in 2026 by 50%, to 6 million units. The delivery lead times for Nvidia's H100 series have stretched to several months in some markets.
Pricing power at the model layer is shifting towards developers, but control over the computing layer is consolidating into the hands of a few giants at an even faster pace. This is a hidden yet profound contradiction of the AI era.

Against this backdrop, on April 24, 2026, Gonka protocol co-founders Daniil and David Liberman took the main stage at LA Hacks 2026. UCLA's largest annual collegiate hackathon featured the Liberman brothers as this year's keynote speakers, addressing hundreds of top engineers about to enter the industry. The question they posed could hardly be more timely: Is there still time for decentralized computing?
1. The Other Side of the Price Cuts
The logic behind DeepSeek V4's price cuts seems to be an efficiency dividend from technological progress—a new attention mechanism compresses the token dimension, combined with DSA sparse attention, significantly reducing demands on computation and memory. However, for price cuts to be sustainable, they depend on the premise that computing power somewhere is abundant and cheap enough.
The reality is that these sources of "abundant enough" computing power are rapidly converging on a few nodes globally. Michael Hurlston, CEO of optical communications leader Lumentum, recently stated that based on current trends, the company's production capacity is nearly fully sold out through 2028. This is not one company's predicament but a supply-chain-wide strain as the entire AI infrastructure stack faces hyper-scale demand.
Daniil used a simple yet powerful analogy in his LA Hacks speech: The computing power of the Bitcoin network already exceeds the combined total of Google, Microsoft, and Amazon's cloud data centers—yet what is all this computing power doing? Solving a hash puzzle that no one needs an answer to. The same applies to idle GPU power globally: the GPUs in gamers' machines, servers in university labs, spare capacity at smaller cloud providers—aggregated, they represent a massive scale, yet a lack of coordination mechanisms prevents them from being used for AI inference.
What Gonka aims to solve is precisely this coordination problem—using the incentive mechanism of proof-of-work to organize idle GPUs scattered around the world into a network capable of handling real AI inference tasks.
2. Inference is the New Battleground
DeepSeek's price cuts have sparked widespread discussion about "AI democratization" across the Chinese internet. But one overlooked detail is that the price cuts are on "invocation costs," not "computing costs." As AI applications scale, the growth in inference invocations is exponential—according to industry forecasts, by 2026, inference will account for roughly two-thirds of global AI computing consumption.
What does this mean? Every time the invocation price drops by an order of magnitude, the actual total demand for computing power only increases, never decreases. The "democratization" of large models, to some extent, paradoxically accelerates the centralization of the computing layer—because only players with massive computing power can sustain the operation of inference services at ultra-low margins.
This is a structural lock-in taking shape: whoever controls the physical computing power on the inference side controls the true infrastructure gateway of the AI era. From this perspective, the significance of decentralized computing networks goes far beyond a cost optimization of "being 50% cheaper." It represents providing a structural alternative pathway before the centralization lock-in is complete.
3. A Real Question for Young Builders
The participants of LA Hacks—engineers and product people from California's top universities—will soon face a not-so-romantic engineering choice: which layer of computing power will you build your product on?
- Whose servers does your AI product invoke inference on?
- When that platform adjusts its pricing or access policies, can you migrate?
- Does the user base you build create value for you, or does it add to the platform's leverage?
Developers experienced these questions once in the Web2 era: when an app's fate is deeply tied to a platform's algorithm or distribution rules, "independence" becomes a word that needs constant redefinition. The computing power dependency in the AI era will replicate the same logic, but at the infrastructure layer, and because switching costs are higher, the lock-in effect will only be stronger.

Hackathons, as a format, have an inherent affinity with this vision: building something functional in 36 hours with minimal resources and maximum speed is precisely the state that decentralized network incentive mechanisms aspire to produce. Daniil stepped onto the stage at LA Hacks not just to talk about Gonka, but more to ask this group: Will the things you build in the future accelerate this trend of centralization, or create new possibilities?
4. PoW 2.0: An Engineering Proposition
Gonka redirects the incentive structure of proof-of-work from hash computation towards AI inference, so that nearly 100% of the network's computing power contribution directly corresponds to real tasks. There is a key engineering requirement for this mechanism: AI inference tasks must be verifiable and reproducible—given the same model weights, random seed, and input, any node can replicate the computation result and verify its validity. This is the core engineering challenge in moving Gonka from an academic prototype to a functioning network.
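The verifiability requirement above can be made concrete with a minimal sketch. This is not Gonka's actual protocol; the function names, the commitment scheme, and the stand-in "model" are illustrative assumptions. The point it demonstrates is the invariant the text describes: given the same weights, seed, and input, any node can replay the computation and check the claimed result.

```python
import hashlib
import json
import random

def run_inference(weights_hash: str, seed: int, prompt: str) -> str:
    # Stand-in for a deterministic model forward pass. In a real system,
    # greedy decoding (or a fixed RNG seed) must make identical weights
    # plus identical input yield byte-identical output on any node.
    rng = random.Random(f"{weights_hash}:{seed}:{prompt}")
    tokens = [rng.randrange(50_000) for _ in range(8)]
    return json.dumps(tokens)

def commitment(weights_hash: str, seed: int, prompt: str, output: str) -> str:
    # A hash binding the task parameters to the claimed output.
    payload = json.dumps([weights_hash, seed, prompt, output])
    return hashlib.sha256(payload.encode()).hexdigest()

def verify(weights_hash: str, seed: int, prompt: str,
           claimed_commitment: str) -> bool:
    # Any node can re-run the task and compare commitments.
    output = run_inference(weights_hash, seed, prompt)
    return commitment(weights_hash, seed, prompt, output) == claimed_commitment

# Worker side: perform the task and publish a commitment.
out = run_inference("w-abc123", seed=42, prompt="hello")
c = commitment("w-abc123", 42, "hello", out)

# Verifier side: replay and check.
assert verify("w-abc123", 42, "hello", c)
```

Determinism is the load-bearing assumption here: floating-point non-determinism across GPU kernels is exactly what makes this hard in practice, which is why the text calls it the core engineering challenge.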
From an economic perspective, the significance of this mechanism is that the token's value is naturally pegged to the physical cost of computing power, rather than liquidity-driven sentiment. Miners contributing computing power are rewarded, and developers invoking computing power pay fees. The incentive loop of the entire system does not rely on the goodwill of any intermediary to function.
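The incentive loop described above can be sketched as a simple settlement rule, with the fee derived from the physical compute consumed rather than from market sentiment. The ledger structure, the per-GPU-second rate, and the function names are hypothetical illustrations, not Gonka's actual tokenomics.

```python
from dataclasses import dataclass, field

@dataclass
class Ledger:
    balances: dict = field(default_factory=dict)

    def transfer(self, src: str, dst: str, amount: float) -> None:
        assert self.balances.get(src, 0.0) >= amount, "insufficient balance"
        self.balances[src] = self.balances.get(src, 0.0) - amount
        self.balances[dst] = self.balances.get(dst, 0.0) + amount

# Hypothetical anchor: the fee for a task is proportional to the
# GPU-seconds it consumed, pegging token flows to physical compute cost.
GPU_SECOND_COST = 0.002  # illustrative tokens per GPU-second

def settle_task(ledger: Ledger, developer: str, miner: str,
                gpu_seconds: float) -> float:
    # Developer pays, miner earns; no intermediary sits in the loop.
    fee = gpu_seconds * GPU_SECOND_COST
    ledger.transfer(developer, miner, fee)
    return fee

ledger = Ledger({"dev": 10.0, "miner": 0.0})
settle_task(ledger, "dev", "miner", gpu_seconds=500)
```

The design choice the sketch highlights is that settlement is a pure function of verified work, which is what lets the loop run without relying on any intermediary's goodwill.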
Of course, technical feasibility is only one part. The harder question is: In an era of hyper-growth in computing demand and capital expenditures from major players measured in tens of billions of dollars, can a distributed computing network organized through spontaneous community contributions constitute meaningful competition in terms of scale?
Gonka's early data provides a reference point: less than a year after its mainnet launch, the network's aggregated computing power has expanded from an equivalent of 60 H100s to over 10,000 H100s, driven by the spontaneous integration of hundreds of independent nodes globally, not centralized orchestration. This doesn't prove the scale problem is solved, but it shows that the incentive mechanism is effectively driving early growth.
5. The Question of the Window of Opportunity
Historically, the dominance of infrastructure tends to converge rapidly in its early stages—this was true for the railway era, the internet era, and the mobile internet era. Each time, some found their place before standards solidified, while others realized participation rights had narrowed significantly only after centralization was complete.
Which stage is the AI computing infrastructure currently in? Looking at the four major cloud providers' projected $570.8 billion capital expenditure for 2026, centralization is accelerating. However, looking at the actual usage patterns of developers, the supply side still holds massive amounts of unintegrated resources. This gap is the structural space where decentralized networks can exist.
Daniil referenced an analogy in his speech: After the bursting of the internet bubble in 2000, what remained was not ruins, but a fiber optic network laid across the globe, which supported the digital economy for the next two decades. After the current wave of AI infrastructure investment subsides, the computing protocols and incentive mechanisms that solidify will become the infrastructure for the next cycle. The question remains: which protocols have a robust enough underlying logic to remain operational under pressure?
This is not a question about a specific project, but a question the entire decentralized AI track must confront: Can governance design truly resist the erosion of single points of control? Will the incentive mechanisms remain effective as scale increases? Is the decentralization of the computing network simultaneously valid across three dimensions: the technical execution layer, the token issuance layer, and the upgrade decision-making layer?
Conclusion
DeepSeek's price cuts have once again fueled the "AI democratization" narrative. But democratized inference invocation and democratized computing infrastructure are two different things. The former is happening; whether the latter can happen depends on how many people, in the coming years, truly treat it as an engineering problem worth solving, and not just a nice-sounding narrative.
About David and Daniil Liberman
David and Daniil Liberman are the co-founders of Gonka, a decentralized AI computing network. They previously served as Product Directors at Snap Inc. and founded the AI development company Product Science Inc., with years of deep expertise in the AI field.
Gonka is currently the decentralized AI network with the largest number of GPUs, providing developers and researchers with permissionless access to computing resources while rewarding all participants with its native token, GNK. The project successfully raised $18 million in 2023 and an additional $51 million in 2025. Investors include Coatue Management (an investor in OpenAI), Slow Ventures (an investor in Solana), Bitfury, K5, and partners of Insight and Benchmark. Early contributors to the project include well-known leaders in the Web2-to-Web3 space, such as 6 blocks, Hard Yaka, and Gcore.
Official Website | Github | X | Discord | Telegram | Whitepaper | Tokenomics | User Guide


