AGI Is Already Here: The 13 Most Hardcore AI Dialogues from Sequoia's Annual Conference
- Core thesis: In 2026 the AI industry is shifting from a "model capability race" to a "real-world integration" phase. Sequoia Capital argues that "functional AGI" has arrived: intelligence is turning from a luxury into a cheap industrial raw material, and the competitive focus is moving to organizational restructuring, the definition of human intent, and integration with the physical world.
- Key points:
- Intelligence becomes a commodity: By analogy with the "age of aluminum," PhD-level knowledge barriers are collapsing under AI mass production, and high-level intellect is no longer scarce.
- Human attention is the new bottleneck: Greg Brockman points out that when agents work autonomously, human attention is the scarcest resource; Karpathy stresses that "understanding" is the only rate-limiting bottleneck.
- Organizational structure is the moat: Anthropic's Boris Cherny argues that long-term advantage lies not in model versions but in how "native" an organization is to AI, e.g., agents collaborating autonomously.
- AI enters the physical world: Waymo has completed 20 million autonomous rides, with safety 13 times higher than human drivers; NVIDIA's Jim Fan predicts robots will acquire physical intuition through large-scale video pretraining.
- Security enters the AI arms race: XBOW's AI hacker topped the global rankings; autonomous attack capabilities will proliferate within 6 to 9 months, and the defense window has closed.
- Compute competition turns to foundational restructuring: space computing (Starcloud), AI-designed chips (Recursive), data-efficiency gains (Flapping Airplanes), and non-von Neumann architectures (Unconventional AI) are the new directions.
Introduction
At the end of April 2026, Sequoia Capital hosted the fourth AI Ascent conference in San Francisco. The event invited core AI industry companies such as OpenAI, DeepMind, Anthropic, NVIDIA, and Waymo, as well as startups betting on emerging directions like ElevenLabs, XBOW, Recursive Intelligence, and Starcloud. The 13 dialogues spanned foundational models, programming paradigms, robotics, autonomous driving, chip design, space computing, and novel computing architectures, essentially covering the most cutting-edge tracks of the current AI industry.
Compared to previous years, the tone of this year's AI Ascent was more direct: AI is no longer just a tool for improving efficiency; it is beginning to enter real workflows, taking over some complex tasks that could previously only be done by humans. In its opening remarks, Sequoia called this the arrival of "functional AGI"—not that machines are already equivalent to humans in all dimensions, but from a commercial and productivity perspective, long-horizon agents have crossed the threshold from demonstration to usability.
This is also the core background of the conference: when intelligence becomes cheap, callable, and scalable, the competitive focus of AI is shifting from "can the model do it" to "how to integrate it into the real world." Software, services, organizations, hardware, energy, security, and physical space may all be redesigned as a result.
The story Sequoia tries to tell is clear: intelligence is no longer a luxury but is becoming a new industrial raw material. What truly matters in the next phase may not be who has the smarter model, but who can understand customers faster, reorganize processes, dispatch agents, and transform this cheap intelligence into a sustainable commercial system.
Therefore, the discussions at this conference were not just about the next step in AI technology, but a bigger question: when machines can take on more and more mental labor, how should humans, companies, and society redefine their own value?
Several Key Themes Running Through the Conference
First, intelligence is becoming a commodity.
Sequoia likened this transformation to "aluminum" in the late 19th century: once more expensive than gold, it became a readily available industrial material within decades due to the popularization of the electrolysis process. Today, PhD-level expertise and the cognitive barriers that once defined middle-class competitiveness may be undergoing a similar fate. High-level intelligence is no longer naturally scarce but is beginning to be mass-produced, called upon, and distributed by models.
Second, the bottleneck is shifting from machines to humans.
Greg Brockman delivered a line that was quoted repeatedly at this conference: when agents can work autonomously, human attention will become the scarcest resource in the entire economy. Karpathy expressed the same judgment more bluntly: when machines can handle almost all execution details, the only ability humans cannot afford to lose is figuring out what they actually want. The problem is no longer whether machines can do it, but whether humans can propose the right goals, judge if the results are reliable, and decide what is worth completing.
Third, programming is being solved, organizations are not.
Internally at Anthropic, a large amount of code is already generated by models, with different agents even collaborating autonomously on Slack. Boris Cherny's judgment goes further: the true moat is no longer a specific model version, but the degree to which an organizational structure is "native" to AI. For existing companies, this is an unfriendly conclusion—because the gap doesn't just come from proficiency with tools, but from whether the company is willing to redesign processes, permissions, collaboration methods, and management structures around agents.
Fourth, AI is returning from the digital world to the physical world.
Jim Fan's robots, Waymo's 20 million autonomous rides, and ElevenLabs' emotional voice synthesis illustrate from different angles that AI is no longer just a screen-based tool for processing text, code, and images; it is beginning to understand and intervene in light, sound, force, motion, and space. Over the past decade, "software is eating the world" has been the main theme; next, AI may directly enter the physical world, changing cars, factories, robots, voice interaction, and physical manufacturing itself.
Fifth, the endgame of computing power lies in the physical fundamentals.
When land, electricity, and heat dissipation for ground-based data centers start hitting their limits, a group of more radical companies offers different solutions: Starcloud wants to send chips into space, Recursive has AI design chips autonomously, Unconventional AI tries to bypass the von Neumann architecture to mimic the brain, and Flapping Airplanes directly questions "brute force scaling" itself—if humans can learn the same skills with far less data, then today's AI algorithms might be fundamentally too inefficient. The endpoint of the computing power race is moving from buying more GPUs to a fundamental restructuring of energy, chips, architecture, and data efficiency.
Sixth, security has entered the asymmetric battlefield of "AI vs. AI."
XBOW's agent topped the global white-hat hacker rankings, meaning AI is no longer just an auxiliary tool for security researchers but an autonomous attack system capable of independently discovering, verifying, and exploiting vulnerabilities. More seriously, with the improvement of open-source model capabilities, such attack capabilities could rapidly proliferate within the next 6 to 9 months. Cybersecurity is no longer a contest between human hackers, but an AI arms race with a countdown already started.
Putting these clues together reveals that the AI industry in 2026 is in an uncomfortable position: technological capabilities have far outpaced product forms, organizational structures, and social rules. Models get stronger every day, but the "containers" meant to hold them—whether enterprise processes, application interfaces, or human attention itself—have yet to catch up.
The discussions throughout the conference essentially answered the same question: in a world where machines can perform an increasing amount of mental labor, what is left for humans?
Sequoia’s answer is somewhat counterintuitive: emotion, trust, and things that cannot be mass-produced. Brockman's answer is "what you want," Karpathy's is "whether you can judge if the machine is doing it right." These answers ultimately point to the same thing: when intelligence itself is no longer scarce, intent, judgment, and relationships will become the new hard currency.
Below is a summary of all 13 dialogues from this conference.
Dialogue Summaries
Keynote Speeches
Sequoia Partners' Opening Speech: This is AGI
The speakers, Pat Grady, Sonya Huang, and Konstantine Buhler, are the three core partners in Sequoia Capital's AI investment line. Sonya Huang is the author of the 2022 viral article "Generative AI: A Creative New World" and is considered one of the earliest institutional investors to systematically bet on generative AI. Together, they co-authored the 2026 article "This is AGI," which served as the ideological framework for this conference. Sequoia Capital itself is one of Silicon Valley's oldest top-tier venture capital firms, having made early investments in Apple, Google, Nvidia, Stripe, OpenAI, and others.
AI is a "computational revolution" that completely overturns the nature of information processing, not just a "communication revolution" that accelerates distribution. The internet and mobile only changed the distribution path of information, while AI changes the underlying logic of information generation, causing the technical foundation on which developers build applications to shift daily. The importance of this judgment is that, in the "rainstorm moment" of an unstable foundation, the traditionally stable technology stack is a thing of the past; developers must learn to dance with an ever-evolving model foundation.
AI will enter a $10 trillion market—ten times larger than traditional software—by directly delivering "professional services." The TAM (Total Addressable Market) of the global software market is only a few hundred billion dollars, while the US legal services market alone is a $400 billion vertical, equivalent in size to the entire software industry. This signals a key transformation: the commercial value of AI no longer lies in tools sold to humans, but in agents that directly take over and deliver the high-value work once done by human experts.
From a commercial practice perspective, a long-duration agent capable of autonomously handling failure signifies that AGI (Artificial General Intelligence) has arrived. If a system can be sent to execute a task, self-correct upon failure, and persist to the end, it is functionally equivalent to AGI. This counterintuitive judgment reminds us: stop obsessing over academic definitions. An AI with independent execution capabilities has evolved from a "faster horse" to a "car" that changes the dimension of competition, achieving a 10 to 40-fold leap in efficiency.
In a time of rapidly changing underlying capabilities, the only logic for building a moat is to be "extremely close to the customer." The MAD strategy—Moats, Affordance (referring to a product's intuitive ease of use), and Diffusion—advocates locking in value by starting from the customer backwards rather than pushing technology outwards. Since human needs change much slower than model capabilities, deep customer focus is more durable than chasing models.
Agent autonomy is making a leap from "minute-level assistants" to "hour-level autonomous employees." Measurements of how long a model can stay on track in complex tasks (the METR task-horizon chart) have jumped from minutes a year ago to several hours now, enough to support dark factories (fully autonomous business processes operating without human review). This means the productivity bottleneck has been broken, and extraordinary iterations like "rewriting 8 million lines of code in 6 weeks" are becoming the norm.
Human society is on the eve of a "cognitive industrial revolution," where machines will undertake 99.9% of the world's mental labor. Just as the industrial revolution replaced 99% of physical labor with engines, the vast majority of future analysis, decision-making, and creation will be handled by neural networks. The assertion of this judgment is that intelligence will no longer be a monopoly of humans, but a low-cost, industrial-grade consumable that can be infinitely mass-produced and called upon on demand.
Advanced intellectual skills are about to have their "aluminum moment," degrading from expensive luxuries into cheap commodities. Aluminum, once more precious than gold, became disposable due to the popularization of electrolysis. AI's instantaneous application of PhD-level knowledge will have the same effect. This portends a harsh future: professional knowledge barriers built over years could collapse instantly, and intelligence itself will no longer command a scarcity premium.
When intelligence becomes fully commoditized, human relationships and emotional connections will become the only real value anchors in human society. Photography once pushed art from realism towards Impressionism, which expresses the soul. Similarly, AI's optimization for efficiency often yields "alien spaces" that transcend human intuition. The final conclusion is counterintuitive yet profound: in a future where machines handle all work, only trust and emotion between humans will be the ultimate hard currency that cannot be mass-produced by machines.
If you could only remember one thing from this dialogue, what would it be?
The intelligence that was once valuable will soon become as cheap as a plastic bag. What will truly keep you competitive in the future is not a brain that can solve difficult problems, but the emotion to understand others and build trust.
Models and Cognition
Andrej Karpathy: From Vibe Coding to Agent Engineering (OpenAI Founding Team)
The speaker, Andrej Karpathy, is the most influential "educator-scientist" in the AI community. A founding team member of OpenAI, he later served as Tesla's AI Director, responsible for the autonomous driving vision system. In 2024, he left Tesla to found the AI education company Eureka Labs. His step-by-step tutorial videos on neural networks on YouTube have been the introductory textbook for countless AI engineers. He coined key concepts like "Software 2.0" and "Vibe Coding."
Even top experts feel "left behind" by the AI wave because the evolution of technology has leaped from assistive tools to autonomous systems. In early 2026, the speaker found he no longer needed to modify AI-generated code blocks; he just needed to trust the system to complete complex tasks. The importance of this judgment is that when AI can achieve self-correction and closed-loop delivery, the baseline developers have built through experience is abruptly raised, and personal learning speed can hardly keep up with the shifting technological foundation.
Modern computing is entering the Software 3.0 era, where LLMs are essentially a new type of computer using context as leverage. Software 1.0 was writing code, 2.0 was training weights, and 3.0 is programming within the context (the memory space for the model to process information) via prompting. This means installing software no longer requires writing complex compatibility scripts; you simply "feed" a description to the agent. Precise spelling of details is no longer a core competency.
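The three paradigms can be sketched in miniature. The toy below is illustrative, not from the talk: it frames one task, sentiment classification, in each era's style, and `call_llm` is a hypothetical stand-in for any model API.

```python
# Software 1.0: a human writes explicit rules in code.
def sentiment_v1(text: str) -> str:
    positive = {"good", "great", "love"}
    return "positive" if any(w in text.lower().split() for w in positive) else "negative"

# Software 2.0: a human curates data, and the "program" is learned weights.
# (Sketched here as trivially hand-set keyword weights standing in for a
# real gradient-descent training pipeline.)
weights = {"good": 1.0, "great": 1.2, "terrible": -1.5}

def sentiment_v2(text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score > 0 else "negative"

# Software 3.0: a human writes a prompt; the context window is what gets
# "programmed". `call_llm` is hypothetical -- stubbed out so this runs offline.
def sentiment_v3(text: str, call_llm=lambda prompt: "positive") -> str:
    prompt = f"Classify the sentiment of this review as positive or negative:\n{text}"
    return call_llm(prompt)
```

The point of the contrast is that in 3.0 the "source code" is natural language in context, so precise spelling of implementation details stops being the scarce skill.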
Many existing application architectures are becoming "redundant" because AI already possesses the ability to process directly at the raw data layer. The speaker found that his painstakingly developed menu generation app became meaningless, as models can now render overlays directly on photos at the pixel level. This signals a profound change: AI should not just be used to accelerate old business logic; we must recognize that the disappearance of the middle layer means many traditional product forms have lost their physical basis for existence.
AI capabilities are "jagged": it exhibits superhuman intelligence only in domains that can be verified. A model can refactor hundreds of thousands of lines of code but might fail on a simple common-sense test like counting the letter 'r' in "raspberry". This is because models are primarily strengthened via RL (Reinforcement Learning, a training method using reward signals to guide model evolution) in verifiable domains like math and code. This reminds us: we must constantly observe within the loop, vigilant against weaknesses outside the model's training distribution.
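The letter-counting example also shows what "verifiable" means in practice: ground truth can be computed mechanically, so a reward can be assigned without a human judge. A minimal sketch of that property (illustrative only, not a real RL training pipeline):

```python
def count_letter(word: str, letter: str) -> int:
    """Ground truth a model may flub but code gets trivially right."""
    return word.lower().count(letter.lower())

def reward(model_answer: int, word: str, letter: str) -> float:
    """Binary reward signal of the kind verifiable-domain RL relies on:
    no human grading needed, so it scales to millions of episodes."""
    return 1.0 if model_answer == count_letter(word, letter) else 0.0
```

Math and code admit this kind of automatic check; most common-sense and open-ended tasks do not, which is one reason capabilities come out jagged.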
We are not building "animals" with intrinsic motivation, but "conjuring ghosts" within the data distribution. The peak intelligence of a model depends on the training data distribution (e.g., adding massive chess game data leads to a surge in chess skill), not on it generating some biological curiosity. This judgment counterintuitively points out that AI doesn't truly "understand"; it merely provides extreme reinforcement to specific circuits through statistical simulation. Therefore, users must learn to identify and avoid the false capabilities that lack data support.
Agentic engineering is about leveraging stochastic AI while maintaining the quality red lines of professional software. This new engineering method requires developers to coordinate unstable yet extremely powerful agents while ensuring the system produces no security vulnerabilities. It points to a new 10x-engineer paradigm: the core of competition is no longer the speed of writing code oneself, but the ability to efficiently drive a massive cluster of agents, like a director, to deliver high-quality results.
When machines take over trivial API details, the true premium for humans will shift towards aesthetics and control over the "specifications." Developers no longer need to memorize specific PyTorch (a deep learning framework) interface parameters, as these details will be handled by powerful, memory-driven AI "interns." This points to a counterintuitive future: foundational principles and design taste are more long-lasting than tool details. Humans should transition from "bricklayers" to decision-makers who define "what constitutes good design."
"Thinking" can be outsourced, but "understanding" is the only bottleneck for humans in the age of cheap intelligence. Although AI can assist us in processing and recompiling vast information, it cannot decide for us "why we should build this" or "whether it is valuable." This points to a final conclusion: humans remain the sole commander of the system, because only human consciousness can give purpose to intelligent processing power. This holistic understanding cannot be replaced by algorithms.
If you could only remember one thing from this dialogue, what would it be?
When machines can do all the work and even think through all the details for you, the only skill you cannot afford to lose is figuring out what you actually want and whether you can tell if the machine is doing it right.
Greg Brockman: Human Attention is the New Bottleneck (OpenAI Co-founder)
The speaker, Greg Brockman, is the co-founder and President of OpenAI. Former CTO of Stripe, he co-founded OpenAI with Sam Altman in 2015 and is the core architect of its technology and infrastructure. Inside OpenAI, Altman handles external affairs (fundraising, public image, policy), while Brockman manages internal aspects (technology, compute, products). His engineer-style of writing code himself and keeping watch during nightly releases is well-known in Silicon Valley.
Intelligence has become a standardized, resalable commodity, driving insatiable, almost pathological growth in demand for computing power. OpenAI's business model essentially involves buying or leasing compute, transforming it into intelligence via models, and selling it at a premium. Since the demand for problem-solving is infinite, spare GPU (Graphics Processing Unit) supply in 2026 is forecast to approach zero. The importance of this is that AI is no longer just a software service but has evolved into a resource-based commodity business, where physical compute supply directly determines the ceiling of civilizational intelligence.
The scaling law (the empirical rule that model capabilities improve with more compute) is a universal empirical truth, with no "wall" in sight yet. Although the basic idea of neural networks dates back to the 1940s, as long as massive compute is continuously invested, model capabilities will correspondingly and deterministically strengthen. This supports the key view that technological stagnation will not happen soon; as long as capital and electricity are continuously invested, we will obtain more powerful intelligence, providing the underlying logical support for aggressive investments by tech giants.
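Brockman gives no formula in the talk, but the empirical form usually cited for this rule comes from Kaplan et al.'s 2020 scaling-law study (the constants below are theirs, not the talk's): test loss falls as a power law in training compute, so each halving of loss demands a multiplicative, not additive, increase in compute.

```latex
% Test loss L as a power law in training compute C (Kaplan et al., 2020);
% C_c is a fitted constant with C measured in petaflop/s-days.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C},
\qquad \alpha_C \approx 0.05,\quad C_c \approx 2.3 \times 10^{8}\ \text{PF-days}
```

The small exponent is the whole story: steady gains are available, but only by pouring in exponentially more compute, which is the underlying logic behind the giants' aggressive capital expenditure.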
From a functional perspective, we have already completed 80% of the journey to AGI (Artificial General Intelligence), as models now possess the closed-loop ability to execute tasks independently. A system engineer handed a complex optimization task to a model, which not only wrote the code but also ran a Profiler (performance analysis tool) autonomously and performed multiple rounds of optimization based on feedback until the task was fully completed. This advocates a counterintuitive view: AGI is not a future moment, but an ongoing process. AI has evolved from a "coding assistant" into a "colleague who can solve problems."
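The loop described — write code, measure with a profiler, refine against feedback — can be sketched generically. The snippet below is a hypothetical stand-in (none of these names come from OpenAI's tooling), with fixed candidate implementations playing the role of successive model outputs:

```python
import time

def profile(fn, *args) -> float:
    """Stand-in for the profiler step: wall-clock time of one candidate."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

def optimize_loop(candidates, args):
    """Stand-in for the agent's refinement loop: measure each attempt and
    keep the fastest -- the feedback-driven selection the talk describes."""
    best, best_t = None, float("inf")
    for fn in candidates:
        t = profile(fn, *args)
        if t < best_t:
            best, best_t = fn, t
    return best, best_t

# Two "model-written" attempts at the same task: summing squares below n.
attempt_1 = lambda n: sum([i * i for i in range(n)])  # materializes a list first
attempt_2 = lambda n: sum(i * i for i in range(n))    # streams a generator
```

What makes the real version a closed loop is that the model both generates the candidates and reads the profiler output to produce the next one; here the candidates are fixed, but the measure-and-select skeleton is the same.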
Context (the background information the model holds for a specific task) is replacing model algorithms as the most critical competitive frontier. A new tool, Chronicle, can record everything a user does on their computer in real-time


