Jensen Huang’s “Agent Factory”: What New Stories Does It Hold?

星球君的朋友们

Odaily senior author

2026-06-01 12:00

When agents become AI infrastructure, every layer could have NVIDIA.

AI Summary

Core Thesis: At COMPUTEX 2026, NVIDIA announced a comprehensive technical ecosystem built around "Agent AI," spanning chips, models, and robotics platforms. The Vera Rubin platform enters mass production, optimized specifically for agent tasks, aiming to propel AI factories from the infrastructure stage into a new era of operation and deployment.
Key Elements:
1. The Vera Rubin platform is purpose-built for agents and is now in mass production. It delivers 10x the processing efficiency of the previous generation Grace Blackwell and introduces CPO networking technology and confidential computing at scale for the first time.
2. The company unveiled the Vera CPU, designed specifically for the AI era, which delivers 1.8x the agent workload performance of comparable x86 servers and is already in full production.
3. NVIDIA launched the DSX factory operating system, including DSX MaxLPS (power optimization) and DSX OS (operations management), with the goal of standardizing the construction and operation of AI data centers.
4. NVIDIA released the 550-billion-parameter Mixture-of-Experts model Nemotron 3 Ultra and the open-source agent framework NemoClaw for building enterprise-grade "digital colleagues."
5. NVIDIA introduced the third-generation physical AI model Cosmos 3, which unifies visual reasoning, generation, and action prediction and aims to compress physical AI training cycles from months to days.
6. NVIDIA partnered with Unitree to unveil the humanoid robot reference design H2 Plus and open-sourced a physical AI "skills" toolset to lower the barrier to robot development.
7. The Vera BlueField-4 STX upgrades storage security, using chip-level policy enforcement to ensure secure interaction between agents and enterprise data.

Original authors: Li Hailun, Su Yang

Original editor: Xu Qingyang

Original source: Tencent Technology

On June 1, 2026, at the NVIDIA GTC Taipei conference held during COMPUTEX 2026, NVIDIA founder and CEO Jensen Huang delivered a keynote speech.

It had been only three months since the last GTC.

At that event, NVIDIA unveiled the Vera Rubin "full chip family": Vera CPU, Rubin GPU, Groq 3 LPU, ConnectX-9, BlueField-4 DPU, and Spectrum-6 switch. Together, these six chips form a rack-scale AI supercomputer. NVIDIA said at the time that the number of GPUs required to train large MoE models had been cut to a quarter, inference throughput per watt had increased tenfold, and the cost per token had dropped to one-tenth.

At COMPUTEX three months later, Jensen Huang moved away from the earlier emphasis on system-level solutions such as the "full chip family" and "full computing family" and turned his attention to what this infrastructure will serve: agents.

Jensen Huang announced several milestones in his speech: Vera Rubin has officially entered mass production, Vera CPU has begun global deliveries, DGX Station has reached enterprise desktops in a Windows version for the first time, Cosmos 3 has restructured the perception framework for physical AI, and DSX has become the operating system for AI factories. NVIDIA also partnered with Unitree to launch the H2 Plus, the first humanoid robot reference design based on Isaac GR00T, extending agents from the digital world into physical form.

NVIDIA is reorganizing its complete technology stack (chips, data centers, models, software, and robotics platforms) around the agent ecosystem.

Jensen Huang said: "The era of Agent AI and practical artificial intelligence has arrived. Now, tokens are the unit of profit, AI is the 'generator' of GDP, and the number of software engineers is increasing. People talk about AI reducing jobs, which is complete nonsense. In fact, more software engineers are being hired."

The Same AI Factory, Running 10x More Agent Tasks

The Vera Rubin platform has entered full production.

Unlike earlier platforms, which focused primarily on large model training and inference, Vera Rubin was designed from the outset with agents as a key workload.

Jensen Huang stated in his speech that an Agent task often involves more than just a single model inference; it includes multiple stages like reasoning, searching, tool calling, code execution, and result verification, potentially involving thousands of steps. The future data center will no longer just process single model requests but will handle a large number of continuously running, interacting Agent tasks.
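The multi-stage shape of an agent task described above can be sketched as a simple loop. This is a minimal illustration, not NVIDIA's implementation: the five stage names come from the speech, while `run_agent_task`, the stand-in handlers, and the `max_steps` budget are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage names follow the keynote description; everything else is hypothetical.
STAGES = ["reason", "search", "call_tool", "execute_code", "verify"]


@dataclass
class TaskState:
    goal: str
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_agent_task(
    goal: str,
    handlers: Dict[str, Callable[[TaskState], None]],
    max_steps: int = 1000,
) -> TaskState:
    """Cycle through the stages until verification succeeds or the step budget runs out."""
    state = TaskState(goal=goal)
    steps = 0
    while not state.done and steps < max_steps:
        stage = STAGES[steps % len(STAGES)]
        handlers[stage](state)          # one model call, search, tool call, etc.
        state.history.append(stage)
        steps += 1
    return state


def make_demo_handlers(rounds_needed: int) -> Dict[str, Callable[[TaskState], None]]:
    """Stand-in handlers: every stage is a no-op except 'verify',
    which marks the task done after a fixed number of rounds."""
    rounds = {"n": 0}

    def noop(state: TaskState) -> None:
        pass

    def verify(state: TaskState) -> None:
        rounds["n"] += 1
        state.done = rounds["n"] >= rounds_needed

    handlers = {name: noop for name in STAGES}
    handlers["verify"] = verify
    return handlers


result = run_agent_task("fix a failing test", make_demo_handlers(rounds_needed=3))
print(len(result.history))  # 15 steps: 3 rounds of 5 stages
```

Even this toy task takes 15 steps for three rounds, which is why a single user request can translate into many inference calls on the data center side.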

NVIDIA describes the platform as an AI supercomputer that operates as a single, massive computing unit, purpose-built to handle agent workloads spanning reasoning, retrieval, and tool use. In an ultra-large data center of the same scale, the new Vera Rubin platform can process tasks for autonomous AI agents with 10 times the efficiency of the previous-generation Grace Blackwell platform.

Beyond the computing platform itself, networking has become a key area of upgrade for Vera Rubin.

In the past, data transfer between GPUs in data centers mainly relied on traditional optical modules and switch architectures. However, as cluster scales continue to expand, power consumption, heat dissipation, and deployment complexity increase rapidly. To address this, NVIDIA introduced the Spectrum-X Ethernet Photonics networking system in the Vera Rubin platform.

This marks the first large-scale introduction of Co-Packaged Optics (CPO) technology into an AI data center network by NVIDIA.

Simply put, traditional solutions require plugging optical modules externally into switches, whereas CPO integrates optical components directly inside the switch, thereby reducing energy consumption and signal loss.

Security is another core capability that NVIDIA emphasized for the Vera Rubin platform.

To this end, NVIDIA has extended Confidential Computing capabilities across the entire Vera Rubin platform. Through trusted execution environments, hardware-level attestation, and end-to-end encryption mechanisms, enterprises can achieve a higher level of security assurance when processing private data, industry-sensitive information, and critical models.

Jensen Huang revealed that Vera Rubin has entered the mass production phase. As a third-generation MGX rack-scale system, it involves over 150 partners, more than 350 factories, and a supply chain spanning over 30 countries and regions. According to NVIDIA's announced plan, Vera Rubin shipments will begin this fall.

A Processor "Born for Agents"

NVIDIA has launched a new processor, Vera, designed specifically for the agent era, and it has entered full production.

Jensen Huang pointed out that advancements in memory systems will drive innovation and modernization in storage systems. All CPUs until now were built for humans, while Vera is a CPU designed for the AI era, built for agents.

As the successor to Grace, Vera adopts NVIDIA's self-designed "Olympus" CPU core architecture, raising the core count from 72 to 88 and significantly boosting memory and data processing capabilities. According to NVIDIA, in agent-related workload tests, Vera executes tasks 1.8 times as fast as comparable x86 server CPUs.

More important than the sheer performance increase is the change in the relationship between Vera and the Rubin GPU: Vera connects to the Rubin GPU via second-generation NVLink-C2C, achieving an interconnect bandwidth of 1.8 TB/s, further reducing the overhead of data transfer between CPU and GPU during Agent operation.
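To put the 1.8 TB/s figure in perspective, the back-of-the-envelope calculation below estimates how long a CPU-to-GPU transfer would take at that rate. It is a rough sketch under stated assumptions: it treats 1.8 TB/s as sustained throughput, uses decimal units (1 TB = 1000 GB), ignores latency and protocol overhead, and the 90 GB payload is a hypothetical example rather than a number from NVIDIA.

```python
# Interconnect bandwidth quoted in the article (treated here as sustained throughput).
NVLINK_C2C_TB_PER_S = 1.8


def transfer_seconds(payload_gb: float, bandwidth_tb_per_s: float = NVLINK_C2C_TB_PER_S) -> float:
    """Idealized transfer time: payload size divided by bandwidth, with no overhead."""
    bandwidth_gb_per_s = bandwidth_tb_per_s * 1000  # decimal units: 1 TB = 1000 GB
    return payload_gb / bandwidth_gb_per_s


# Hypothetical payload: 90 GB of agent context moved between CPU and GPU memory.
milliseconds = transfer_seconds(90) * 1000
print(round(milliseconds, 1))  # 50.0
```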

Jensen Huang stated that Vera Rubin uses HBM (High Bandwidth Memory) from Micron, SK Hynix, and Samsung, with a supply chain scale "twice" that of the previous-generation Blackwell. Despite the larger supply chain, deployment has become much faster: deploying a large Blackwell rack takes two hours, while Vera Rubin's deployment time has been compressed to about 5 minutes.

Moving AI Factories from "Construction" to "Operation"

The newly launched DSX can be understood as an "AI Factory Construction and Operation Toolbox."

In the past, building an AI data center required customers to separately consider servers, networking, power, cooling, facility design, and operational systems, with many aspects relying on coordination between different vendors. What DSX aims to do is consolidate these previously fragmented elements into a single framework, providing customers with a set of referenceable and verifiable standard solutions from design and simulation to construction and operation.

Jensen Huang stated at the conference: "NVIDIA is not just selling chips; we are providing infrastructure builders with a complete blueprint for an AI factory."

DSX gains two main new capabilities this time.

The first is DSX MaxLPS. It addresses the most practical problem for AI factories: how to fit more GPUs and generate more tokens within a fixed power budget.

According to NVIDIA, MaxLPS combines liquid cooling with in-rack power optimization, allowing operators to run up to 40% more GPUs without significantly impacting performance.
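A quick calculation shows what the 40% figure implies under a fixed power budget. This is a simplified sketch, not NVIDIA's sizing method: it assumes the total budget really is constant and that the whole budget is divided evenly among GPUs, and the 1,000-GPU baseline is a hypothetical example.

```python
def gpus_with_uplift(baseline_gpus: int, uplift_pct: int = 40) -> int:
    """GPU count after the claimed uplift, using integer arithmetic to avoid rounding surprises."""
    return baseline_gpus * (100 + uplift_pct) // 100


def implied_power_share(uplift_pct: int = 40) -> float:
    """Average power available per GPU, relative to the baseline, if the total budget is unchanged."""
    return 100 / (100 + uplift_pct)


# Hypothetical site whose power budget originally supports 1,000 GPUs.
print(gpus_with_uplift(1000))            # 1400
print(round(implied_power_share(), 3))   # 0.714
```

In other words, fitting 40% more GPUs into the same budget means each GPU draws on average about 71% of the power it would otherwise be allotted, which is why the claim depends on cooling and power optimization.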

The second is DSX OS. It functions as the operational software for the AI factory, responsible for lifecycle management, intelligent scheduling, health monitoring, fault recovery, and multi-tenant management. Simply put, if an AI factory is a complex facility, DSX OS keeps it running stably and continuously.

Within DSX's product matrix, Reference Design provides AI factory reference designs, guiding customers on how to set up facilities, racks, networking, power, and cooling systems. DSX Sim handles simulation, allowing customers to verify design feasibility before construction. DSX Flex connects the AI factory to the power grid, enabling the data center to adjust tasks based on electricity prices, load, and demand response signals. DSX Exchange manages the data interfaces between IT systems, operational systems, and energy/cooling systems.
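The grid-aware behavior attributed to DSX Flex can be illustrated with a small decision rule. This is a hypothetical sketch and does not reflect the DSX Flex interface: `GridSignal`, `deferrable_power_fraction`, and the price and load thresholds are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class GridSignal:
    price_per_mwh: float          # current electricity price (hypothetical units)
    grid_load_pct: float          # grid load as a percentage of capacity
    demand_response_active: bool  # True when the utility asks consumers to shed load


def deferrable_power_fraction(
    signal: GridSignal,
    price_cap: float = 120.0,  # assumed threshold, not an NVIDIA figure
    load_cap: float = 90.0,    # assumed threshold, not an NVIDIA figure
) -> float:
    """Return the fraction of deferrable (non-urgent) work to keep running."""
    if signal.demand_response_active:
        return 0.0   # pause all deferrable work during a demand response event
    if signal.price_per_mwh > price_cap or signal.grid_load_pct > load_cap:
        return 0.5   # throttle when power is expensive or the grid is strained
    return 1.0       # otherwise run at full capacity


print(deferrable_power_fraction(GridSignal(60.0, 70.0, False)))   # 1.0
print(deferrable_power_fraction(GridSignal(150.0, 70.0, False)))  # 0.5
print(deferrable_power_fraction(GridSignal(60.0, 70.0, True)))    # 0.0
```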

On the ecosystem front, cloud partners like CoreWeave, Crusoe, and Lambda are deploying DSX Sim, MaxLPS, and DSX OS to reduce risk and improve GPU utilization. Manufacturers like Dell, HPE, Lenovo, Supermicro, as well as ASUS, Foxconn, Gigabyte, and QCT, are building systems supporting DSX.

Aligning with Windows and ARM

During the keynote, Jensen Huang officially unveiled the "DGX Station for Windows" workstation, defined by NVIDIA as a desktop-grade AI supercomputer for the Windows ecosystem.

Hardware-wise, it is powered by the GB300 Grace Blackwell Ultra Desktop Superchip, connecting the Blackwell Ultra GPU with a 72-core Grace CPU via NVLink-C2C, offering up to 748GB of unified memory and 20 PFLOPS FP4 performance, along with network capabilities up to 800Gb/s.

What matters most about this product is how it changes the way agents are deployed.

NVIDIA hopes enterprises can run multiple Agents locally, securely, and within a manageable Windows environment, integrating them into workflows for design, engineering, data science, reasoning, and Physical AI. The simultaneously launched OpenShell handles Agent runtime security, using isolated sandboxes and system-level policy controls to prevent Agents from unauthorized actions or leaking credentials and private data.
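The idea of policy-controlled agent actions can be sketched as a gate that every tool call must pass before it runs. This is an illustrative sketch only and is not the OpenShell API: `SandboxPolicy`, `check_action`, the tool names, and the secret markers are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SandboxPolicy:
    allowed_tools: FrozenSet[str]   # tools the agent may invoke
    secret_markers: FrozenSet[str]  # substrings that suggest a credential in the payload


def check_action(policy: SandboxPolicy, tool: str, payload: str) -> str:
    """Decide whether an agent's requested action may proceed."""
    if tool not in policy.allowed_tools:
        return "deny: tool not allowed"
    if any(marker in payload for marker in policy.secret_markers):
        return "deny: possible credential leak"
    return "allow"


policy = SandboxPolicy(
    allowed_tools=frozenset({"read_file", "run_tests"}),
    secret_markers=frozenset({"API_KEY=", "BEGIN PRIVATE KEY"}),
)

print(check_action(policy, "run_tests", "pytest -q"))       # allow
print(check_action(policy, "send_email", "hello"))          # deny: tool not allowed
print(check_action(policy, "read_file", "API_KEY=abc123"))  # deny: possible credential leak
```

A real sandbox would enforce such rules at the system level rather than by string matching; the sketch only shows where the check sits in the agent's workflow.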

Beyond the enterprise desktop product, Jensen Huang also announced a system-level SoC, the RTX Spark SoC, which integrates the N1X CPU and a Blackwell GPU on a single chip with a unified memory architecture and is intended for thin-and-light laptops and small desktops.

The N1X is NVIDIA's first PC processor co-developed with Microsoft. It is based on the Arm architecture, custom-designed by MediaTek, and manufactured using TSMC's 3nm process. It will first be featured this fall in laptops from Microsoft, Dell, HP, ASUS, Lenovo, and MSI, with over 30 models in the initial lineup, targeting high-end thin-and-light laptops.

This is the "super chip" NVIDIA has prepared for the AI PC era. Jensen Huang sees it as a significant restructuring of the PC form factor.

The "Two Brains" of an Agent

At this conference, NVIDIA announced the latest progress on its two core model product lines, corresponding to two scenarios for Agents: one running within enterprise systems, and the other running in the physical world.

NVIDIA unveiled Nemotron 3 Ultra, a Mixture-of-Experts model with 550 billion parameters, designed to provide top-tier intelligence for long-horizon agents in code development, scientific research, and enterprise business processes. Compared to leading open-source frontier models of a similar scale, this model achieves up to 5x faster inference speed and up to 30% lower usage costs, enabling agents to perform various tasks more efficiently and cost-effectively.

Around the open Nemotron model, NVIDIA released a series of software, open-source models, and partnership updates. The goal is to enable enterprises to build "digital colleagues" that assist employees in scenarios like engineering design, healthcare, software development, and business operations.

Within this package, Nemotron provides the foundational model capability. NemoClaw is responsible for organizing the model into Agents. OpenShell handles runtime security, and Agent Toolkit turns NVIDIA software libraries like CUDA-X into tools that Agents can directly invoke. Agents can use tools, access data, execute tasks, and connect to existing enterprise systems within a controlled environment.

Jensen Huang stated that global software companies are bringing AI Agents into real-world work systems, helping employees complete complex tasks faster. NemoClaw provides the open components needed to build long-running Agents, including capabilities for orchestration, context, memory, tool calling, and security control.

In the past, enterprise discussions about AI focused more on what models could answer. Now, NVIDIA is addressing how Agents can safely access tools, data, and business processes, and operate continuously in real work environments.

Then there's Cosmos 3, the official third-generation release of the Cosmos series, which also represents a structural overhaul.

Cosmos 3 is a world foundation model for physical AI, providing the underlying capabilities to "understand the physical world, predict what will happen, and decide what to do."

Earlier versions were aimed primarily at robotics and autonomous driving developers for video generation and physical world simulation, and were essentially a relatively single-modal generation framework. Cosmos 3 adopts a new architecture, a hybrid Transformer. For the first time, it unifies visual reasoning, world generation, and action prediction in a single system.

It natively understands and generates text, images, video, environmental sounds, and actions, achieving a leading level of physical accuracy. It is the world's first fully open omnimodal model. NVIDIA claims it has the potential to compress the physical AI training and evaluation cycle from months down to days.

Jensen Huang predicted that thanks to breakthroughs in multimodal reasoning, language, vision, and world models, the big bang of Physical AI is imminent.

The open frontier omnimodal models in the Cosmos 3 series offer developers a generational leap in capability, enabling them to build robots, autonomous vehicles, and vision AI that can perceive, reason, plan, and act in the physical world.

Lowering the Barrier to Physical AI

NVIDIA and Unitree jointly launched the H2 Plus, a humanoid robot reference platform for researchers and developers.

A "reference platform" means that Unitree provides the robot hardware and NVIDIA provides the software and computing platform. The hardware and software are pre-integrated, allowing development teams to start directly on skill development without spending time solving underlying integration issues. It is also the world's first open humanoid robot built on the NVIDIA Isaac GR00T development platform.

This reference platform targets a long-standing pain point in humanoid robot development: hardware integration, data collection, simulation, training, evaluation, and deployment are all siloed, making the entire process highly fragmented.

NVIDIA stated that research teams often spend a significant amount of time on basic integration when they get a robot, delaying actual skill development. The H2 Plus attempts to solve this by streamlining the path, allowing research teams to bypass underlying integration and jump directly into skill development and real-world validation.

In Jensen Huang's view, humanoid robots will bring physical AI to the world's largest industries, unlocking trillions of dollars in economic opportunities. The H2 Plus is the starting point for pushing cutting-edge research into real-world scenarios like factories, warehouses, and logistics systems.

Additionally, NVIDIA announced the official open-sourcing of a Physical AI Skills toolset, covering core scenarios like robotics, autonomous driving, vision AI, and industrial digital twins.

These "Skills" can be understood as standardized operational instructions for using NVIDIA's platforms like Cosmos, Omniverse, Isaac, and Metropolis, written so that agents can directly read and execute them. These instructions are packaged and open-sourced as the toolset released today.

When an agent receives a task, for example, generating a batch of training data for defect detection, it knows which model to call, what format to output, and how to verify results. The entire process runs automatically without requiring human intervention at each step.
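The three things a skill tells the agent (which model to call, what format to output, and how to verify results) can be pictured as a small data structure. The sketch below is hypothetical and does not show the format of NVIDIA's released toolset: the `Skill` fields, `run_skill`, the stub generator, and the identifiers `"cosmos-3"` and `"png+json"` are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, str]


@dataclass
class Skill:
    name: str
    model: str                               # which model to call
    output_format: str                       # what format to output
    verify: Callable[[List[Sample]], bool]   # how to verify results


def run_skill(
    skill: Skill,
    generate: Callable[[str, str, int], List[Sample]],
    count: int,
) -> List[Sample]:
    """Generate samples as the skill prescribes, then verify them without human review."""
    samples = generate(skill.model, skill.output_format, count)
    if not skill.verify(samples):
        raise ValueError(f"skill {skill.name!r} produced output that failed verification")
    return samples


def stub_generate(model: str, output_format: str, count: int) -> List[Sample]:
    """Stand-in for a real model call; returns labeled placeholder records."""
    return [{"model": model, "format": output_format, "label": "defect"} for _ in range(count)]


defect_skill = Skill(
    name="defect-detection-data",
    model="cosmos-3",          # placeholder identifier
    output_format="png+json",  # placeholder format
    verify=lambda samples: len(samples) > 0 and all(s["label"] == "defect" for s in samples),
)

batch = run_skill(defect_skill, stub_generate, count=4)
print(len(batch))  # 4
```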