Apple finally admits Siri has grown old

Guest Columnist

2026-06-09 13:00


At WWDC this year, Apple borrowed a model from Google, computing power from Nvidia, and another year of patience from users.

AI Summary


Core thesis: At WWDC 2026, Apple relaunched Apple Intelligence on a foundation that relies heavily on Google's Gemini and Nvidia's GPUs. This marks a strategic shift from fully in-house AI development to a "rebuilding by borrowing" approach. By leveraging its hardware control and privacy framework, Apple integrates external technologies into a system-level intelligent experience to counter intensifying AI competition, though localization challenges in the Chinese market persist.
Key elements:
1. Strategic pivot: Apple has entered a deep partnership with Google, reportedly paying approximately $1 billion annually to use a customized 1.2 trillion-parameter Gemini model. It uses distillation to train its own smaller models (the on-device models start at 3 billion parameters), while some cloud inference runs on Nvidia GPUs within Google Cloud.
2. Product rollout: Siri is renamed Siri AI and gains a standalone app with memory and cross-device synchronization. iOS and iPadOS gain system-level AI functions such as notification summaries, email drafts, and camera recognition, which require an iPhone 15 Pro or later.
3. Technology and control: Private Cloud Compute is extended to Google Cloud and Nvidia GPUs for the first time. Apple retains cryptographic control over PCC software, but acknowledges a concession of technological sovereignty, depending on an external "skeleton" (models and computing power).
4. China market dilemma: Due to regulatory filing requirements, Apple Intelligence requires localized adaptation and may face feature limitations. The absence of high-frequency scenarios like WeChat and Alipay, coupled with competition from domestic smartphone manufacturers' on-device AI, makes its prospects uncertain.
5. Historical context: Since launching Siri in 2011, Apple has long pursued AI development in a "closed-door" manner, accumulating on-device capabilities through acquisitions (e.g., Workflow) and chips (Neural Engine). However, the emergence of ChatGPT forced Apple to abandon its fully in-house R&D path and pivot towards collaboration to catch up.

Original Author: Sleepy

In the early hours of June 9, 2026, Beijing time, Apple's WWDC 2026 arrived as scheduled.

At the keynote, Apple renamed Siri to Siri AI, announced a deep partnership with Google to use Gemini's model capabilities to train its own next-generation foundation models, and extended Private Cloud Compute to Google Cloud and Nvidia GPUs for the first time.

It released five Apple Foundation Models: the on-device models start at 3 billion parameters, and the largest cloud model is optimized specifically for Nvidia GPUs. Almost every everyday app was rewritten. Siri also got its own standalone app that saves conversations, syncs across devices, and has memory.

This was Apple's most information-dense keynote in years.

Taming a Future

Apple's AI story can be traced back to the autumn of 2011, the iPhone 4S launch event, when Siri first took the stage.

Steve Jobs was gravely ill at the time, and Apple stood at the crossroads of an era. Siri seemed like something that had stepped out of a sci-fi movie. You asked about the weather, asked for restaurants, told it to set an alarm, and it answered in a slightly mechanical tone. For the first time, you felt your phone was more than just a cold piece of glass.

Siri originated from SRI International's CALO project, originally a military-grade AI assistant funded by DARPA. Apple acquired it in 2010; according to TechCrunch, the deal might have exceeded $200 million. A year later, Siri debuted with the iPhone 4S. Apple claimed it could understand natural language and act like a personal assistant to get things done for you.

At that moment, Apple had secured the world's best entry point for personal intelligence. Then it wasted over a decade.

Looking back today, Siri's earliest impact was changing the way people talked to their machines. In 2011, the iPhone was transforming the phone from a communication tool into a personal computing device. The App Store redefined software distribution, and the mobile internet migrated from the PC desktop into the palm of your hand. Siri appeared right on the crest of a rising wave. But once inside Apple, it quickly morphed from an ambitious personal assistant into an obedient voice remote control.

At its core, Apple believes in closed, controlled systems. But a true personal assistant must connect to more services, understand more context, and tolerate more uncertainty. Uncertainty means errors, privacy risks, and the kind of disorder Apple is least adept at handling.

So, Siri was only allowed to perform deterministic tasks, like a tamed future. It had a name, a voice, a personality package, yet lacked the initiative and memory required for a genuine personality. Users were initially amazed, then began joking about it, and eventually stopped using it much at all.

Apple was the first to put a "personal assistant" into a phone, and also the first to lock it away.

The agent trend the entire industry is now chasing is, in retrospect, almost exactly what Siri set out to be in 2011. It's fair to say Apple was the first company to build an agent prototype, only to end up being the last to finish it.

AI That Doesn't Feel Like AI

During the years Siri didn't grow up, did Apple's AI stagnate?

The answer is quite the opposite. Apple did a lot of AI work; it just didn't feel like AI.

If measured by keynote coverage, it looks like Apple only suddenly started talking seriously about AI in 2024. But if we trace the technology path backwards, Apple has been acting for a decade.

In 2015, Apple acquired two companies in quick succession: one to bolster natural language conversation, another to explore running deep learning directly on phones. That same year, WWDC featured the Proactive Assistant, which tried to have the system offer suggestions before the user even spoke. The idea was ahead of its time, but given the technology of the day, it seemed more like a slogan.

The following year, Apple launched SiriKit, opening a limited crack for developers to access Siri, and publicly introduced Differential Privacy, stating its intent to learn from large-scale data while protecting individual privacy. In 2017, the iPhone X brought the Neural Engine; Face ID and the camera began relying on on-device machine learning. Apple simultaneously launched Core ML for developers to run models on Apple devices and acquired Workflow, which later became Shortcuts.

This is a very Apple-like set of answers. It wanted AI, but not like Google, which bet heavily on the cloud and massive amounts of personal data. It wanted developers, but didn't want Siri to become a chaotic mess. So Apple chose the most difficult and slowest path: focusing on on-device processing, privacy, and system integration.

Around 2020, Apple acquired several more companies focused on low-power edge AI and voice understanding. That same year, the M1 chip was released, bringing a 16-core Neural Engine to the Mac, pushing on-device AI computing power from the phone in your pocket all the way to your computer. The following year, Live Text and Visual Look Up debuted, allowing text in photos to be copied and the camera to identify plants and animals, with more voice requests processed locally on the device.

Over this decade-plus, Apple never launched a standalone AI app, but it did make the phone genuinely smarter.

Choosing this path made sense. AI on a phone isn't just a question-answering machine; it needs to look at photos, listen to voice, understand contacts, invoke apps, and sense battery, location, and time. Ideally it can do something useful even without an internet connection, and it should not package up every aspect of a user's life and upload it to the cloud. Apple's control over its hardware is what qualifies it to take this path.

Yet there is a deep chasm between piecemeal intelligence and holistic intelligence. Apple excels at breaking technology down into reliable components, but generative AI requires it to assemble those components back into a cohesive whole.

These components lay quietly embedded in the system, waiting for an opportune moment.

But the moment didn't arrive first. ChatGPT did.

When ChatGPT emerged in late 2022, Apple wasn't entirely unprepared. Tim Cook repeatedly emphasized that AI and machine learning had been core technologies in Apple products for years. Bloomberg reported in 2023 that Apple had an internal large language model framework called Ajax and an internal chatbot project.

However, the issue wasn't whether Apple had cards in hand; the problem was that the rules of the game had changed.

ChatGPT shifted user attention from "features" to "capabilities." Users began to expect an AI on their phones by default, then compare who was stronger. When ChatGPT could already organize a jumble of thoughts into a coherent email, Siri was still saying, "Here's what I found on the web."

At WWDC 2024, Apple brought Apple Intelligence to the forefront: writing tools, notification summaries, photo search, personalized Siri understanding, and ChatGPT integration. Apple finally admitted that relying solely on self-developed models, at least in 2024, couldn't meet user expectations. But the promises it made ultimately failed to materialize on the announced schedule.

Hiring Google as a Tutor

The delay of Apple Intelligence wasn't only a matter of technology falling short; the Siri team's entire structure was struggling to keep pace with this wave of AI.

Multiple media outlets confirmed that Apple's former head of AI, John Giannandrea, stepped back; Craig Federighi took over AI direction; Vision Pro leader Mike Rockwell was brought in to lead the Siri team; and a large number of Siri engineers were sent to learn AI programming tools. This was no face-saving reshuffle. Apple had realized internally that with the old team and the old pace, it couldn't catch up.

In January 2026, Apple and Google issued a joint statement announcing that Apple would leverage Gemini technology to customize Apple Intelligence features for the iPhone and other products. Reports indicated Apple planned to pay Google approximately $1 billion annually to use a customized 1.2 trillion-parameter Gemini model to support the Siri overhaul. Apple had also tested models from OpenAI and Anthropic, but ultimately chose Google.

This was completely different from the ChatGPT integration in 2024. In that case, ChatGPT was more like outside help called in when Siri couldn't answer, carrying OpenAI's branding and appearing in a pop-up interface. This time, Gemini goes directly into the underlying technology, becoming part of Apple's next-generation foundation model.

The key action was distillation. Google gave Apple full access to Gemini. Apple uses the large model in Google data centers to generate high-quality answers and reasoning processes, then uses these results to train smaller, cheaper models capable of running on iPhones.
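The distillation idea described above can be illustrated with a toy sketch. This is not Apple's or Google's pipeline: the `teacher` function, the one-parameter-pair `student`, and all numeric settings are hypothetical stand-ins, chosen only to show the two steps (the teacher generates targets, the student is trained on them).

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def teacher(x: float) -> float:
    """Hypothetical stand-in for the large model: returns a soft answer in (0, 1)."""
    return sigmoid(3.0 * x - 1.0)

def train_student(inputs, soft_labels, epochs=3000, lr=1.0):
    """Fit a tiny logistic model (w, b) to the teacher's outputs.

    Uses plain gradient descent on cross-entropy against the soft labels.
    """
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in zip(inputs, soft_labels):
            err = sigmoid(w * x + b) - target
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

random.seed(0)
inputs = [random.uniform(-2.0, 2.0) for _ in range(200)]

# Step 1: the teacher produces the training targets.
soft_labels = [teacher(x) for x in inputs]

# Step 2: the student learns only from those targets, never from the teacher's weights.
w, b = train_student(inputs, soft_labels)

max_gap = max(abs(sigmoid(w * x + b) - teacher(x)) for x in inputs)
print(f"student w={w:.3f}, b={b:.3f}, largest disagreement with teacher={max_gap:.5f}")
```

In this toy the student has the same capacity as the teacher, so it can match it almost exactly. In real distillation the student is far smaller and recovers only part of the teacher's behavior, which is the trade-off that makes it cheap enough to run on a phone.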

In a technical paper released the day before WWDC, Apple packaged this collaboration as the third generation of Apple Foundation Models: five models custom-developed in partnership with Google. These include the 3 billion-parameter AFM 3 Core for on-device use; AFM 3 Core Advanced, a sparse model with 20 billion parameters that are only conditionally activated; the cloud-based AFM 3 Cloud and the image model ADM 3 Cloud; and the most powerful, AFM 3 Cloud Pro.

A more practical change lies in computing power. On-device models, no matter how smart, cannot handle every task. Apple's Private Cloud Compute infrastructure alone cannot fully sustain Gemini-level inference, so some requests will run on Nvidia GPUs in Google Cloud. Apple subsequently confirmed that PCC has been extended beyond Apple's own data centers for the first time, with the technology stack covering Nvidia Confidential Computing, Intel TDX, and Google Titan chips. Apple emphasized that it still controls the PCC software, devices only trust programs cryptographically approved by Apple, and the relevant binaries are open for inspection by security researchers.
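The trust rule in that last sentence, where a device sends data only to software Apple has approved, can be sketched in a heavily simplified form. Everything here is an assumption for illustration: the `Device` class, the `measure` helper, and the use of a plain SHA-256 allowlist. A real attestation system relies on hardware-signed measurements and signatures rather than a bare hash comparison.

```python
import hashlib

def measure(binary: bytes) -> str:
    """Hypothetical 'measurement' of a server build: its SHA-256 digest."""
    return hashlib.sha256(binary).hexdigest()

class Device:
    """Toy client that only talks to server software on an approved list."""

    def __init__(self, approved_digests):
        # Assumption: the approved digests were published by the platform vendor
        # and are open to inspection by outside researchers.
        self.approved_digests = frozenset(approved_digests)

    def will_send_request(self, attested_digest: str) -> bool:
        # Refuse to send user data unless the server proves it runs an approved build.
        return attested_digest in self.approved_digests

approved_build = b"pcc-node-release-1"
tampered_build = b"pcc-node-release-1-modified"

device = Device(approved_digests={measure(approved_build)})
print(device.will_send_request(measure(approved_build)))   # approved build is accepted
print(device.will_send_request(measure(tampered_build)))   # any modified build is refused
```

The point of the design is that the decision sits on the device. Where the server physically runs, whether in Apple's data center or Google Cloud, does not change which builds the device is willing to trust.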

Apple didn't truly abandon control, but it did abandon the dignity of full self-reliance.

Bones are Borrowed

To understand Apple's position in the AI era, one must first recognize its most core asset.

It's not the chip, nor the model. It's the devices. These devices contain photo albums, emails, calendars, maps, and payments, carrying the fragments of many ordinary lives. Any AI capable of mobilizing these fragments becomes more than just a chatbot; it can become a true personal intelligence hub.

Apple started paving the way for this hub early on. Workflow, acquired in 2017, later became Shortcuts, deeply integrated with Siri and system automation. App Intents, launched in 2022, allowed third-party apps to expose their capabilities to the system's entry points. By the Apple Intelligence era, these interfaces had become the hands and feet with which AI takes real-world actions.

With these interfaces, OpenAI can come in, and Gemini is now in. In the Chinese market, local partners can be found in the future. But their way in isn't to take over the iPhone directly; they are packaged within Apple's permission framework and privacy rules.

What Apple fears most isn't someone having a better model. It fears users bypassing the system entirely, entrusting their lives to another entry point. If one day users open not an app but an AI assistant capable of orchestrating everything for them, Apple risks being reduced to a well-crafted shell.

So, from now on, the "Apple" in Apple Intelligence represents product control more than it does complete technological sovereignty. The skin is its own, the clothes are tailored by it, but the bones are borrowed. Google provides the skeleton, Nvidia provides the joints, and Apple's job is to dress this body in its own clothes and walk out the door.

Google gets a massive endorsement from this deal: even Apple acknowledges that Gemini's underlying capabilities are more reliable. Nvidia gets another proof point: even a company with the strongest consumer-grade chips and ambitions for custom servers can't bypass GPU clouds when it comes to frontier reasoning and complex agent tasks.

But the more bones you borrow, the less the body is entirely your own. Behind each borrowed bone lies a vendor's business interests, regulatory pressures, and technological cadence. If one day someone wants to pull those bones back, can Apple stand firm? The company doesn't need to answer this question right now, but it will eventually have to.

A New Tenant Living in the System

Ordinary people don't care about model parameters. They care whether their phone can bother them less.

On the WWDC26 stage, Apple said: "There are times when you expect more from Siri."

For Apple, this almost amounts to an apology.

Then it tried to show you a different kind of morning.

You wake up to twenty notifications piled on the screen. In the past, you had to swipe them away one by one. Now the system has prioritized them for you: messages from your boss are at the top, ads and promotions are collapsed into one line of gray text. You open an email, and a long work email has already been summarized into three sentences. You decide to reply, and Siri drafts a response based on your usual tone with that person. You remember you need to call a merchant about a return in the afternoon, but before you even dial, the system has already pulled the order number from your emails from two days ago and pasted it onto the call interface.

This is the story Apple wants to tell: a layer of intelligence spread beneath the system, taking care of those repetitive cognitive chores daily. Read less fluff, search for fewer files, and get interrupted by notifications less often.

To tell this story, Apple has almost rebuilt Siri's entry points. On the iPhone, it's embedded in the Dynamic Island; a simple swipe down starts a conversation. On iPad and Mac, it merges with Spotlight. It has its own standalone app capable of saving and continuing past conversations, syncing across devices via iCloud. Apple wants Siri to become an AI assistant living within the system, possessing memory and context, but it tries not to make it look like ChatGPT.

Vision is also an important direction. A new Siri mode has been added to the camera: point it at food, and it provides nutritional information; point it at something you don't recognize, and it identifies and searches for it. System-wide dictation isn't just speech-to-text anymore; it automatically adds punctuation and adjusts formatting, turning spoken words into text ready to be sent.

The groundwork for developers is also being laid. Apple has opened up the Core AI framework, allowing third parties to load their own models on the device. With the updated App Intents, Siri can more easily understand third-party apps. The Foundation Models Framework no longer only calls Apple's on-device models but also supports connecting to external providers like Claude and Gemini. The message to the ecosystem is clear: for Siri to perform tasks across apps, developers must expose their content and actions in a form the system can understand.

If these plans materialize, Apple AI will no longer just be "Chatty Siri."

But this time, Apple is much more cautious than in the past. Siri AI will only be available to users in beta form later this year, starting with English. And the same Apple Intelligence in China may very well be a different product altogether.

For Chinese users, watching Apple's AI announcements is mostly entertainment. The keynote is exciting and the features look cool, but they are "not available in China."

The Chinese market has its own comprehensive rules for generative AI, including record-filing, content security, and data localization. Apple needs to find a local model partner and pass regulatory approval. The issue for Apple Intelligence in China isn't just a delay of a few months; it may be built on a completely different underlying architecture.

Users in the US see a combination of Apple's models and Gemini. Users in China might see a version kneaded together from Apple's system permissions, local cloud services, local models, and regulatory requirements. They are both called "Apple Intelligence," but their actual capabilities and reachable boundaries could be vastly different.

iCloud services in mainland China are operated by GCBD (Guizhou-Cloud Big Data). The cloud drive saves files, and the AI needs to understand files; the cloud drive saves photos, and the AI needs to interpret photos; the cloud drive syncs Notes, and the AI needs to extract your plans, habits, and relationships from those notes. This data gains entirely new uses in the AI era and naturally faces different levels of regulation.

A more immediate threat comes from competition. Domestic smartphone manufacturers are moving very fast in areas like on-device large models, Chinese voice assistants, and imaging AI. For Chinese users, spending over ten thousand yuan on a new iPhone, only to find its core AI features unusable, might make them consider switching brands.

The daily scenarios in the Chinese market are particularly tricky for Apple. WeChat, Alipay, Meituan, Douyin, ride-hailing services, government services, hospital appointments—these are the things many people actually use their phones for daily. If an AI assistant can't enter these scenarios, can't understand group chats, receipts, verification codes, and the many expressions only locals understand instantly, it can hardly be called "intelligent."

Understanding a Person

Apple Intelligence also has another problem: it doesn't cover all iPhones.

iOS 27 might cover the iPhone 11 and the second-generation iPhone SE, but Apple Intelligence requires at least an iPhone 15 Pro or newer, M-series iPads, and Macs. The most powerful on-device models have even higher requirements:
