Is OpenAI "swallowing" the application layer? a16z argues the real opportunity lies beyond general-purpose models
- Core thesis: The real opportunity for AI application startups is not in competing with large model companies on general-purpose horizontal tools (the "Yellow Brick Road"), but in going deep into industry processes and building vertical solutions on complex workflows, accumulated data, compliance governance, and system integration capabilities.
- Key points:
- Large model labs (such as OpenAI and Anthropic) are taking over the "Yellow Brick Road" through model capability plus horizontal tools (for example, code generation), but they cannot penetrate vertical contexts that require many steps and many participants.
- A startup's moat comes from the data and learning loops accumulated around specific industries, including unwritten rules and tribal knowledge that are not in public training datasets.
- Application-layer companies can optimize cost and manage complexity by routing across multiple providers, choosing the best model for each sub-task, and absorbing the migration cost when models are upgraded.
- Providing a governance, compliance, and audit "control plane" built around specific use cases, such as handling HIPAA and FINRA rules, is core value that a horizontal model can hardly replace.
- The "system-style" products of vertical companies replace human labor and are tightly bound to workflows. Customers pay for business outcomes (for example, sales leads) rather than benchmarks, which creates high customer lifetime value.
Original title: Avoiding Death on the Yellow Brick Road
Original author: Joe Schmidt IV, a16z
Translation and compilation: Peggy
Editor's note: As the capabilities of large models continue to improve, a common anxiety is emerging in the AI application layer: if model companies like OpenAI and Anthropic already possess the underlying models, distribution channels, and brand advantages, what can startups still do in the application layer?
This is precisely the question a16z partner Joe Schmidt attempts to answer in this article. Borrowing the "Yellow Brick Road" metaphor from "The Wizard of Oz," he divides AI application opportunities into two categories: the main road that large model companies are personally traveling, such as code generation, writing, image generation, general-purpose agents, and horizontal office assistants; and "the rest of Oz," which encompasses deep industry processes, complex workflows, accumulated data, compliance governance, and system integration capabilities.
In his view, the real opportunity for startups lies in the latter.
From sales to insurance, Joe Schmidt reiterates the same logic: what enterprises are genuinely willing to pay for is not a smarter chat window, but a system that can take responsibility for business outcomes. It needs to understand the messy state of customer data, handle multi-person approvals and edge cases, assume compliance and audit responsibilities, and manage migration, routing, and cost optimization for customers as models continuously upgrade.
This is also the core judgment of this article on the next generation of enterprise software: The underlying models will become increasingly powerful, but also increasingly interchangeable; what is truly irreplaceable is the data, processes, governance capabilities, and operational memory accumulated around specific industries and specific workflows. The opportunity for AI application companies lies not in competing with model companies for the "Yellow Brick Road," but in venturing into places that are more complex, messier, slower, yet closer to real business value.
Below is the original text:
Recently, I keep hearing the same question from founders and potential employees: Is there anything left to do in the AI application layer? Or will OpenAI and Anthropic eventually kill everything?
Behind this question lies a typical kind of AI-related anxiety. Some have concluded that if you don't want to be perpetually stuck at the bottom, the only positions with long-term value are either inside a large model lab, or building a startup in robotics, hard tech, or similar frontier areas—theoretically, doing things that "the labs cannot touch." Because if every category of software is going to be consumed—either directly absorbed by Codex or Claude for specific tasks, or rendered unnecessary by some future model—then the best option seems to be: Run!
I admit that I am almost an AI maximalist myself, and I think they are half right. Large model labs are indeed entering large swaths of the application layer. But the "application layer" is not a homogeneous set of opportunities. The truly important criterion is this: are you walking on the "Yellow Brick Road," or are you in the rest of Oz?
Note: In "The Wizard of Oz," the "Yellow Brick Road" is the main road that leads to the Emerald City at the heart of Oz, where travelers go to see the "Wizard."
The "Yellow Brick Road" is how we describe the path that large model labs are walking and pouring massive resources into. Problems like code generation, writing, and image creation are naturally suited for the labs because they get better as the raw capabilities of the models improve: every dollar invested in pre-training and post-training directly improves product quality.
But the rest of Oz contains more complex, often more vertical problems. These are not simply a matter of giving an enterprise user a horizontal tool that can interface with standard tools and computer operations. The value here comes more from the scaffolding around the model: the scaffolding that makes the output trustworthy, compliant, and truly integrated into business processes within a specific industry. The raw power of the underlying model is still important, but it is no longer everything.
We are seeing this happen in real time. OpenAI and Anthropic are practically admitting to the market that they cannot solve every problem with a general-purpose AI colleague. They have announced massive investments in forward-deployed-style joint ventures, building entire companies around configuring and customizing models for enterprises. If they truly believed the next model release would solve these problems, they wouldn't be pouring billions of dollars into such projects.
So, if you want to make money building AI applications, don't walk the Yellow Brick Road. Go build in the rest of Oz. Here are the lessons we, and some founders in our portfolio, have learned in practice.
The Yellow Brick Road
If you are starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous one. Take a high-performance model, connect it to some off-the-shelf connectors like Google Drive, Slack, Salesforce, Notion, GitHub, and build an agent orchestration layer on top. It looks like magic.
The problem is that this is exactly what the large model labs are doing with Cowork and Codex. Obviously, they own the models, which means they have better margins, more control, and pricing power over all downstream participants. But perhaps more importantly, they also control the architectural choices that determine what problems the product is suitable for solving. So far, they have been very deliberate in adopting the "model + tool calling" pattern, which is precisely the pattern needed for the horizontal, low-step-count tasks on the Yellow Brick Road. Even if a startup could somehow surpass Codex or Claude Code, the large model labs still possess immense distribution power and the strongest brand halo in the AI field.
If you are an AI application company using the same playbook—connecting to the same connectors, without lower-level sub-agents or configurations, and without distribution—you are likely on a path to nowhere.
The Rest of Oz
The situation is not entirely pessimistic for startups. Beyond the Yellow Brick Road, there are still enormous opportunities. Startups can own customers and solve complex problems in these areas.
These companies are building agentic experiences: models woven into complex networks of tools, automations, and integrations. In other words, software. This makes most of these startups vertical by nature. They can focus on multi-step, multi-party workflows, design sub-agents for different roles and vertical scenarios, and tackle problems that Anthropic's and OpenAI's horizontal platforms find difficult: gathering context across systems, then routing tasks to multiple people who need to approve at different stages.
This type of work usually involves one or more legacy systems, often requires deterministic outcomes because ambiguity is unacceptable, and sometimes is directly tied to an important business result. The large model labs certainly know how valuable these problems are: that's why they are building their own outsourced configuration teams, and why an entire industry of customer-facing reinforcement learning service companies is emerging.
Why the Rest of Oz Won't Be Completely Occupied by the "Wizard"
One counterargument to the above is that, so far, betting against the models or the labs continuing to improve has been a terrible trade. They will likely keep getting stronger and eventually eat the markets these application-layer companies serve.
The large model labs will certainly continue to improve. But I believe companies in the rest of Oz still have several modes of defense in the long run.
Data and Learning Flywheels
Much of what you truly internalize in a business doesn't exist in any training set: unwritten industry conventions, undocumented standards, tribal knowledge that lives in practitioners' heads. None of it is on the public internet. No amount of training compute can replace actually entering the workflows where this knowledge resides.
Two flywheels overlap here: one is the cross-customer flywheel, where patterns compound as you see more variants of the same problem; the other is the within-customer flywheel, where the reasons behind specific decisions, unspoken exceptions, and the company's own heuristics only emerge when users interact with the system authentically.
Even if customer data cannot be used across customers, an application company can still leverage pattern recognition from different types of customer problems and use it to guide the architecture of future problems. If a company has had its agents handle a hundred legal redline revisions, a thousand insurance underwriting cycles, or ten thousand SDR sales development activities, its understanding of the problem landscape is something a later entrant firing up a new agent for the first time cannot replicate.
In theory, a horizontal agent could build the same learning infrastructure. In practice it doesn't, partly for lack of focus but primarily because of user experience. Capturing this knowledge depends entirely on the workflow interface you give the user. Vertical players can design these interfaces around the information that truly needs to be surfaced for a specific workflow; horizontal tools cannot. Evaluation sets, labeled outputs, and edge case taxonomies can all compound into a vertical data flywheel that further supports fine-tuning. A later entrant without equivalent production exposure will struggle to build this flywheel. How much of the flywheel a company can actually use depends on data rights, accumulated production usage, and customer contract structures, but the pattern recognition itself will continue to accumulate.
Managing Model Volatility and Complexity
Within large model labs, routing already happens: using different model classes for different requests, employing model ensembles at the bottom layer. But what they cannot do is route across providers, evaluate a competitor's model for a specific sub-task, or use the truly best-fit open-source fine-tuned model for a narrow step.
Companies in the rest of Oz will choose the best model for each sub-task across the entire model marketplace, not just relying on models released by a single parent lab. They will also take on the thankless work: re-running evaluations with every new model release, re-calibrating prompts for customer edge cases, and managing rollouts without breaking production. Large model labs won't do this for their customers. They sell you a new model and tell you to migrate. Companies in the rest of Oz absorb the migration cost. The customer gets the best intelligence available on the entire market, along with continuity through every upgrade cycle.
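To make the cross-provider routing idea concrete, here is a minimal sketch in Python. It assumes the company already runs its own evaluation set per sub-task and records a pass rate for each candidate model. The provider names, model names, scores, and the `min_gain` promotion threshold are all hypothetical and chosen only for illustration; the article does not describe a specific implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    provider: str   # hypothetical provider label
    model: str      # hypothetical model name
    subtask: str    # e.g. "extract_fields"
    score: float    # pass rate on the company's own eval set, 0.0 to 1.0


def pick_best_per_subtask(results: list[EvalResult]) -> dict[str, EvalResult]:
    """Return the highest-scoring model for each sub-task, across providers."""
    best: dict[str, EvalResult] = {}
    for r in results:
        current = best.get(r.subtask)
        if current is None or r.score > current.score:
            best[r.subtask] = r
    return best


def safe_to_promote(candidate: EvalResult, incumbent: EvalResult,
                    min_gain: float = 0.02) -> bool:
    """Gate a rollout: switch only when the new model clearly beats the incumbent."""
    same_task = candidate.subtask == incumbent.subtask
    return same_task and candidate.score >= incumbent.score + min_gain


# Illustrative eval results; none of these numbers come from the article.
results = [
    EvalResult("lab_a", "frontier-x", "extract_fields", 0.91),
    EvalResult("lab_b", "frontier-y", "extract_fields", 0.94),
    EvalResult("open_source", "small-finetune", "classify_document", 0.97),
    EvalResult("lab_a", "frontier-x", "classify_document", 0.95),
]
routing_table = pick_best_per_subtask(results)
```

When a new model ships, the same evaluation set can be re-run and `safe_to_promote` used as a simple gate before changing the routing table, which is one way to absorb migration cost without breaking production.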
Cost Optimization
Throwing every query at Opus 4.7 is the fastest way to make your gross margins negative. The best Oz companies route between different tiers of models: the hardest tasks go to frontier models, most go to mid-tier models, and smaller custom or fine-tuned models are used where they have proven effective.
Some of these companies are now doing their own post-training on top of this, optimizing models for the narrow slice of work that customers truly care about, delivering it at a fraction of the cost of calling a frontier API. Large model labs price from the "floor": the minimum level of intelligence you can buy for X dollars. Oz companies sell the inverse: the minimum dollar cost for the level of intelligence actually needed for a specific workflow. This is only possible when you know precisely what level of intelligence is required for each sub-task. And large model labs are structurally incapable of knowing every task in every vertical industry. Ultimately, this translates directly into lower, more predictable pricing based on outcomes.
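The "minimum dollar cost for the level of intelligence actually needed" idea can be sketched as a small selection rule: for each sub-task, pick the cheapest model whose measured quality clears the bar. The sketch below assumes per-sub-task quality has already been measured; the model labels, costs, and quality numbers are hypothetical placeholders, not real prices or benchmarks.

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str                  # hypothetical model label
    cost_per_call_usd: float   # illustrative cost, not a real price
    quality: dict[str, float]  # measured pass rate per sub-task


def cheapest_sufficient(options: list[ModelOption], subtask: str,
                        required_quality: float) -> ModelOption:
    """Pick the lowest-cost model whose measured quality meets the bar."""
    eligible = [m for m in options
                if m.quality.get(subtask, 0.0) >= required_quality]
    if not eligible:
        raise ValueError(f"no model meets {required_quality} on {subtask}")
    return min(eligible, key=lambda m: m.cost_per_call_usd)


# Three illustrative tiers: frontier, mid-tier, and a small fine-tuned model.
options = [
    ModelOption("frontier", 0.05, {"summarize": 0.98, "classify": 0.99}),
    ModelOption("mid_tier", 0.01, {"summarize": 0.96, "classify": 0.98}),
    ModelOption("small_finetuned", 0.001, {"summarize": 0.90, "classify": 0.97}),
]

classify_choice = cheapest_sufficient(options, "classify", 0.95)
summarize_choice = cheapest_sufficient(options, "summarize", 0.95)
```

The rule only works if the required quality per sub-task is known, which is the point the passage makes: that knowledge comes from operating inside the workflow.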
Governance
Becoming the control plane for a customer's AI operations in a specific vertical generates significant value. This control plane is where permissions, audits, what agents are allowed to do, and what agents actually did, all converge.
This control plane is built on guardrails specific to the use case, and these guardrails differ completely across industries and job types. Because these companies own the tools, workflows, and data their agents touch end-to-end, they can provide deterministic results in ways that horizontal tools cannot. They also absorb regulatory complexity for the end buyer: the Federal Rules of Civil Procedure and Rules of Professional Conduct for lawyers in the legal field, HIPAA in healthcare, SEC and FINRA rules in finance, state-level insurance regulations, and so on. Horizontal players cannot do this convincingly without turning themselves into a hundred different verticals. What a CIO needs is a partner who can contractually commit to handling compliance for the agents they provide.
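As a rough illustration of what "permissions, audits, what agents are allowed to do, and what agents actually did" converging in one place might look like, here is a minimal Python sketch. The `ControlPlane` class, agent names, and action names are hypothetical; a real system would add authentication, durable storage, and regulation-specific rules that this sketch does not attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ControlPlane:
    # Maps an agent name to the set of actions it is allowed to perform.
    permissions: dict[str, set[str]]
    # Append-only record of every authorization decision.
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, agent: str, action: str, target: str) -> bool:
        """Check a permission and record the decision, allowed or not."""
        allowed = action in self.permissions.get(agent, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "action": action,
            "target": target,
            "allowed": allowed,
        })
        return allowed


# Hypothetical agent that may read records but not send email.
plane = ControlPlane(permissions={"intake_agent": {"read_record"}})
read_ok = plane.authorize("intake_agent", "read_record", "claim-1")
send_ok = plane.authorize("intake_agent", "send_email", "claim-1")
```

Logging denied attempts alongside allowed ones is a deliberate choice here: an auditor usually needs to see what an agent tried to do, not only what it succeeded in doing.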
All of this ultimately comes back to the same thing: focus.
This focus can be a vertical industry, like insurance, law, or accounting; or it can be a function done deeply enough, like sales, customer service, or finance. Whichever it is, the work requires a team to be deeply embedded with the same type of customer base over the long term, understanding their workflows, edge cases, and regulatory requirements. The large model labs are not built for this. They must serve everyone, cover everywhere—that's why they built the Yellow Brick Road in the first place. The same trade-off makes it difficult for them to enter the rest of Oz: you can be everywhere at once, or you can be the best at one thing, but you cannot be both.
Sales as an Example: Practical Advice from the CEO of 11x
How should one understand this in practice? Here is some practical advice from Prabhav Jain, CEO of 11x.
Focus on Outcomes
A viable tactical path to building a company resilient to disruption from large model labs is to start from the specific outcomes that customers truly care about. For us, this outcome is helping businesses generate more sales leads and pipeline.
Starting from here, the questions become very concrete: Which activities do we want to own end-to-end that actually drive pipeline growth? Break each activity down into tasks. Which tasks are suitable for agents, and which are not? Which require complex domain insight, and which do not? Large model labs will also release workflows, but when a workflow has many steps, messy inputs, a state that's hard to explain, or real-world constraints, simply having a better model doesn't get the job done. The work then falls back on traditional software engineering, and at that level, a large model lab has no advantage over a focused application company.
For example, some tasks we handle include: lead generation based on custom signals, lead enrichment, deep account research, grabbing context from CRM, writing messages for different channels, lead qualification agents, and email deliverability systems. Some are agent tasks, some are not. These tasks aren't solved with a single prompt; they require deep engineering capabilities.
The key insight from the Oz analogy is that, roughly speaking, half of any real workflow consists of non-agent tasks, and this half doesn't confer an advantage to the labs. Beneath the model layer, their ability to write deterministic software is no better than yours. And the other half—the agent tasks—still require you to tune, train, and constrain the model around the specific outcome you want.
Domain knowledge is often not in general-purpose training data. These capabilities must be built bottom-up from a vertical industry or specific function and fed to the model at the right point in the workflow. When our agent qualifies an inbound lead over the phone, it must be trained to understand what constitutes a good sales conversation for a specific industry and user profile. This is the work for an application company, and this capability compounds.
More importantly, these capabilities constantly become outdated because the businesses themselves evolve. Therefore, your ability to continuously evolve workflows and context becomes a competitive advantage itself. For instance, when we started building our scaled email outreach product, "AI-written emails" were just beginning to appear. Fast forward to today, people have developed a keen sense for distinguishing AI-written emails from human-written ones, and crucially, this judgment changes every few months. Our agent must constantly adapt to market dynamics, but this is precisely where the moat is built. In fact, despite this dynamic, our positive reply rate has increased 4x over the past few months, generating hundreds of millions of dollars in sales pipeline for our customers.
Tackle High-Complexity Problems
Complex problems are where real business value is unlocked. Otherwise, you can easily find yourself building just a thin wrapper layer.
Decompose any sufficiently complex business problem, and you'll quickly encounter chaos. Here's a simple-sounding example from the GTM (go-to-market) space: if a company is already your customer, you shouldn't send cold outreach to anyone at that company. But this is far from simple.
Perhaps your CRM has the domain name for that company. What about companies with dozens of subsidiaries? What if the CRM records the parent company's domain? What if an outdated matching field in Salesforce causes you to send a cold outreach email to an existing customer's Chief Revenue Officer? Real-world data is messy. Humans struggle to handle it, and models don't magically bypass this hurdle. Bringing order to this chaos requires designing specialized agents around the specific contours of the problem, not just pointing a general-purpose co-pilot at the CRM. In fact, based on our data, we've found that our data quality and freshness often exceed the customer's own, so we default to anchoring on our own data.
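A small Python sketch shows why even the "don't contact existing customers" rule needs real engineering. It normalizes an email address or URL to a bare domain and then walks a subsidiary-to-parent mapping before checking the customer list. The `parent_of` mapping and all domains are hypothetical; the article does not say how 11x resolves subsidiaries, and real data would need fuzzier matching than this.

```python
from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Reduce an email address or URL to a bare lowercase domain."""
    value = value.strip().lower()
    if "@" in value:
        value = value.rsplit("@", 1)[1]
    if "://" in value:
        value = urlparse(value).netloc
    value = value.split("/", 1)[0]
    return value.removeprefix("www.")


def is_existing_customer(contact: str, customer_domains: set[str],
                         parent_of: dict[str, str]) -> bool:
    """Return True if the contact's domain, or any parent domain, is a customer."""
    domain = normalize_domain(contact)
    seen: set[str] = set()
    while domain and domain not in seen:   # 'seen' guards against mapping cycles
        if domain in customer_domains:
            return True
        seen.add(domain)
        domain = parent_of.get(domain, "")
    return False


# Hypothetical data: the CRM only records the parent company's domain.
customer_domains = {"acme.com"}
parent_of = {"acme-logistics.io": "acme.com"}
```

With this data, a contact such as `cro@Acme-Logistics.io` resolves to the parent `acme.com` and is suppressed, even though the subsidiary's domain never appears in the CRM.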
Guardrails Aren't Just for Preventing Bad Things. Customers Pay for This.
Guardrails are severely underestimated. Even within the same product, every use case needs its own guardrails. For us, a regulated financial services prospect requires entirely different assurances than a mid-market SaaS customer. And these assurances cascade down to how the agent writes, who it can contact, what data it can access, what it can say on the phone, and how every decision is recorded.
A "one-size-fits-all" system fails in the face of such differences. Guardrails must be built per use case, configured per customer, and continuously audited, all of which falls squarely on the application company. This is why we need forward-deployed engineers and technical deployment strategists to tune for each customer's specific requirements.
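One way to picture "built per use case, configured per customer" is a layered configuration: use-case defaults first, customer overrides on top, and a check that lists every violation before an agent acts. The Python sketch below is illustrative only; the use-case names, channels, and blocked phrases are hypothetical and are not taken from 11x's product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Guardrails:
    allowed_channels: frozenset
    blocked_phrases: frozenset
    log_every_decision: bool


# Hypothetical per-use-case defaults; real values would come from compliance review.
USE_CASE_DEFAULTS = {
    "regulated_finance": Guardrails(
        allowed_channels=frozenset({"email"}),
        blocked_phrases=frozenset({"guaranteed returns"}),
        log_every_decision=True,
    ),
    "mid_market_saas": Guardrails(
        allowed_channels=frozenset({"email", "phone", "linkedin"}),
        blocked_phrases=frozenset(),
        log_every_decision=False,
    ),
}


def guardrails_for(use_case: str, **customer_overrides) -> Guardrails:
    """Start from the use-case defaults, then apply per-customer overrides."""
    return replace(USE_CASE_DEFAULTS[use_case], **customer_overrides)


def violations(message: str, channel: str, rails: Guardrails) -> list[str]:
    """List every guardrail the proposed outreach would break."""
    problems = []
    if channel not in rails.allowed_channels:
        problems.append(f"channel not allowed: {channel}")
    lowered = message.lower()
    for phrase in sorted(rails.blocked_phrases):
        if phrase in lowered:
            problems.append(f"blocked phrase: {phrase}")
    return problems


bank_rails = guardrails_for("regulated_finance")
saas_rails = guardrails_for("mid_market_saas", log_every_decision=True)
```

Keeping the defaults immutable and producing a new object per customer means one customer's override cannot leak into another's configuration, which matters when the settings are themselves subject to audit.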
For example, we worked with a Fortune 1000 institution to make consented outbound voice calls to its massive SMB customer base. In the initial rounds, answer rates were very low. We had to iterate rapidly to learn how to engage this specific audience within the first 10 seconds of a call. SMB business owners behave differently from large B2B buyers or consumers. Now, we generate more sales opportunities for them in a single day than their entire sales team could generate in that segment in a month.
Insurance as an Example: Practical Advice from the CEO of FurtherAI
Sales is just one example. Insurance is another, illustrating the same point from a different angle. Here is Aman Gour, CEO of FurtherAI, on what it means to "build away from the Yellow Brick Road."
When we started deploying AI into real insurance operations, we repeatedly encountered one assumption: the model is the intelligence, and the workflow is just scaffolding built around the model.
But the more insurers we partner with, the more convinced we become that the opposite is true.
In insurance, a lot of the intelligence resides within the workflow itself. Two insurance companies can run a submission through what looks like the same path: submission,


