Is OpenAI Eating the Application Layer? a16z Says the Real Opportunity Lies Beyond General-Purpose Models

BlockBeats

Guest columnist

2026-05-28 06:42

The AI application layer isn’t dead; it’s just that wrapping models for quick profits is no longer viable.

AI Summary

Core Thesis: The real opportunity for AI application startups isn’t competing with large model companies on horizontal general-purpose tools (the “Yellow Brick Road”). Instead, it lies in deeply embedding into specific industry workflows to build vertical solutions that rely on complex workflows, proprietary data, compliance governance, and system integration capabilities.
Key Elements:
1. Large model labs (e.g., OpenAI, Anthropic) are capturing the “Yellow Brick Road” by combining model capabilities with horizontal tools (e.g., code generation). However, they cannot easily penetrate vertical scenarios requiring multi-step, multi-party processes.
2. A startup’s moat comes from the data and learning flywheel accumulated around a specific industry, including unwritten industry conventions and tribal knowledge not present in public training datasets.
3. Application-layer companies can achieve cost optimization and manage complexity by routing across different vendors, selecting the best model for specific sub-tasks, and absorbing the migration costs of model upgrades.
4. Providing a “control plane” for governance, compliance, and auditing based on specific use cases—such as enforcing HIPAA or FINRA rules—is a core value that horizontal models find difficult to replicate.
5. Vertical companies create “systemic” products that replace human labor and embed deeply into workflows. Customers pay based on business outcomes (e.g., sales leads) rather than benchmark scores, leading to high customer lifetime value.

Original title: Avoiding Death on the Yellow Brick Road

Original author: Joe Schmidt IV, a16z

Translation by: Peggy

Editor's note: As large language models continue to improve, a common anxiety is emerging in the AI application layer: if model companies like OpenAI and Anthropic control both the underlying models and have distribution channels and brand advantages, what can startups still do in the application layer?

This is precisely the question a16z partner Joe Schmidt attempts to answer in this article. He borrows the "Yellow Brick Road" metaphor from *The Wizard of Oz*, dividing AI application opportunities into two categories: the main road that large model companies are actively building—such as code generation, writing, image creation, general-purpose agents, and horizontal office assistants—and the "rest of Oz," which encompasses deeply vertical scenarios embedded in industry workflows, dependent on complex processes, accumulated data, compliance governance, and system integration.

In his view, the real opportunity for startups lies in the latter.

From sales to insurance, Joe Schmidt reiterates the same logic: what enterprises are truly willing to pay for is not a smarter chat window, but a system that can be held accountable for business outcomes. This system needs to understand the messy state of customer data, handle multi-person approvals and edge cases, assume compliance and audit responsibilities, and manage migration, routing, and cost optimization for the client as models continue to upgrade.

This is also the core thesis of this article regarding the next generation of enterprise software: underlying models will become increasingly powerful and interchangeable; but the truly irreplaceable elements are the data, processes, governance capabilities, and operational memory accumulated around specific industries and specific workflows. The opportunity for AI application companies lies not in competing with model companies for the "Yellow Brick Road," but in venturing into the more complex, messier, slower, yet ultimately more commercially valuable territories.

Below is the original text:

Lately, I keep hearing the same question from founders and prospective employees: Is there anything left to do in the AI application layer? Or are OpenAI and Anthropic going to kill everything in the end?

Behind this question lies a typical AI-era anxiety. Some have already concluded that if you want to avoid being permanently relegated to the bottom layer, the only positions with long-term value are either inside the big model labs or at startups in robotics, hard tech, and similar frontier areas: in theory, work the labs can't or won't touch. After all, if every type of software is going to be eaten, whether absorbed directly by Codex or Claude or rendered unnecessary by some future model, the best option seems to be: run!

I admit, I'm almost an AI maximalist myself, and I think they are half right. Large model labs are indeed entering large swaths of the application layer. But the "application layer" is not a homogeneous set of opportunities. The truly important criterion is: are you walking the Yellow Brick Road, or are you in the rest of Oz?

Note: The "Yellow Brick Road" is the main road in *The Wizard of Oz* leading to the heart of the Emerald City to see the "Wizard."

The "Yellow Brick Road" is how we describe the path the large model labs are taking and investing massive resources in. Problems like code generation, writing, and image creation are naturally suited for labs because they get better as the model's core capabilities improve: every dollar spent on pre-training and post-training directly enhances product quality.

But in the rest of Oz, there exist more complex, often vertical problems. They aren't simply solved by giving an enterprise user a horizontal tool connected to standard tools and computer operations. The value here comes more from the scaffolding around the model: making the output credible, compliant, and truly integrated into business processes within a specific industry. The raw capability of the underlying model still matters, but it's no longer everything.

We are seeing this in real time. OpenAI and Anthropic are, in effect, admitting to the market that they can't solve everything with a single, general-purpose AI colleague. They have announced major forward-deployed-style joint ventures, building entire companies around configuring and customizing models for enterprises. If they truly believed the next model release would solve these problems, they wouldn't be investing billions in such projects.

So, if you want to make money building AI applications, don't walk the Yellow Brick Road. Go build in the rest of Oz. Here's what we, and some founders in our portfolio, have learned in practice.

The Yellow Brick Road

If you're starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous. Take a high-performance model, plug it into some off-the-shelf connectors like Google Drive, Slack, Salesforce, Notion, GitHub, and build an agent orchestration layer on top. It looks like magic.

The problem is that this is precisely what the large model labs are doing with Cowork and Codex. Obviously, they own the model, giving them better margins, more control, and pricing power over all downstream participants. But perhaps more importantly, they also hold the architectural choices that determine what problems the product is suited for. So far, they have very deliberately adopted the "model + tool-calling" pattern – exactly the pattern needed for the horizontal, low-step-count tasks on the Yellow Brick Road. Even if a startup could somehow outperform Codex or Claude Code, the labs still possess massive distribution and the strongest brand halo in AI.

If you're an AI application company using the same playbook—connecting to the same connectors, without sub-agents, configuration, or distribution—you're likely walking a path to nowhere.

The Rest of Oz

The picture isn't all bleak for startups. There is still immense opportunity outside the Yellow Brick Road. Startups can own their customers and solve complex problems in these areas.

These companies are building agentic experiences: models woven into complex networks of tools, automation, and integration – in other words, software. This also makes most of these startups naturally vertical. They can focus on multi-step, multi-participant workflows, design sub-agents for different roles and vertical scenarios, and handle problems that horizontal platforms from Anthropic and OpenAI struggle with: gathering context across systems, then routing tasks to multiple people who need to approve at different stages.

This type of work often involves one or more legacy systems, usually requires deterministic output because ambiguity is unacceptable, and sometimes is directly tied to a significant business outcome. The large model labs obviously know how valuable these problems are: that's why they are building their own outsourced configuration teams, and why an entire industry of specialized reinforcement learning service companies for large clients is emerging.

Why the Rest of Oz Won't Be Completely Taken Over by the "Wizard"

A counterargument to the above is that, so far, betting against the models or labs continuing to improve has been a terrible trade. They will likely keep getting stronger and eventually eat the markets these application-layer companies serve.

The large model labs will certainly continue to improve. But I believe companies in the rest of Oz still have several modes of defense in the long run.

Data and Learning Flywheels

Much of what you truly internalize in a business isn't in any training set: unwritten industry conventions, undocumented standards, tribal knowledge in practitioners' heads. None of it is on the public internet. No amount of training compute can replace being inside the workflow where this knowledge resides.

Two flywheels overlap here: the cross-customer flywheel, where patterns compound as you see more variants of the same problem; and the within-customer flywheel, where the rationale behind specific decisions, unspoken exceptions, and the company's own rules of thumb only surface when users interact with the system.

Even with customer data siloed, an application company can leverage pattern recognition across different customer problem types to guide the architecture for future ones. A company that has run its agents through a hundred legal redline exercises, a thousand insurance underwriting cycles, or ten thousand sales development (SDR) activities has an understanding of the problem structure that a later entrant can't replicate by spinning up a new agent for the first time.

Theoretically, a horizontal agent could build the same learning infrastructure. But besides lack of focus, the key reason they don't is user experience. Capturing this knowledge is entirely dependent on the workflow interface you provide to users. Vertical players can design these interfaces to surface the information truly needed for specific workflows; horizontal tools cannot. Evaluation sets, labeled outputs, and edge case taxonomies can compound into a vertical-specific data flywheel, further enabling fine-tuning. Later entrants without comparable production exposure will struggle to generate this flywheel. Its viability depends on data rights, accumulated production usage, and customer contract structures, but the pattern recognition itself keeps compounding.

Managing Model Volatility and Complexity

Large model labs already do some routing internally: calling different model classes for different requests, using model ensembles underneath. But they can't route across vendors, evaluate a competitor's model for a specific sub-task, or use a truly optimal open-source fine-tuned model for a narrow step.

Companies in the rest of Oz will choose the best model for each sub-task from across the entire model market, not just from a single lab. They will also take on the grunt work no one else wants: re-running evaluations with each new model release, re-calibrating prompts for client edge cases, and handling upgrades without breaking production. The large model labs won't do this for clients. They sell you the new model and tell you to migrate. Companies in the rest of Oz absorb the migration cost. The client gets the best intelligence from the entire market, plus continuity through every upgrade cycle.
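
The upgrade work described above can be pictured as a regression gate: re-run a fixed evaluation set against a candidate model and switch over only if it does not regress. What follows is a minimal sketch under stated assumptions, not any company's actual pipeline. `EvalCase`, `pass_rate`, `should_promote`, and the two stand-in models are hypothetical names, a "model" is represented as a plain callable from prompt to text, and exact-match grading is used only to keep the example small.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical: a "model" is any callable from prompt text to output text.
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # exact-match grading, chosen only to keep the sketch small


def pass_rate(model: Model, cases: Sequence[EvalCase]) -> float:
    """Fraction of cases where the model output matches the expected answer."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(1 for c in cases if model(c.prompt).strip() == c.expected)
    return passed / len(cases)


def should_promote(current: Model, candidate: Model,
                   cases: Sequence[EvalCase], tolerance: float = 0.0) -> bool:
    """Promote the candidate only if it does not regress beyond `tolerance`."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - tolerance


# Stand-in models for illustration; real ones would call a vendor API.
def current_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "")


def candidate_model(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Lyon"}.get(prompt, "")


CASES = [EvalCase("2+2", "4"), EvalCase("capital of France", "Paris")]
```

Here `should_promote(current_model, candidate_model, CASES)` is false because the candidate fails a case the current model passes. In practice the evaluation set would hold each client's own edge cases, which is the part a lab cannot maintain on the client's behalf.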

Cost Optimization

Throwing every query at Opus 4.7 is the fastest way to turn gross margin negative. The best Oz companies route between model tiers: the hardest tasks to the frontier models, most tasks to mid-tier models, and smaller custom or fine-tuned models where proven.

Some of these companies now do their own post-training on top of this, optimizing models for the narrow sliver of work the customer truly cares about, serving it at a fraction of the cost of a frontier API call. Large model labs price the "floor": the minimum intelligence you get for $X. Oz companies sell the opposite: the lowest dollar cost for the level of intelligence a specific workflow genuinely requires. This is only possible when you know exactly what level of intelligence each sub-task needs. And the large model labs are structurally incapable of knowing every task in every vertical industry. Ultimately, this translates directly to lower, more predictable outcome-based pricing.
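
One way to picture the tiered routing described in this section: give each sub-task a required capability level and send it to the cheapest tier that meets it. This is a hedged sketch only. The tier names, capability scores, and per-call costs below are made up for illustration, and `ModelTier`, `route`, and `workload_cost` are hypothetical names rather than any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str             # hypothetical label, not a real product
    capability: int       # made-up ordinal score: higher handles harder tasks
    cost_per_call: float  # made-up dollar cost per call


TIERS = [
    ModelTier("small-finetuned", capability=1, cost_per_call=0.001),
    ModelTier("mid-tier", capability=2, cost_per_call=0.01),
    ModelTier("frontier", capability=3, cost_per_call=0.10),
]


def route(required_capability: int, tiers=TIERS) -> ModelTier:
    """Return the cheapest tier that meets the sub-task's required capability."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier can handle capability {required_capability}")
    return min(eligible, key=lambda t: t.cost_per_call)


def workload_cost(requirements, tiers=TIERS) -> float:
    """Total cost of a list of sub-task requirements under routing."""
    return sum(route(r, tiers).cost_per_call for r in requirements)
```

With these invented prices, a workload of `[1, 1, 1, 2, 3]` costs about $0.113 under routing, against $0.50 if all five calls went to the frontier tier. The hard part in practice is the input to `route`: knowing what level each sub-task really requires.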

Governance

Becoming the control plane for running AI in a specific vertical for your customer creates significant value. This control plane is where permissions, audit trails, what agents are allowed to do, and what agents actually did come together.

This control plane is built on use-case-specific guardrails, which vary completely across industries and job types. Because these companies own the tools, workflows, and data the agent touches end-to-end, they can provide deterministic results in ways horizontal tools cannot. They also absorb regulatory complexity for the end buyer: FRCP and state bar rules in law, HIPAA in healthcare, SEC and FINRA in finance, state-level insurance regulations, etc. A horizontal player cannot credibly do this without turning itself into a hundred different verticals. What a CIO needs is a partner who will put in a contract that they take compliance responsibility for the agents they provide.
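
As a rough illustration of a control plane where permissions and audit trails meet, the sketch below checks each agent action against a policy and records the decision whether or not it was allowed. It is an assumption-laden toy: `ControlPlane`, `AuditEntry`, and the agent and action names are hypothetical, and real compliance regimes such as HIPAA or FINRA require far more than an allow-list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    agent: str
    action: str
    allowed: bool


@dataclass
class ControlPlane:
    # Hypothetical policy: agent name -> set of actions it may perform.
    permissions: Dict[str, Set[str]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def authorize(self, agent: str, action: str) -> bool:
        """Check the action against policy and record the decision either way."""
        allowed = action in self.permissions.get(agent, set())
        self.audit_log.append(AuditEntry(agent, action, allowed))
        return allowed


# Example policy for illustration only.
plane = ControlPlane({"intake-agent": {"read_record"}})
first = plane.authorize("intake-agent", "read_record")   # permitted by policy
second = plane.authorize("intake-agent", "send_email")   # not in the allow-list
```

Logging denied attempts alongside permitted ones is deliberate: the audit trail then covers both what agents were allowed to do and what they actually tried to do.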

All of this comes back to the same thing: focus.

This focus can be a vertical industry, like insurance, legal, or accounting; or a function done deeply enough, like sales, customer support, or finance. Either way, it requires a team long embedded with the same type of customer, understanding their workflows, edge cases, and regulatory requirements. Large model labs are not built for this. They must serve everyone, everywhere – which is why they built the Yellow Brick Road in the first place. The same trade-off makes it hard for them to enter the rest of Oz: you can be everywhere, or you can be the best at one thing, but you can't be both.

A Practical Example: Sales, from 11x's Technical CEO

What does this look like in practice? Here are some practical insights from Prabhav Jain, CEO of 11x.

Focus on Outcomes

A viable tactical path to building a company resilient to the impact of large model labs is to start from the specific outcome the customer truly cares about. For us, that outcome is helping businesses generate more sales leads and pipeline.

From there, the problem becomes very concrete: which activities do we want to own end-to-end that truly drive sales pipeline growth? Break each activity down into tasks. Which tasks are suitable for agents, which are not? Which require complex domain insight, which do not? Large model labs will also release workflows, but when a workflow has many steps, messy inputs, hard-to-reason-about state, or real-world constraints, a better model alone doesn't get the job done. The work reverts to traditional software engineering, and on that front, large model labs have no advantage over a focused application company.

For example, some tasks we handle include: lead generation based on custom signals, lead enrichment, deep account research, pulling context from CRMs, writing messages for different channels, a lead qualification agent, and email deliverability systems. Some are agent tasks, some are not. These can't be accomplished with a single prompt; they require deep engineering.

The key insight from this Oz analogy is: roughly half of any real workflow is non-agent tasks, and that half doesn't favor the labs. Underneath the model layer, their ability to write deterministic software is no better than yours. And the other half, the agent tasks, still require you to fine-tune, train, and constrain the model around the specific outcome you want.

Domain knowledge is often not in the general training data. These capabilities must be built bottom-up from a vertical industry or specific function, and fed to the model at the right moments in the workflow. When our agent qualifies an inbound lead over the phone, it must be trained to understand what constitutes a good sales conversation for that specific industry and persona. This is the work application companies do, and this capability compounds.

More importantly, these capabilities constantly become outdated because businesses themselves evolve. So, your ability to continuously evolve the workflow and context itself becomes a competitive advantage. For instance, when we started our scaled email outreach product, "AI-written emails" were just emerging. Fast forward to today, people have a keen sense for distinguishing AI-written emails from human-written ones, and crucially, this baseline changes every few months. Our agents must constantly adapt to market dynamics, but that's precisely where the moat is built. In fact, despite this dynamism, our positive reply rates have increased 4x in the last few months, generating hundreds of millions of dollars in pipeline for our clients.

Tackle High-Complexity Problems

Complex problems are where real business value is unlocked. Otherwise, you can easily find yourself building a thin wrapper.

Decomposing any sufficiently complex business problem quickly reveals messiness. Here's a seemingly simple GTM example: you shouldn't contact a person at a company if that company is already a customer. But this is far from simple.

Maybe your CRM has the company's domain. What about companies with dozens of subsidiaries? What if the CRM holds the parent company domain? What if an outdated matching field in Salesforce causes you to send a cold outreach email to the CRO of an existing customer? Real-world data is messy. Humans struggle with it, and models don't magically clear this hurdle. Building order from this chaos requires specialized agents designed around the specific shape of the problem, not just pointing a general-purpose co-pilot at the CRM. In fact, we have found our own data to be better in quality and freshness than the client's, so we anchor on our data by default.
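
To make the subsidiary problem concrete, here is a minimal sketch of a suppression check that resolves every domain to its corporate root before comparing. It assumes a subsidiary-to-parent mapping already exists, which is the hard part in reality. The function names, the `parent_of` mapping, and the `.example` domains are hypothetical, and the policy shown is deliberately conservative: it suppresses outreach to the entire corporate family of any customer.

```python
from typing import Dict, Set


def root_domain(domain: str, parent_of: Dict[str, str]) -> str:
    """Follow subsidiary -> parent links until reaching a domain with no parent."""
    seen = set()
    current = domain.strip().lower()
    # The `seen` set stops the walk if the mapping accidentally contains a cycle.
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current].strip().lower()
    return current


def is_existing_customer(email: str, customer_domains: Set[str],
                         parent_of: Dict[str, str]) -> bool:
    """True if the contact's company shares a corporate root with any customer."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    customer_roots = {root_domain(d, parent_of) for d in customer_domains}
    return root_domain(domain, parent_of) in customer_roots


# Hypothetical data: keys are lowercase subsidiary domains, values their parents.
PARENT_OF = {
    "acme-logistics.example": "acme.example",
    "acme-labs.example": "acme.example",
}
```

With this mapping, a contact at `acme-labs.example` is suppressed whether the CRM lists the parent `acme.example` or the sibling `acme-logistics.example` as the customer. A naive exact-domain match would miss both cases.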

Guardrails Aren't Just About Preventing Bad Things. Customers Pay For Exactly This.

Guardrails are massively underestimated. Even within the same product, each use case needs its own guardrails. For us, a regulated financial services prospect requires completely different assurances than a mid-market SaaS customer. And these assurances cascade down to how the agent writes, who it can contact, what data it can touch, what it can say on the phone, and how every decision is logged.

A "one-size-fits-all" system breaks against this variation. Guardrails must be built per use case, configured per customer, and continuously audited, and that work falls entirely on the application company. This is why we need forward-deployed engineers and technical deployment strategists to tune for each client's requirements.

For example, we worked with a Fortune 1000 institution to place consented outbound voice calls to their massive SMB customer base. In the first rounds of attempts, pickup rates were very low. We had to iterate rapidly to learn how to engage this specific audience within the first 10 seconds of the call. SMB business owners behave differently from large B2B buyers or consumers. Today, we generate more sales opportunities for them in a single day than their entire sales team could in that segment in a month.

A Practical Example: Insurance, from FurtherAI's CEO

Sales is just one example. Insurance is another, illustrating the same point from a different angle. Here is FurtherAI CEO Aman Gour's perspective on "building outside the Yellow Brick Road."

When we started deploying AI into real insurance operations, we repeatedly heard one assumption: the model is the intelligence, and the workflow is just scaffolding around it.

But the more insurance companies we worked with, the more convinced we became that it's the opposite.

In insurance, a lot of the intelligence *lives in the workflow itself*. Two insurers can have a submission go through what looks like the same path: submission, review, quote, underwrite. The path is easy. What differentiates them is everything *inside* that path: which risks get escalated, which loss signals matter, which rule takes precedence when two underwriting appetite rules conflict, when a human must sign off, what external data to pull, and how the final decision is recorded.

This logic doesn't reside in a clean rules engine. It's scattered across SOPs, manager reviews, underwriting philosophies, insurer-specific risk appetites, and years of operational experience. Much of it isn't written down in a form a model can directly read.
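
A small sketch can show why this logic is hard to reduce to a clean rules engine: even a toy version needs explicit precedence and an explicit path to human sign-off. Everything below is a hypothetical illustration, not FurtherAI's system. The `Rule` type, the priorities, and the three appetite rules are invented, and referring to a human when equally ranked rules disagree is one possible convention among many.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int                    # higher wins; values are made up
    applies: Callable[[Dict], bool]  # predicate over a submission dict
    decision: str                    # "accept", "decline", or "refer"


def decide(submission: Dict, rules: List[Rule]) -> str:
    """Apply the top-priority matching rule; refer to a human if top rules disagree."""
    matched = [r for r in rules if r.applies(submission)]
    if not matched:
        return "refer"  # no rule covers this case, so a human decides
    top = max(r.priority for r in matched)
    decisions = {r.decision for r in matched if r.priority == top}
    return decisions.pop() if len(decisions) == 1 else "refer"


# Hypothetical appetite rules for illustration only.
RULES = [
    Rule("small-account", 1, lambda s: s["premium"] < 50_000, "accept"),
    Rule("coastal-property", 2, lambda s: s["coastal"], "decline"),
    Rule("high-loss-history", 2, lambda s: s["losses_5y"] > 3, "refer"),
]
```

A small coastal account is declined here because the coastal rule outranks the small-account rule. A coastal account with heavy loss history triggers two equally ranked rules that disagree, so it goes to a human. Two insurers could share this exact code and still differ entirely in the rules and priorities they load into it.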

This is why we don't believe in pure agents that reason from scratch every time, nor
