OpenAI is eating the application layer? a16z says the real opportunity lies beyond general-purpose models.
- Core Thesis: The real opportunity for AI application startups doesn’t lie in competing with large model companies for horizontal general-purpose tools (the "yellow brick road"). Instead, it lies in deeply embedding into industry workflows to build vertical solutions that depend on complex workflows, proprietary data accumulation, compliance governance, and system integration capabilities.
- Key Elements:
  - Large model labs (e.g., OpenAI, Anthropic) are capturing the "yellow brick road" with model capabilities plus horizontal tools (e.g., code generation), but they cannot penetrate the multi-step, multi-stakeholder vertical scenarios that require deep domain expertise.
  - Startups’ moats come from the data and learning flywheels accumulated around specific industries, including unwritten industry customs and tribal knowledge not found in public training datasets.
  - Application-layer companies can optimize costs and manage complexity by routing across vendors, selecting the best model for specific subtasks, and bearing the migration costs of model upgrades.
  - Providing a "control plane" for governance, compliance, and auditing based on specific use cases (e.g., handling HIPAA or FINRA regulations) is a core value that horizontal models can hardly replace.
  - Vertical companies offer "system-level" products that replace human labor and lock in workflows. Customers pay based on business outcomes (e.g., sales leads) rather than benchmark performance, resulting in high customer lifetime value.
Original Title: Avoiding Death on the Yellow Brick Road
Original Author: Joe Schmidt IV, a16z
Translation & Compilation: Peggy
Editor's Note: As the capabilities of large models continue to improve, a common anxiety is emerging in the AI application layer: If model companies like OpenAI and Anthropic possess not only the underlying models but also distribution channels and brand advantages, what can startups still do in the application layer?
This is precisely the question a16z partner Joe Schmidt attempts to answer in this article. Borrowing the metaphor of the "Yellow Brick Road" from *The Wizard of Oz*, he divides AI application opportunities into two categories: one is the main road that large model companies are currently traversing themselves, such as code generation, writing, image generation, general-purpose agents, and horizontal office assistants. The other is the "rest of Oz" – deeply vertical scenarios that require integration into industry workflows, complex processes, accumulated data, compliance governance, and system integration capabilities.
In his view, the real opportunity for startups lies in the latter.
From sales to insurance, Joe Schmidt repeatedly emphasizes the same logic: What enterprises are truly willing to pay for is not a smarter chat window, but a system that can take responsibility for business outcomes. It needs to understand the chaos of customer data, handle multi-person approvals and edge cases, bear compliance and audit responsibilities, and manage migration, routing, and cost optimization for the customer as models continue to upgrade.
This is also the core judgment of this article regarding the next generation of enterprise software: Underlying models will become more powerful, but also increasingly replaceable; what is truly irreplaceable is the data, processes, governance capabilities, and operational memory accumulated around specific industries and specific workflows. The opportunity for AI application companies lies not in competing with model companies for the "Yellow Brick Road," but in venturing into those more complex, messier, slower places that are closer to real business value.
The following is the original text:
Lately, I keep hearing the same question from founders and prospective employees: Is there anything left to do in the AI application layer? Or will OpenAI and Anthropic eventually kill everything?
This question reflects a typical AI-era anxiety. Some have already concluded that, to avoid being permanently stuck at the bottom, the only positions with long-term value are either a job inside a large model lab or a startup in robotics, hard tech, or a similar frontier area: in theory, things that "labs cannot touch." The reasoning goes that if every category of software will be devoured, either directly by Codex or Claude absorbing the corresponding work or by some future model rendering it unnecessary, then the best option seems to be: run!
I admit, I am almost an AI maximalist myself, and I think they are half right. Large model labs are indeed entering vast areas of the application layer. But the "application layer" is not a homogeneous set of opportunities. The truly important criterion is: Are you walking the "Yellow Brick Road," or are you somewhere else in Oz?
Note: The "Yellow Brick Road" is the main road in *The Wizard of Oz* leading to the Emerald City, the core area where the "Wizard" resides.
The "Yellow Brick Road" is our term for the path that large model labs are taking and investing significant resources in. Problems like code generation, writing, and image creation are naturally suited for labs because they improve with every advance in the models' raw capabilities: each dollar invested in pre-training and post-training directly improves product quality.
But elsewhere in Oz, there are more complex, often more vertical problems. These cannot be solved by simply giving an enterprise user a horizontal tool connected to standard tools and computer operations. The value here comes more from the scaffolding around the model: scaffolding that makes the output trustworthy, compliant, and actionable within specific industries. The raw capability of the underlying model certainly still matters, but it is no longer everything.
We are seeing this play out in real time. OpenAI and Anthropic are essentially admitting to the market that they cannot solve all problems with a single, general-purpose AI colleague. They have announced massive investments in forward-deployed-style joint ventures, building entire companies around configuring and customizing models for enterprises. If they truly believed their next model release would solve these problems, they wouldn't be pouring billions into such projects.
So, if you aim to make money building AI applications, don't walk the Yellow Brick Road. Go build elsewhere in Oz. Here is what we, and some founders in our portfolio, have learned in practice.
The Yellow Brick Road
If you are starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous one. Take a high-performance model, connect it to some off-the-shelf connectors like Google Drive, Slack, Salesforce, Notion, GitHub, and add an agent orchestration layer on top. It looks like magic.
The problem is that this is precisely what the large model labs are doing with Cowork and Codex. Obviously, they own the models, which gives them better margins, more control, and pricing power over all downstream participants. But perhaps more importantly, they also control the architectural choices that determine what problems the product is suitable for solving. So far, they have intentionally adopted the "model + tool calling" pattern, which is exactly what those horizontal, low-step-count tasks on the Yellow Brick Road require. Even if a startup could surpass Codex or Claude Code in some way, the labs still possess massive distribution power and the strongest brand halo in the AI field.
If you are an AI application company using the same playbook – connecting to the same connectors, lacking sub-agents or configuration, and without distribution – you are likely on a path to nowhere.
The Rest of Oz
The situation isn't entirely bleak for startups. Beyond the Yellow Brick Road, there remain huge opportunities. Startups can own customers and solve complex problems in these areas.
These companies are building agent experiences: models woven into complex networks of tools, automation, and integration – in other words, software. This naturally makes most of these startups vertical. They can focus on multi-step, multi-participant workflows, design sub-agents for different roles and vertical scenarios, and handle problems that Anthropic's and OpenAI's horizontal platforms struggle to reach: gathering context across systems, then routing tasks to multiple people who need to approve at different stages.
This type of work usually involves one or more legacy systems, often requires deterministic results because ambiguity is unacceptable, and sometimes is directly tied to a significant business outcome. The large model labs obviously know how valuable these problems are: that's why they are building their own outsourced configuration teams, and why a whole ecosystem of customer-facing reinforcement learning service companies is emerging.
Why the Rest of Oz Won't Be Fully Consumed by the "Wizard"
A counter-argument to the above is that betting against models or labs continuing to improve has been a terrible trade so far. They will likely keep getting stronger and eventually eat the markets these application-layer companies serve.
The large model labs will certainly continue to improve. But I believe companies elsewhere in Oz still have several defensible positions in the long run.
Data and Learning Flywheels
Much of what you truly internalize in business isn't in any training set: unwritten industry conventions, undocumented standards, tribal knowledge in practitioners' heads. None of it is on the public internet. No amount of training compute can replace actually being inside the workflows where this knowledge resides.
Two flywheels compound here: a cross-customer flywheel, where patterns compound as you see more variants of the same problem; and a customer-internal flywheel, where the reasons behind specific decisions, unspoken exceptions, and the company's own rules of thumb only emerge when users interact with the system authentically.
Even if customer data cannot be shared across customers, an application company can still leverage pattern recognition across different customer problem types and use it to architect solutions for future problems. A company whose agents have handled a hundred legal redline revisions, a thousand insurance underwriting cycles, or ten thousand sales development (SDR) activities has an understanding of the problem landscape that a newcomer cannot replicate by simply spinning up a new agent for the first time.
Theoretically, a horizontal agent could build the same learning infrastructure. In practice it doesn't, not only for lack of focus but also because of user experience: capturing this kind of knowledge depends entirely on the workflow interface you provide to users. Vertical players can design these interfaces around the information that truly needs to be revealed for a specific workflow; horizontal tools cannot. Evaluation sets, labeled outputs, and edge-case taxonomies can all compound into a vertical data flywheel that further supports fine-tuning. Latecomers without comparable production exposure struggle to build this flywheel. How far the flywheel can go depends on data rights, accumulated production usage, and customer contract structures, but the pattern recognition itself continues to accumulate.
Managing Model Volatility and Complexity
Large model labs already do routing internally: directing different requests to different model classes, using model ensembles under the hood. But they cannot do cross-vendor routing, evaluate a competitor's model for a specific sub-task, or use a truly optimal open-source fine-tuned model for a narrow step.
Companies elsewhere in Oz will select the most suitable model for each sub-task across the entire model marketplace, not just models from a single lab. They also absorb the thankless work: re-running evaluations with each new model release, re-calibrating prompts for customer edge cases, handling rollouts without breaking production. The large model labs won't do this for customers. They sell you the new model and tell you to migrate. Companies elsewhere in Oz absorb the migration cost. The customer gets the best intelligence available on the market, along with continuity through each upgrade cycle.
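As a rough illustration of the migration work described above, the following Python sketch re-runs a fixed evaluation set whenever a new model appears and switches only when a challenger clearly beats the model in production. Everything named here (`EvalCase`, `pass_rate`, `choose_model`, the `min_gain` margin, and the two stub "vendors") is hypothetical and not taken from the article; a real evaluation would use richer scoring than exact string match.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Model = Callable[[str], str]  # hypothetical stand-in for any vendor's completion call


@dataclass(frozen=True)
class EvalCase:
    """One recorded edge case and the answer the customer expects (assumed schema)."""
    prompt: str
    expected: str


def pass_rate(model: Model, cases: List[EvalCase]) -> float:
    """Fraction of cases where the model's output exactly matches the expected answer."""
    if not cases:
        return 0.0
    hits = sum(1 for case in cases if model(case.prompt).strip() == case.expected)
    return hits / len(cases)


def choose_model(current: str, models: Dict[str, Model],
                 cases: List[EvalCase], min_gain: float = 0.02) -> str:
    """Keep the production model unless a challenger beats it by at least min_gain."""
    scores = {name: pass_rate(model, cases) for name, model in models.items()}
    challengers = [name for name in models if name != current]
    if not challengers:
        return current
    best = max(challengers, key=lambda name: scores[name])
    return best if scores[best] - scores[current] >= min_gain else current


# Stub "vendors" for illustration only; real ones would call external APIs.
def vendor_a(prompt: str) -> str:
    return "qualified"


def vendor_b(prompt: str) -> str:
    return "qualified" if "budget" in prompt.lower() else "unqualified"


CASES = [
    EvalCase("Has budget, wants a demo", "qualified"),
    EvalCase("Student doing research", "unqualified"),
    EvalCase("Budget approved for Q3", "qualified"),
]
MODELS = {"vendor_a": vendor_a, "vendor_b": vendor_b}

print(choose_model("vendor_a", MODELS, CASES))  # vendor_b passes 3/3 vs 2/3
```

The margin is a design choice in this sketch: requiring a minimum gain before switching avoids churning production on noise-level differences between releases.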
Cost Optimization
Throwing every query at Opus 4.7 is the fastest way to turn gross margin negative. The best Oz companies route between different model tiers: the hardest tasks go to frontier models, most tasks go to mid-tier models, and smaller custom or fine-tuned models are used where proven effective.
Some of these companies now do their own post-training on top of this, optimizing the model for the narrow slice of work the customer truly cares about, serving it at a fraction of the cost of a frontier API call. The large model labs price for the "floor": the minimum intelligence level you can buy for X dollars. Oz companies sell the inverse: achieving the lowest dollar cost for the specific level of intelligence a workflow actually needs. This is only possible when you are intimately familiar with the level of intelligence required for every sub-task. The large model labs, by structure, cannot understand every task in every vertical industry. Ultimately, this translates directly into lower, more controllable outcome-based pricing.
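The idea of buying "the lowest dollar cost for the specific level of intelligence a workflow actually needs" can be sketched as a small routing rule: among the models that clear a quality bar on a given sub-task, pick the cheapest. This is only an illustration under stated assumptions. The tier names, per-call costs, and evaluation scores below are invented, and `ModelOption` and `cheapest_sufficient` are hypothetical names, not anything the article or a vendor defines.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelOption:
    name: str                     # illustrative tier label, not a real product
    cost_per_call_usd: float      # made-up number for the example
    eval_score: Dict[str, float]  # in-house pass rate per sub-task (assumed to exist)


def cheapest_sufficient(options: List[ModelOption], subtask: str,
                        required_score: float) -> ModelOption:
    """Return the lowest-cost model whose eval score meets the bar for this sub-task."""
    eligible = [m for m in options if m.eval_score.get(subtask, 0.0) >= required_score]
    if not eligible:
        raise LookupError(f"no model meets {required_score} on {subtask!r}")
    return min(eligible, key=lambda m: m.cost_per_call_usd)


OPTIONS = [
    ModelOption("frontier", 0.10, {"lead_enrichment": 0.97, "account_research": 0.95}),
    ModelOption("mid_tier", 0.01, {"lead_enrichment": 0.95, "account_research": 0.85}),
    ModelOption("small_finetuned", 0.001, {"lead_enrichment": 0.96, "account_research": 0.60}),
]

# An easy sub-task lands on the small model; a hard one falls through to the frontier tier.
print(cheapest_sufficient(OPTIONS, "lead_enrichment", 0.95).name)
print(cheapest_sufficient(OPTIONS, "account_research", 0.90).name)
```

The rule only works if per-sub-task evaluation scores exist, which is the point the passage makes about knowing the intelligence level each sub-task requires.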
Governance
Becoming the customer's control plane for running AI in a specific vertical generates significant value. This control plane is where permissions, audit logs, what agents are allowed to do, and what agents actually did all converge.
This control plane is built on use-case-specific guardrails, which differ wildly across industries and job functions. Because these companies own the tools, workflows, and data the agent touches end-to-end, they can provide deterministic outcomes in ways horizontal tools cannot. They also absorb regulatory complexity for the end buyer: the FRCP and bar rules in law, HIPAA in healthcare, SEC and FINRA rules in finance, state-level insurance regulations, and so on. A horizontal player cannot do this convincingly without transforming itself into a hundred different verticals. The CIO needs a partner who can contractually commit to handling the compliance burden for the agents it provides.
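To make the control-plane idea concrete, here is a minimal Python sketch of one place where permissions and the audit trail converge. It is a simplification under assumptions: `ControlPlane`, `AuditEntry`, the agent name, and the action names are all hypothetical, and a real system would add authentication, tamper-evident storage, and the industry-specific rules mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str   # UTC, ISO 8601
    agent: str
    action: str
    allowed: bool


@dataclass
class ControlPlane:
    """Hypothetical control plane: what agents may do, and what they tried to do."""
    permissions: Dict[str, Set[str]]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def authorize(self, agent: str, action: str) -> bool:
        # Deny by default: unknown agents and unlisted actions are refused.
        allowed = action in self.permissions.get(agent, set())
        # Record every attempt, including refusals, so the log shows actual behavior.
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc).isoformat(), agent, action, allowed)
        )
        return allowed


plane = ControlPlane(permissions={"intake_agent": {"read_record", "draft_summary"}})
print(plane.authorize("intake_agent", "read_record"))          # permitted action
print(plane.authorize("intake_agent", "send_external_email"))  # not on the allow list
print(len(plane.audit_log))                                    # both attempts are logged
```

Logging refusals alongside approvals is deliberate in this sketch, since an auditor usually cares about what an agent attempted as much as what it completed.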
All of this comes back to the same thing: Focus.
This focus can be a vertical industry, like insurance, law, or accounting; or a sufficiently deep function, like sales, customer service, or finance. Either way, the work requires a team deeply embedded in the same customer base over a long period, understanding its workflows, edge cases, and regulatory requirements. The large model labs were not built for this. They have to serve everyone, go everywhere – that's why they built the Yellow Brick Road in the first place. The same trade-off will make it difficult for them to enter the rest of Oz: you can be everywhere at once, or be the best at one thing, but not both.
Sales as an Example: Practical Advice from the CEO of 11x
What does this mean in practice? Here is some practical advice from Prabhav Jain, CEO of 11x.
Focus on Outcomes
A viable tactical path to building a company resistant to disruption by large model labs is to start from the specific outcomes customers truly care about. For us, that outcome is helping businesses generate more sales leads and pipeline.
From there, the questions become very concrete: Which activities do we want to own end-to-end that actually drive pipeline growth? Break each activity into tasks. Which tasks are suitable for agents? Which are not? Which require complex domain insights? Which do not? The large model labs will also launch workflows, but when a workflow has many steps, messy inputs, hard-to-interpret state, or real-world constraints, a better model alone isn't enough to make things work. The work reverts to traditional software engineering, an area where the large model labs have no advantage over a focused application company.
For example, some tasks we handle include: lead prospecting based on custom signals, lead enrichment, deep account research, grabbing context from CRM, writing messages for different channels, lead qualification agents, and email deliverability systems. Some are agent tasks, some are not. These tasks aren't completed in one prompt; they require deep engineering.
The key insight from the Oz analogy is: roughly half of any real workflow consists of non-agent tasks, and the lab has no advantage in that half. Beneath the model layer, their ability to write deterministic software is no better than yours. The other half, the agent tasks, still require you to fine-tune, train, and constrain the model towards the specific outcome you want.
Domain knowledge is often absent from general training data. These capabilities must be built bottom-up from a vertical industry or specific function, and fed to the model at the right point in the workflow. When our agent qualifies an inbound lead over the phone, it must be trained to understand what constitutes a good sales conversation for a specific industry, a specific user profile. This is the work of the application company, and this capability compounds.
More importantly, these capabilities constantly become outdated because businesses themselves evolve. Therefore, your ability to continuously evolve workflows and context becomes a competitive advantage. For instance, when we first started our scaled email outreach product, "AI-written emails" were just emerging. Fast forward to today, people have a keen sense for identifying AI-written vs. human-written emails, and crucially, this sense changes every few months. Our agents must constantly adapt to market dynamics, but that is precisely where the moat is built. In fact, despite this dynamic, our positive reply rate has quadrupled over the past few months, generating hundreds of millions of dollars in sales pipeline for our customers.
Tackle High-Complexity Problems
Complex problems are where the real business value is unlocked. Otherwise, you might easily find yourself building just a thin wrapper.
Decomposing any sufficiently complex business problem quickly reveals chaos. Here's a deceptively simple example from the GTM world: if a company is already your customer, you shouldn't send cold outreach to anyone at that company. But this is not simple at all.
Maybe your CRM has the domain for that company. What about companies with dozens of subsidiaries? What if the CRM has the parent company's domain? What if a stale match field in Salesforce causes you to send a cold outreach email to the CRO of an existing customer? Real-world data is messy. Humans struggle with it, and models won't magically bypass this hurdle. Building order from this chaos requires dedicated agents designed for the specific shape of the problem, not just pointing a general-purpose copilot at the CRM and calling it done. In fact, we have found that our own data is higher in quality and fresher than the customer's, so by default we anchor on our own data.
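The "don't cold-email an existing customer" rule shows why this is deterministic software work rather than a prompt. The Python sketch below covers just one of the cases raised above, where the CRM holds the parent company's domain and the contact sits at a subsidiary. It assumes a subsidiary-to-parent mapping is available from some data source; that mapping, the function names, and the `.example` domains are all hypothetical.

```python
from typing import Dict, Set


def normalize_domain(value: str) -> str:
    """Lowercase, keep the part after '@' if present, and drop a leading 'www.'."""
    domain = value.strip().lower().rsplit("@", 1)[-1]
    if domain.startswith("www."):
        domain = domain[4:]
    return domain


def is_existing_customer(contact_email: str, customer_domains: Set[str],
                         subsidiary_to_parent: Dict[str, str]) -> bool:
    """True if the contact's domain, or any parent above it, is a known customer.

    Assumes customer_domains and the mapping keys are already normalized.
    """
    domain = normalize_domain(contact_email)
    seen: Set[str] = set()
    while domain not in seen:  # the seen set guards against cycles in dirty data
        if domain in customer_domains:
            return True
        seen.add(domain)
        parent = subsidiary_to_parent.get(domain)
        if parent is None:
            return False
        domain = parent
    return False


CUSTOMERS = {"acme-holdings.example"}                             # what the CRM knows
PARENTS = {"acme-logistics.example": "acme-holdings.example"}     # assumed enrichment data

print(is_existing_customer("CRO@Acme-Logistics.example", CUSTOMERS, PARENTS))
print(is_existing_customer("jane@unrelated.example", CUSTOMERS, PARENTS))
```

Stale match fields, acquisitions, and shared email domains would each need their own handling; the sketch only shows why even the simplest version needs explicit data and logic.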
Guardrails Are Not Just for Preventing Bad Things. They Are Precisely What Customers Pay For.
Guardrails are severely underestimated. Even within the same product, each use case needs its own guardrails. For us, a regulated financial services prospect requires completely different assurances than a mid-market SaaS customer. These assurances cascade down to how the agent writes, who it can contact, what data it can access, what it can say on the phone, and how each decision is recorded.
A one-size-fits-all system breaks down under this variability. Guardrails must be built per use case, configured per customer, and continuously audited, and this work falls entirely on the application company. This is also why we need forward-deployed engineers and technical deployment strategists to tune the system to each customer's requirements.
For example, we worked with a Fortune 1000 company to run consented outbound calls to its massive SMB customer base via voice. Early attempts had low answer rates. We had to iterate rapidly to learn how to engage this specific audience within the first 10 seconds of a call. SMB business owners behave differently from large B2B buyers or consumers. Today, we generate more sales opportunities for them in a single day than their entire sales team could create in that segment in a month.
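One way to picture guardrails that are "built per use case, configured per customer" is as layered configuration, where each layer can override the one beneath it. The Python sketch below is illustrative only: the setting names, the two use cases, and every value are assumptions made for the example, not 11x's actual configuration.

```python
from typing import Any, Dict

Guardrails = Dict[str, Any]

# Layer 1: product-wide defaults (hypothetical settings).
PRODUCT_DEFAULTS: Guardrails = {
    "channels": ["email"],
    "log_every_decision": True,
    "max_touches_per_contact": 3,
}

# Layer 2: defaults per use case (hypothetical use cases and values).
USE_CASE_DEFAULTS: Dict[str, Guardrails] = {
    "regulated_financial_services": {
        "max_touches_per_contact": 1,
        "require_human_review": True,
    },
    "mid_market_saas": {
        "channels": ["email", "phone"],
        "require_human_review": False,
    },
}


def resolve_guardrails(use_case: str, customer_overrides: Guardrails) -> Guardrails:
    """Merge product defaults, use-case defaults, then customer overrides (last wins)."""
    if use_case not in USE_CASE_DEFAULTS:
        raise KeyError(f"unknown use case: {use_case}")
    return {**PRODUCT_DEFAULTS, **USE_CASE_DEFAULTS[use_case], **customer_overrides}


# Layer 3: one customer's overrides, applied on top of its use case.
print(resolve_guardrails("regulated_financial_services", {"channels": ["email", "phone"]}))
```

A production version would likely restrict which settings a customer may loosen; this sketch lets the last layer win unconditionally to keep the layering visible.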
Insurance as an Example: Practical Advice from the CEO of FurtherAI
Sales is just one example. Insurance is another, illustrating the same principle from a different angle. Here is Aman Gour, CEO of FurtherAI, on the meaning of "building off the Yellow Brick Road."
When we started deploying AI into real insurance operations, we repeatedly heard the hypothesis: The model is the intelligence; the workflow is just scaffolding built around the model.
But the more insurers we worked with, the more convinced we became that the opposite is true.
In insurance, a lot of the intelligence *is* embedded in the workflow itself. Two insurers can run a submission through what looks like the same path: submission, review, quote, underwrite. The path itself is easy. What truly differentiates the two insurers is everything *inside* that path: which risks get escalated, which loss signals matter, which rule takes precedence when two underwriting appetite rules conflict, when human sign-off is mandatory, which external data sources need to be tapped, and finally, how the decision is recorded.
This logic doesn't live in a clean rule engine. It's scattered across SOPs, manager reviews, underwriting philosophies, insurer-specific risk appetites, and years of


