Is OpenAI eating the application layer? a16z says the real opportunity lies beyond general-purpose models
- Core Thesis: The true opportunity for AI application startups does not lie in competing with large model companies for horizontal general-purpose tools (i.e., the "Yellow Brick Road"), but rather in deeply integrating into industry workflows to build vertical solutions that rely on complex workflows, data accumulation, compliance governance, and system integration capabilities.
- Key Elements:
- Large model labs (e.g., OpenAI, Anthropic) are capturing the "Yellow Brick Road" through model capabilities plus horizontal tools (e.g., code generation), but they cannot penetrate the multi-step, multi-stakeholder vertical scenarios.
- A startup's moat comes from data and learning flywheels accumulated around specific industries, including unwritten industry practices and tribal knowledge not found in public training sets.
- Application-layer companies can achieve cost optimization and manage complexity by routing across different vendors, selecting the best model for sub-tasks, and bearing the migration costs of model upgrades.
- Providing a "control plane" for governance, compliance, and auditing based on specific use cases, such as handling HIPAA or FINRA rules, is a core value that horizontal models find difficult to replace.
- Vertical companies offer "systemic" products that replace human labor and bind workflows; customers pay based on business outcomes (e.g., sales leads) rather than benchmark scores, resulting in high customer lifetime value.
Original Title: Avoiding Death on the Yellow Brick Road
Original Author: Joe Schmidt IV, a16z
Translation and Editing: Peggy
Editor's Note: As the capabilities of large models continue to improve, a common anxiety is spreading across the AI application layer: if model companies like OpenAI and Anthropic control the underlying models and also hold the distribution channels and brand advantages, what is left for startups to do in the application layer?
This is precisely the question a16z partner Joe Schmidt attempts to answer in this article. Using the "Yellow Brick Road" from *The Wizard of Oz* as an analogy, he divides AI application opportunities into two categories: one is the main road that large model companies are personally walking down, such as code generation, writing, image generation, general-purpose agents, and horizontal office assistants. The other is "the rest of Oz" – the deep, industry-specific vertical scenarios that require complex workflows, data accumulation, compliance governance, and system integration capabilities.
In his view, the real opportunity for startups lies in the latter.
From sales to insurance, Joe Schmidt repeatedly emphasizes the same logic: what enterprises are truly willing to pay for is not a smarter chat window, but a system that can be held accountable for business outcomes. It needs to understand the messy state of customer data, handle multi-person approvals and edge cases, bear the responsibility for compliance and auditing, and manage migration, routing, and cost optimization for clients as models are continuously upgraded.
This is also the core judgment of this article on the next generation of enterprise software: the underlying models will become more powerful, but also increasingly interchangeable. What is truly irreplaceable is the data, processes, governance capabilities, and operational memory accumulated around specific industries and specific workflows. The opportunity for AI application companies lies not in competing with model companies on the "Yellow Brick Road," but in entering those places that are more complex, messier, slower, yet closer to real business value.
The following is the original text:
Lately, I hear the same question repeatedly from founders and potential hires: Is there anything left to do in the AI application layer? Or will OpenAI and Anthropic eventually kill everything?
Behind this question lies a very typical AI-era anxiety. Some have already concluded that, if you don't want to be stuck permanently in a low-level layer, the only positions with long-term value are either inside the large model labs or at a company in robotics, hard tech, or a similar frontier area: in theory, areas "the labs can't touch." If every type of software is going to be eaten, either absorbed directly by Codex or Claude or made unnecessary by a future model, then the best option seems to be: run!
I admit, I'm almost an AI maximalist myself, and I think they're half right. Large model labs are indeed entering large swaths of the application layer. But the "application layer" is not a homogeneous opportunity set. The truly important judgment call is: are you walking the "Yellow Brick Road," or are you in the rest of Oz?
Note: The "Yellow Brick Road" is the main road in *The Wizard of Oz* leading to the heart of the Emerald City, to see the "Wizard."
The "Yellow Brick Road" is our metaphor for the path the large model labs are walking and pouring massive resources into. Problems like code generation, writing, and image creation are inherently suitable for labs because they improve directly with the raw capability of the model: every dollar spent on pre-training and post-training directly enhances product quality.
But in the rest of Oz, there are more complex, often vertical, problems. They aren't simply about giving an enterprise user a horizontal tool connected to standard tools and computer skills. The value here comes more from the scaffolding around the model: the scaffolding that makes the output trustworthy, compliant, and truly actionable within a specific industry's business processes. The raw capability of the underlying model still matters, but it's no longer everything.
We are seeing this happen in real time. OpenAI and Anthropic are effectively admitting to the market that they cannot solve all problems with one general-purpose AI colleague. They have announced massive investments in forward-deployment-style ventures, building entire companies around configuring and customizing models for enterprise clients. If they truly believed their next model release would solve these problems, they wouldn't be pouring billions into this type of venture.
So, if you want to make money building AI applications, don't walk the Yellow Brick Road. Go build in the rest of Oz. Here are the lessons we, and some founders in our portfolio, have learned in practice.
The Yellow Brick Road
If you're starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous. Take a high-performance model, connect it to some ready-made connectors like Google Drive, Slack, Salesforce, Notion, GitHub, and add an agent orchestration layer on top. It looks like magic.
The problem is, this is exactly what the large model labs are doing with Cowork and Codex. Obviously, they own the model, which gives them better margins, more control, and pricing power over all downstream participants. But perhaps more importantly, they also control the architectural choices that determine which problems the product is suited to solve. So far, they have been very intentional about using a "model + tool calling" pattern, which is precisely the pattern needed for the horizontal, low-step-count tasks on the Yellow Brick Road. Even if a startup could somehow outperform Codex or Claude Code, the labs still have massive distribution and the strongest brand halo in AI.
If you're an AI application company using the exact same playbook – connecting the same connectors, with no underlying sub-agents or configuration, and no distribution – then you are likely on a path to nowhere.
The Rest of Oz
It's not all doom and gloom for startups. Outside the Yellow Brick Road, there are still massive opportunities. Startups can own customers and solve complex problems here.
These companies are building agentic experiences: models woven into complex networks of tools, automation, and integrations – in other words, software. This also makes most of these startups naturally vertical. They can focus on multi-step, multi-stakeholder workflows, design sub-agents for different roles and vertical scenarios, and handle problems that Anthropic and OpenAI's horizontal platforms struggle to reach: gathering context across systems, then routing tasks to multiple people who need to approve at different stages.
This type of work usually involves one or more legacy systems, often requires deterministic results because ambiguity is unacceptable, and is sometimes tied directly to a critical business outcome. The labs know exactly how valuable these problems are: that's why they are building their own outsourced configuration teams, and why an entire industry of reinforcement learning service companies for large customers is emerging.
Why the Rest of Oz Won't Be Completely Taken Over by the "Wizard"
One counterargument to the above is that, so far, betting against models or labs continuing to improve has been a terrible trade. They will likely keep getting better and eventually eat the markets these application-layer companies serve.
The labs will certainly continue to improve. But I believe companies in the rest of Oz still have several forms of defense over the long term.
Data and Learning Flywheels
Much of what you truly internalize in a business doesn't exist in any training set: unwritten industry practices, undocumented standards, tribal knowledge living in practitioners' heads. None of it is on the public internet. No amount of training compute can replace actually getting inside the workflows where this knowledge lives.
There are two flywheels stacking here: a cross-customer flywheel, where patterns compound as you see more variants of the same problem; and a within-customer flywheel, where the reasons behind specific decisions, the unspoken exceptions, the company's own rules of thumb, only emerge when users interact with the system.
Even if customer data cannot be used across customers, an application company can still recognize patterns across different customers' problem types and use them to guide how it architects solutions to future problems. A company whose agents have processed a hundred rounds of legal redlines, a thousand insurance underwriting cycles, or ten thousand sales development (SDR) activities has an understanding of the problem landscape that a later entrant cannot replicate by simply launching a new agent for the first time.
Theoretically, a horizontal agent could build the same learning infrastructure. But the reason it doesn't, besides a lack of focus, is largely user experience. Capturing this knowledge depends entirely on what workflow interfaces you provide to the user. Vertical players can design these interfaces around the information that truly needs to be exposed for a specific workflow; horizontal tools cannot. Evaluation sets, labeled outputs, and edge case taxonomies can all compound into a vertical data flywheel that further supports fine-tuning. A later entrant without equivalent production exposure will struggle to generate this flywheel. How far the flywheel can go depends on data rights, accumulated production usage, and customer contract structures, but the pattern recognition itself will keep compounding.
Managing Model Volatility and Complexity
The labs are already doing internal routing: calling different model classes for different requests, using model ensembles under the hood. But they cannot do cross-vendor routing, evaluate a competitor's model for a specific sub-task, or use the truly best open-source fine-tune for a narrow step.
Companies in the rest of Oz choose the best model for each sub-task across the entire model market, not just the models of a single lab. They also absorb the dirty work no one wants to do: re-running evaluations with each new model release, re-calibrating prompts against customer edge cases, and handling upgrades without breaking production. The labs won't do this for you. They sell you the new model and tell you to migrate. Companies in the rest of Oz absorb the migration cost. The customer gets the best intelligence across the market and continuity through every upgrade cycle.
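As a rough illustration of the "re-run evaluations with each new model release" chore, here is a minimal Python sketch of a regression gate. Everything in it is an assumption made for illustration, not something the article specifies: the `EvalCase` type, the exact-match scoring, the `max_regression` tolerance, and the `make_stub` helper that stands in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" here is any callable that maps a prompt to a text answer.
Model = Callable[[str], str]


@dataclass(frozen=True)
class EvalCase:
    """One hypothetical regression case: a prompt and the expected answer."""
    prompt: str
    expected: str


def pass_rate(model: Model, cases: list[EvalCase]) -> float:
    """Fraction of cases where the model's answer matches exactly
    (case-insensitive, surrounding whitespace ignored)."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(
        1
        for case in cases
        if model(case.prompt).strip().lower() == case.expected.strip().lower()
    )
    return passed / len(cases)


def should_migrate(current: Model, candidate: Model,
                   cases: list[EvalCase], max_regression: float = 0.0) -> bool:
    """Approve the upgrade only if the candidate's pass rate does not fall
    more than `max_regression` below the current model's pass rate."""
    return pass_rate(candidate, cases) >= pass_rate(current, cases) - max_regression


def make_stub(answers: dict[str, str]) -> Model:
    """Hypothetical stand-in for a real model client: a fixed lookup table."""
    return lambda prompt: answers.get(prompt, "")
```

In practice the cases would come from customer edge cases collected in production, and exact-match scoring would usually give way to a task-specific grader.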
Cost Optimization
Throwing every query at Opus 4.7 is the fastest way to turn gross margins negative. The best Oz companies route between different tiers of models: hardest tasks to frontier models, most tasks to mid-tier models, and use smaller custom or fine-tuned models where proven feasible.
Some of these companies are now doing their own post-training on top of this, optimizing models for the specific sliver of work the customer truly cares about, and serving it at a fraction of the cost of a frontier API call. The labs charge for a "floor price": the minimum intelligence you can buy for $X. Oz companies sell the opposite: the minimum dollar cost for the level of intelligence actually needed for that specific workflow. This is only possible if you know exactly what level of intelligence each sub-task requires. The labs are structurally incapable of knowing every single task in every single vertical industry. Ultimately, this translates directly into lower, more predictable outcome-based pricing.
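The tiered routing described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the tier names, capability levels, per-call costs, and sub-task names are all invented for the example, and a real router would learn the required capability per sub-task from evaluations rather than from a hand-written table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int       # hypothetical scale: 1 = small, 3 = frontier
    cost_per_call: float  # illustrative numbers, not real prices


# Hypothetical model catalog, which could span several vendors.
TIERS = [
    ModelTier("small-finetuned", 1, 0.001),
    ModelTier("mid-tier", 2, 0.01),
    ModelTier("frontier", 3, 0.10),
]

# Hypothetical table: the minimum capability each sub-task has been
# shown to need (for example, via an evaluation suite).
REQUIRED_CAPABILITY = {
    "classify_intent": 1,
    "draft_email": 2,
    "resolve_conflicting_clauses": 3,
}


def route(subtask: str) -> ModelTier:
    """Pick the cheapest tier that meets the sub-task's required capability.

    Unknown sub-tasks fall back to the highest requirement, so nothing
    unproven is sent to a cheaper model by accident.
    """
    needed = REQUIRED_CAPABILITY.get(subtask, 3)
    eligible = [tier for tier in TIERS if tier.capability >= needed]
    return min(eligible, key=lambda tier: tier.cost_per_call)
```

The design choice worth noting is the fallback: routing defaults to the most capable tier and only moves a sub-task down once a cheaper model has been proven on it.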
Governance
Being the control plane for a customer running AI in a specific vertical creates significant value. This control plane is where permissions, auditing, what an agent is allowed to do, and what it actually did all converge.
This control plane is built on use-case-specific guardrails, which vary dramatically by industry and job function. Because these companies own the tools, workflows, and data the agent touches end-to-end, they can provide deterministic results in ways horizontal tools cannot. They also absorb regulatory complexity for the end buyer: Federal Rules of Civil Procedure and state bar rules in legal, HIPAA in healthcare, SEC and FINRA rules in finance, state-level insurance regulation, and so on. A horizontal player cannot credibly do this without turning itself into a hundred different verticals. A CIO needs a partner who can promise in a contract that it will handle compliance processing for the agents it provides.
All of this comes back to the same thing: focus.
This focus can be a vertical industry, like insurance, law, or accounting; or a function done deeply enough, like sales, customer service, or finance. Either way, this work requires a team to live in the same customer base for a long time, understanding its workflows, edge cases, and regulatory requirements. The labs weren't built for this. They have to serve everyone, everywhere, which is why they built the Yellow Brick Road in the first place. The same trade-offs that make them build that road also make it hard for them to enter the rest of Oz: you can be everywhere, or you can be the best at one thing, but you cannot be both.
Case Study: Sales – Practical Advice from a Technical CEO at 11x
How do you understand this in practice? Here is some practical advice from Prabhav Jain, CEO of 11x.
Focus on Outcomes
A viable tactical path to building a company resilient to the labs' impact is to start with the specific outcome the customer truly cares about. For us, that outcome is helping businesses generate more sales leads and pipeline.
From there, the problem becomes very concrete: which activities do we want to own end-to-end that demonstrably drive sales pipeline growth? Break each activity into tasks. Which tasks are suitable for agents, which are not? Which require complex domain insight, which don't? The labs will also release workflows, but when a workflow has many steps, messy inputs, hard-to-interpret states, or real-world constraints, a better model alone isn't enough to get the job done. At that point, the work returns to traditional software engineering, where the labs have no advantage over a focused application company.
For example, some tasks we handle include: lead generation based on custom signals, lead enrichment, deep account research, grabbing context from CRM, writing messages for different channels, a lead qualification agent, and an email deliverability system. Some are agentic tasks, some are not. These tasks aren't done with a single prompt; they require deep engineering.
The key insight from this Oz analogy is that in any real workflow, roughly half the tasks are non-agentic, and for this half, the labs have no advantage. Underneath the model layer, they are no better at writing deterministic software than you are. And the other half, the agentic tasks, still require you to tune, train, and constrain the model around the desired outcome.
Domain knowledge is often not in general training data. This capability must be built bottom-up from a vertical industry or specific function, and fed to the model at the right moment in the workflow. When our agent qualifies an inbound lead over the phone, it must be trained to understand what constitutes a good sales conversation for that specific industry and user profile. This is the application company's work, and this capability compounds.
More importantly, these capabilities constantly become outdated because businesses themselves evolve. Therefore, your ability to continuously evolve workflows and context becomes a competitive advantage. For instance, when we started building a scalable email outreach product, "AI-written emails" were just starting to appear. Fast forward to today, people have a keen sense for detecting which emails are AI-written versus human-written, and crucially, this judgment shifts every few months. Our agent must constantly adapt to market dynamics, but the moat is built precisely here. In fact, despite this dynamic change, our positive reply rate has increased 4x over the past few months, generating hundreds of millions of dollars in sales pipeline for our customers.
Do High-Complexity Problems
Complex problems are where real business value is unlocked. Otherwise, you'll quickly find yourself building a thin wrapper.
Decomposing any sufficiently complex business problem quickly reveals messiness. Here's a simple-sounding example from the GTM space: if a company is already your customer, you shouldn't send cold outreach to anyone at that company. But this is not simple at all.
Maybe your CRM has the company's domain. What about companies with dozens of subsidiaries? What if the CRM records the parent company's domain? What if an outdated matching field in Salesforce causes you to send a cold outreach email to the Chief Revenue Officer of an existing customer? Real-world data is messy. Humans struggle with it, and models don't magically bypass this barrier. Building order from this mess requires designing specialized agents around the specific shape of the problem, not just pointing a general co-pilot at the CRM. In fact, we found that the quality and freshness of our own data are higher than the customer's, so we anchor on our own data by default.
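To make the subsidiary problem concrete, here is a minimal Python sketch of an "existing customer" suppression check that resolves domains through a parent-company map before comparing them. It is an assumption-laden illustration rather than a description of any real system: the `parent_of` map, the function names, and the policy of suppressing the entire corporate family are all hypothetical, and the map's keys are assumed to be lowercase.

```python
def root_domain(domain: str, parent_of: dict[str, str]) -> str:
    """Follow subsidiary -> parent links until reaching the top of the
    corporate family. The `seen` set guards against cycles in dirty data."""
    seen: set[str] = set()
    current = domain.strip().lower()
    while current in parent_of and current not in seen:
        seen.add(current)
        current = parent_of[current].strip().lower()
    return current


def is_suppressed(email: str, customer_domains: set[str],
                  parent_of: dict[str, str]) -> bool:
    """True if the contact belongs to the same corporate family as any
    existing customer and should therefore not receive cold outreach."""
    contact_domain = email.rsplit("@", 1)[-1]
    customer_roots = {root_domain(d, parent_of) for d in customer_domains}
    return root_domain(contact_domain, parent_of) in customer_roots
```

For example, with `parent_of = {"sub.example": "parent.example"}` and `customer_domains = {"parent.example"}`, a contact at `sub.example` is suppressed even though the CRM only records the parent's domain. Building and refreshing a trustworthy `parent_of` map is the hard part, which is the article's point about data quality.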
Guardrails Aren't Just to Prevent Bad Things. Customers Pay for This.
Guardrails are significantly underestimated. Even within the same product, each use case requires its own guardrails. For us, a regulated financial services prospect demands entirely different assurances than a mid-market SaaS customer. And these assurances cascade down to how the agent writes, who it can contact, what data it can touch, what it can say on a phone call, and how each decision is logged.
A "one-size-fits-all" system breaks down under this variance. Guardrails must be built by use case, configured per customer, and continuously audited. This work falls entirely on the application company. This is why we need forward-deployed engineers and technical deployment strategists to fine-tune against each customer's requirements.
For example, we worked with a Fortune 1000 institution to perform consented outbound calls to its massive SMB customer base via voice. Initial attempts had low pickup rates. We had to iterate rapidly, learning how to engage this specific audience type within the first 10 seconds of a call. SMB business owners behave differently than large B2B buyers or consumers. Today, we generate more sales opportunities for them in a single day than their entire sales team could create in that segment in a month.
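The idea of guardrails that are configured per customer and continuously audited could be sketched in Python as below. This is only an illustration under stated assumptions: the `GuardrailPolicy` fields, the two example policies, and the blocked phrase are invented for the example, and real guardrails would cover far more than channels and phrases.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GuardrailPolicy:
    """Hypothetical per-customer policy: where the agent may reach out
    and which phrases it must never use."""
    allowed_channels: frozenset[str]
    blocked_phrases: tuple[str, ...] = ()


@dataclass
class AuditedGate:
    """Checks each proposed action against one customer's policy and
    records every decision, allowed or not, in an audit log."""
    policy: GuardrailPolicy
    log: list[dict] = field(default_factory=list)

    def check(self, channel: str, message: str) -> bool:
        reasons = []
        if channel not in self.policy.allowed_channels:
            reasons.append(f"channel '{channel}' not allowed")
        lowered = message.lower()
        for phrase in self.policy.blocked_phrases:
            if phrase.lower() in lowered:
                reasons.append(f"blocked phrase '{phrase}'")
        allowed = not reasons
        self.log.append({"channel": channel, "allowed": allowed, "reasons": reasons})
        return allowed


# Illustrative policies only: a regulated prospect versus a mid-market SaaS customer.
FINSERV_POLICY = GuardrailPolicy(frozenset({"email"}), ("guaranteed returns",))
SAAS_POLICY = GuardrailPolicy(frozenset({"email", "phone"}))
```

The same agent action, such as a phone call, passes under `SAAS_POLICY` and is blocked under `FINSERV_POLICY`, and both outcomes are logged for later audit.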
Case Study: Insurance – Practical Advice from the CEO of FurtherAI
Sales is just one example. Insurance is another, illustrating the same principle from a different angle. Here is Aman Gour, CEO of FurtherAI, on "building off the Yellow Brick Road."
When we started deploying AI into real insurance operations, we kept hearing a common hypothesis: the model is the intelligence, and the workflow is just scaffolding built around it.
But the more insurance companies we worked with, the more convinced we became that the opposite is true.
In insurance, a lot of the intelligence *lives within the workflow itself.* Two insurers can run a submission through what looks like the same path: submission, review, quote, underwrite. The path is the easy part. What truly distinguishes one insurer from another is everything inside that path: which risks get escalated, which loss signals matter, which rule takes precedence when two underwriting appetite rules conflict, when a human sign-off is mandatory, what external data needs to be pulled, and how the final decision is logged.
This logic doesn't live in a


