Is OpenAI eating the application layer? a16z says the real opportunity lies beyond general-purpose models.
- Core Thesis: The real opportunity for AI application startups is not to compete with large model companies on horizontal, general-purpose tools (the "Yellow Brick Road"), but to embed deeply in industry workflows and build vertical solutions that depend on complex processes, data accumulation, governance, compliance, and system integration.
- Key Elements:
- Large model labs (like OpenAI, Anthropic) are capturing the "Yellow Brick Road" through model capabilities plus horizontal tools (e.g., code generation), but they cannot penetrate the multi-step, multi-party vertical scenarios that require deep domain specificity.
- A startup's moat comes from the data and learning flywheel accumulated around a specific industry, including unwritten industry norms and tribal knowledge that are not present in public training datasets.
- Application-layer companies can optimize costs and manage complexity by routing across multiple vendors, selecting the best model for specific subtasks, and absorbing the migration costs of model upgrades.
- Providing a "control plane" for governance, compliance, and auditing based on specific use cases, such as handling HIPAA or FINRA rules, is a core value that horizontal models find difficult to replicate.
- Vertical companies offer "system-type" products that replace human labor and lock in workflows. Customers pay based on business outcomes (e.g., sales leads) rather than benchmarks, resulting in high customer lifetime value.
Original title: Avoiding Death on the Yellow Brick Road
Original author: Joe Schmidt IV, a16z
Translation and compilation: Peggy
Editor's note: With the continuous improvement of large model capabilities, a common anxiety is emerging in the AI application layer: If model companies like OpenAI and Anthropic not only control the underlying models but also possess distribution channels and brand advantages, what can startups still do in the application layer?
This is precisely the question a16z partner Joe Schmidt attempts to answer in this article. Borrowing the metaphor of the "Yellow Brick Road" from "The Wizard of Oz," he divides AI application opportunities into two categories: one is the main road that large model companies are currently traversing, such as code generation, writing, image generation, general-purpose agents, and horizontal office assistants. The other is "the rest of Oz," which encompasses vertical scenarios deeply embedded in industry processes, relying on complex workflows, accumulated data, compliance governance, and system integration capabilities.
In his view, the real opportunity for startups lies in the latter.
From sales to insurance, Joe Schmidt repeatedly emphasizes the same logic: What enterprises are truly willing to pay for is not a smarter chat window, but a system that can take responsibility for business outcomes. It needs to understand the messy state of customer data, handle multi-person approvals and edge cases, assume compliance and audit responsibilities, and handle migration, routing, and cost optimization on the client's behalf as models keep upgrading.
This is also the article's core judgment about the next generation of enterprise software: Underlying models will become increasingly powerful, but also increasingly replaceable; what is truly irreplaceable is the data, processes, governance capabilities, and operational memory accumulated around specific industries and specific workflows. The opportunity for AI application companies lies not in competing with model companies for the "Yellow Brick Road," but in going where things are more complex, messier, and slower, yet closer to real business value.
The following is the original text:
Lately, I keep hearing the same question from founders and potential employees: Is there anything left to do in the AI application layer? Or will OpenAI and Anthropic eventually kill everything?
There’s a kind of quintessential AI anxiety behind this question. Some have concluded that if you don't want to be permanently relegated to the bottom tier, the only positions with long-term value are inside a large model lab, or in robotics, hard tech, or similar frontier fields: in theory, work the labs can't touch. Because if every category of software will be eaten, either by Codex or Claude directly absorbing the corresponding work, or by some future model making it unnecessary, the best choice seems to be: run!
I admit, I’m almost an AI maximalist myself, and I think they're half right. Large model labs are indeed entering large swaths of the application layer. But the "application layer" is not a homogeneous set of opportunities. The truly important criterion is: Are you walking the "Yellow Brick Road," or are you in the rest of Oz?
Note: The "Yellow Brick Road" is the main road in "The Wizard of Oz" leading to the heart of the Emerald City in Oz, to see the "Wizard."
The "Yellow Brick Road" is how we describe the path the large model labs are taking and devoting enormous resources to. Problems like code generation, writing, and image creation are naturally suited for the labs because they get better with improvements in the raw capabilities of the models: every dollar invested in pre-training and post-training directly improves product quality.
But the rest of Oz contains more complex, often more vertical problems. They aren't simply about giving an enterprise user a horizontal tool that connects to standard tools and computer operation capabilities. The value here comes more from the scaffolding around the model: the scaffolding that makes the output credible, compliant, and actually usable inside a specific industry's business processes. The raw capability of the underlying model is still important, but it's no longer everything.
We are seeing this happen in real time. OpenAI and Anthropic are effectively admitting to the market that they cannot solve all problems with a single, general-purpose AI colleague. They have announced massive investments in forward-deployed-style joint ventures, building entire companies around configuring and customizing models for enterprises. If they truly believed the next model release would solve these problems, they wouldn't be pouring billions into such projects.
So, if you want to make money building AI applications, don't walk the Yellow Brick Road. Go build in the rest of Oz. Here's what we, and some founders in our portfolio, have learned in practice.
The Yellow Brick Road
If you're starting a company, the Yellow Brick Road is the most obvious path, but also the most dangerous one. Take a high-performance model, connect it to some ready-made connectors like Google Drive, Slack, Salesforce, Notion, GitHub, and put an agent orchestration layer on top. It looks like magic.
The problem is that this is exactly what the large model labs are doing with Cowork and Codex. Obviously, they own the models, which means they have better margins, more control, and can exert pricing power over all downstream participants. But perhaps more importantly, they also control the architectural choices that determine what problems the product is suitable for solving. So far, they have very deliberately adopted the "model + tool calling" pattern, which is precisely the pattern needed for those horizontal, low-step-count tasks on the Yellow Brick Road. Even if a startup could somehow surpass Codex or Claude Code, the large model labs still possess massive distribution capabilities and the strongest brand halo in the AI field.
If you are an AI application company using the same playbook—connecting the same connectors, without lower-level sub-agents or configurations, and without a distribution channel—you are likely walking a path to nowhere.
The Rest of Oz
The situation isn't entirely bleak for startups. Beyond the Yellow Brick Road, there are still huge opportunities. Startups can own customers and solve complex problems in these places.
These companies are building agentic experiences: models are woven into complex networks of tools, automation, and integration; in other words, software. This also makes most of these startups inherently vertical. They can focus on multi-step, multi-party workflows, designing sub-agents for different roles and vertical scenarios, handling problems that Anthropic's and OpenAI's horizontal platforms struggle to reach: gathering context across systems, then routing tasks to the multiple people who need to approve them at different stages.
This type of work usually involves one or more legacy systems, often requires deterministic results because ambiguity is unacceptable, and sometimes is directly tied to a significant business outcome. The large model labs know how valuable these problems are: that's why they are building their own outsourced configuration teams, and why a whole ecosystem of reinforcement learning services companies focused on large customers is emerging.
Why the Rest of Oz Won't Be Completely Overrun by the "Wizard"
A counterargument to the above is that so far, betting that the models or labs won't continue to improve has been a terrible trade. They are likely to keep getting stronger and eventually eat the markets these application-layer companies serve.
The large model labs will certainly continue to improve. But I think companies in the rest of Oz still have a few defensive moats in the long run.
Data and Learning Flywheels
Much of what you truly internalize in a business doesn't exist in any training set: unwritten industry customs, undocumented standards, tribal knowledge that lives in practitioners' heads. None of it is on the public internet. No amount of training compute can replace actually being inside the workflows where this knowledge resides.
Two flywheels compound here: one is the cross-customer flywheel, where seeing more variants of the same problem type continuously builds patterns; the other is the intra-customer flywheel, where the reasoning behind a specific decision, the unspoken exceptions, and the company's own rules of thumb only emerge when users interact with the system for real.
Even if customer data can't be used across customers, an application company can still use what it has learned about the types of problems different customers face to guide the architecture for future problems. If a company's agent has already processed a hundred contract redlines, a thousand insurance underwriting cycles, or ten thousand SDR (sales development) activities, its understanding of the shape of the problem is something a later entrant running a new agent for the first time cannot replicate.
Theoretically, a horizontal agent could build the same learning infrastructure. But the reason it doesn't, besides a lack of focus, is user experience. Capturing this knowledge depends entirely on the workflow interface you provide to the user. Vertical players can design these interfaces around the information that truly needs to be surfaced for a specific workflow; horizontal tools cannot. Evaluation sets, annotated outputs, and taxonomies of edge cases can all compound into a vertical data flywheel and further support fine-tuning. A later entrant without comparable production exposure will find it hard to get this flywheel going. How far the flywheel can go depends on data rights, accumulated production usage, and customer contract structures, but the pattern recognition itself will continue to accumulate.
Managing Model Volatility and Complexity
Inside large model labs, routing already happens: calling different categories of models for different requests, using model ensembles underneath. But what they can't do is route across vendors, evaluate a competitor's model for a specific sub-task, or use a truly optimal open-source fine-tuned model for a narrow step.
Companies in the rest of Oz will choose the best model for each sub-task from the entire model marketplace, not just the models released by a single lab. They also absorb the work nobody wants to do: re-running evaluations with each new model release, recalibrating prompts for customer edge cases, and handling upgrades without breaking production environments. Large model labs won't do this for customers. They sell you the new model and tell you to migrate. Companies in the rest of Oz absorb the migration cost. The customer gets the best intelligence available in the entire market, plus continuity through each upgrade cycle.
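To make the "re-run evaluations with each new model release" step concrete, here is a minimal sketch of a promotion gate. It is an illustration only, not how any company named in this article does it: the function names, the exact-match scoring, and the stand-in "models" (plain Python functions) are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, Sequence, Tuple

EvalCase = Tuple[str, str]  # (input prompt, expected output)
Model = Callable[[str], str]


def pass_rate(model: Model, cases: Sequence[EvalCase]) -> float:
    """Fraction of eval cases where the model output exactly matches the expectation."""
    if not cases:
        raise ValueError("eval set must not be empty")
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)


def should_promote(
    current: Model,
    candidate: Model,
    cases: Sequence[EvalCase],
    customer_edge_cases: Sequence[EvalCase],
) -> bool:
    """Promote a new model only if it does not regress overall
    and still passes every known customer edge case."""
    no_regression = pass_rate(candidate, cases) >= pass_rate(current, cases)
    edge_cases_ok = all(candidate(p) == e for p, e in customer_edge_cases)
    return no_regression and edge_cases_ok


# Hypothetical stand-ins for a production model and a newly released one.
def current_model(prompt: str) -> str:
    return prompt.upper()


def candidate_model(prompt: str) -> str:
    # Same behavior, except it breaks on one customer-specific input.
    return "???" if prompt == "edge" else prompt.upper()


general_cases = [("a", "A"), ("b", "B")]
edge_cases = [("edge", "EDGE")]
promote = should_promote(current_model, candidate_model, general_cases, edge_cases)
```

In this toy setup the candidate ties on the general set but fails the customer edge case, so `promote` is `False` and production stays on the current model. Real evaluations would use task-specific scoring rather than exact string match.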
Cost Optimization
Throwing every query at Opus 4.7 is the fastest way to turn gross margin negative. The best Oz companies route between tiers of models: frontier models for the hardest tasks, mid-tier models for most tasks, and smaller, custom, or fine-tuned models where proven viable.
Some of these companies are now doing their own post-training on top of this, optimizing models for that narrow sliver of work the customer truly cares about, and serving it at a fraction of the cost of a frontier API call. Large model labs price for a "floor": the minimum intelligence level you can buy for X dollars. Oz companies sell the opposite: achieving the lowest dollar cost for the specific level of intelligence actually needed for a given workflow. This is only possible when you know exactly what level of intelligence each sub-task requires. And large model labs are structurally incapable of knowing every task in every vertical industry. Ultimately, this translates directly into lower, more predictable outcome-based pricing.
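The tiered routing idea can be sketched in a few lines. Everything below is hypothetical: the tier names, the per-call prices, and the subtask-to-capability table are made-up values for illustration, and a real system would derive the required capability from evaluation results rather than a hand-written dictionary.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical label, not a real model ID
    capability: int           # 1 = small/fine-tuned, 2 = mid-tier, 3 = frontier
    cost_per_call_usd: float  # illustrative price, not a real quote


TIERS = [
    ModelTier("small-finetuned", 1, 0.002),
    ModelTier("mid-tier", 2, 0.02),
    ModelTier("frontier", 3, 0.20),
]

# Assumed minimum capability per subtask (in practice, learned from evals).
REQUIRED_CAPABILITY = {
    "classify_reply": 1,
    "draft_email": 2,
    "deep_account_research": 3,
}


def route(subtask: str) -> ModelTier:
    """Pick the cheapest tier that meets the subtask's required capability."""
    needed = REQUIRED_CAPABILITY.get(subtask, 3)  # unknown subtasks go to frontier
    eligible = [t for t in TIERS if t.capability >= needed]
    return min(eligible, key=lambda t: t.cost_per_call_usd)


def workload_cost(subtasks: Iterable[str]) -> float:
    """Total cost of a workload when each subtask is routed individually."""
    return sum(route(s).cost_per_call_usd for s in subtasks)


workload = ["classify_reply", "draft_email", "deep_account_research"]
routed_cost = workload_cost(workload)
all_frontier_cost = len(workload) * TIERS[-1].cost_per_call_usd
```

With these made-up prices, routing the three subtasks individually costs a little over a third of sending all of them to the frontier tier. The point of the sketch is the shape of the decision, which only works if you know what each subtask actually needs.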
Governance
Becoming the control plane for a customer's AI operations in a specific vertical generates considerable value. This control plane is where permissions, audits, what the agent is allowed to do, and what it actually did come together.
This control plane is built on guardrails specific to the use case, and these guardrails are entirely different across industries and job types. Because these companies own the tools, workflows, and data the agent touches end-to-end, they can provide deterministic results in ways that horizontal tools can't. They also absorb regulatory complexity for the end buyer: FRCP and state bar rules in law, HIPAA in healthcare, SEC and FINRA rules in finance, state-level insurance regulation, and so on. A horizontal player cannot convincingly do this without transforming itself into a hundred different verticals. What CIOs need is a partner that will commit explicitly, in a contract, to taking on compliance for the agents it provides.
All of this ultimately comes back to the same thing: focus.
This focus can be a vertical industry, like insurance, law, or accounting; or it can be a function done deeply enough, like sales, customer service, or finance. Either way, the work requires a team to stay embedded with one type of customer for a long time, understanding its workflows, edge cases, and regulatory requirements. Large model labs are not built for this. They have to serve everyone and be everywhere, which is why they built the Yellow Brick Road in the first place. The same trade-off that makes them build that road makes it hard for them to enter the rest of Oz: You can be everywhere, or you can be exceptional at one thing, but you can't be both.
Using Sales as an Example: Practical Advice from the CEO of 11x
What does this look like in practice? Here are some practical insights from Prabhav Jain, CEO of 11x.
Focus on Outcomes
A viable tactical path to building a company resilient to large model labs is to start with the specific outcome the customer truly cares about. For us, that outcome is helping businesses generate more sales leads and sales pipeline.
From there, the problem becomes very concrete: Which activities do we want to own end-to-end that demonstrably drive sales pipeline growth? Break down each activity into tasks. Which ones are suitable for agents, and which aren't? Which ones require complex domain insight, and which don't? Large model labs will also release workflows, but when a workflow has many steps, messy inputs, hard-to-explain state, or real-world constraints, having a better model alone doesn't get the job done. At that point, it comes back to traditional software engineering, where a large model lab has no advantage over a focused application company.
For example, some of the tasks we handle include: lead generation based on custom signals, lead enrichment, deep account research, pulling context from CRM, writing messages personalized for different channels, lead qualification agents, and email deliverability systems. Some of these are agentic tasks, some are not. These tasks cannot be completed with a single prompt; they require deep engineering capability.
The key insight from the Oz analogy is that roughly half of any real workflow consists of non-agentic tasks, and on that half the labs have no advantage. Beneath the model layer, they are no better at writing deterministic software than you are. The other half, the agentic tasks, still requires you to tune, train, and constrain the model around the specific outcome you want.
Domain knowledge is often not in the general training data. These capabilities must be built bottom-up from a vertical industry or specific function and fed to the model at the right moments in the workflow. When our agent judges whether an inbound lead is qualified over the phone, it must be trained to understand what constitutes a good sales conversation for a specific industry and persona. This is work for the application company, and this capability compounds.
More importantly, these capabilities constantly become obsolete because businesses themselves evolve. Therefore, your ability to continuously evolve workflows and context becomes a competitive advantage in itself. For example, when we started building a scaled email outreach product, "AI-written emails" were just emerging. Fast forward to today, people have developed a keen sense for distinguishing AI-written emails from human-written ones, and the key is that this judgment changes every few months. Our agent must continually adapt to market dynamics, but this is precisely where the moat is built. In fact, despite this dynamism, our positive reply rate has increased 4x over the past few months, generating hundreds of millions of dollars in sales pipeline for our customers.
Tackle High-Complexity Problems
Complex problems are where real business value is unlocked. Otherwise, you easily find yourself just building a thin wrapper layer.
Decomposing any sufficiently complex business problem quickly reveals messiness. Here's a seemingly simple example from the GTM space: If a company is already your customer, you shouldn't send cold outreach to a contact at that company. But this is not simple at all.
Maybe your CRM has the domain for that company. What about companies with dozens of subsidiaries? What if the CRM records the parent company's domain? What if an outdated matching field in Salesforce causes you to send a cold sales outreach email to the Chief Revenue Officer of an existing customer? Real-world data is messy. Humans struggle with it, and models don't magically bypass this hurdle. Building order from this messiness requires designing specialized agents around the specific shape of the problem, not just pointing a general-purpose copilot at the CRM. In fact, we've found that the quality and freshness of our own data are higher than those of our customers' data, so we anchor on our own data by default.
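As a rough illustration of why the "don't contact existing customers" rule is harder than a single domain comparison, here is a minimal sketch of a suppression check that normalizes the contact's domain and resolves subsidiaries to a parent. The company names, domains, and the subsidiary map are invented for the example; in a real system that mapping is exactly the messy, constantly changing data the passage describes.

```python
def normalize_domain(value: str) -> str:
    """Reduce an email address or URL-like string to a bare lowercase domain."""
    value = value.strip().lower()
    if "@" in value:
        value = value.rsplit("@", 1)[1]
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            value = value[len(prefix):]
    value = value.split("/", 1)[0]
    if value.startswith("www."):
        value = value[4:]
    return value


# Hypothetical data: domains of existing customers as recorded in the CRM,
# plus a separately maintained subsidiary-to-parent mapping.
CUSTOMER_DOMAINS = {"acme.com"}
SUBSIDIARY_TO_PARENT = {
    "acme-logistics.io": "acme.com",
    "acmehealth.co": "acme.com",
}


def is_existing_customer(contact: str) -> bool:
    """True if the contact's domain, or its parent company's domain, is a customer."""
    domain = normalize_domain(contact)
    parent = SUBSIDIARY_TO_PARENT.get(domain, domain)
    return parent in CUSTOMER_DOMAINS


def should_suppress_outreach(contact: str) -> bool:
    """Cold outreach is suppressed for anyone at an existing customer."""
    return is_existing_customer(contact)
```

A naive comparison of `cro@acme-logistics.io` against the CRM's `acme.com` would miss the match; the parent lookup catches it. The sketch still ignores stale fields, acquisitions, and personal email addresses, which is where the specialized work begins.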
Guardrails Aren't Just About Preventing Bad Things. Customers Pay for This.
Guardrails are severely underestimated. Even within the same product, each use case needs its own guardrails. For us, a regulated financial services prospect requires entirely different assurances than a mid-market SaaS customer. And these assurances cascade down to how the agent writes, whom it can contact, what data it can touch, what it can say on the phone, and how every decision is logged.
A "one-size-fits-all" system breaks down in the face of such differences. Guardrails must be built per use case, configured per customer, and continuously audited, and this work falls entirely on the application company. This is why we need forward-deployed engineers and technical deployment strategists to tune for each customer's requirements.
For example, we worked with a Fortune 1000 institution to place consented outbound voice calls to its massive SMB customer base. In the first few attempts, pickup rates were very low. We had to iterate quickly, learning how to engage this specific audience within the first 10 seconds of a call. SMB business owners behave very differently from large B2B buyers or consumers. Today, we generate more sales opportunities for them in a single day than their entire sales team could generate in that segment in a month.
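One way to picture guardrails that are "configured per customer and continuously audited" is a small gate that every agent action passes through, with each decision logged. This is a minimal sketch under stated assumptions: the class names, the three rule types, and the example configurations are hypothetical and far simpler than a production control plane.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Guardrails:
    """Per-customer configuration (hypothetical fields for illustration)."""
    allowed_channels: Set[str]
    blocked_phrases: Tuple[str, ...] = ()  # must be written in lowercase
    require_human_approval: bool = False


@dataclass
class AgentGate:
    """Checks each proposed action against the customer's guardrails and logs the decision."""
    guardrails: Guardrails
    audit_log: List[dict] = field(default_factory=list)

    def check(self, channel: str, message: str) -> str:
        text = message.lower()
        if channel not in self.guardrails.allowed_channels:
            decision = "blocked:channel"
        elif any(phrase in text for phrase in self.guardrails.blocked_phrases):
            decision = "blocked:phrase"
        elif self.guardrails.require_human_approval:
            decision = "needs_approval"
        else:
            decision = "allowed"
        # Every decision is recorded, including the allowed ones.
        self.audit_log.append({"channel": channel, "decision": decision})
        return decision


# Two hypothetical customers with very different requirements.
financial_services = AgentGate(Guardrails(
    allowed_channels={"email"},
    blocked_phrases=("guaranteed returns",),
    require_human_approval=True,
))
mid_market_saas = AgentGate(Guardrails(allowed_channels={"email", "phone"}))
```

The same agent output can be allowed for one customer and blocked or held for approval for another, and the audit log gives the customer a record of what the agent tried to do and what actually happened.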
Using Insurance as an Example: Practical Advice from the CEO of FurtherAI
Sales is just one example. Insurance is another, illustrating the same point from a different angle. Here is Aman Gour, CEO of FurtherAI, on what it means to "build off the Yellow Brick Road."
When we started deploying AI into real insurance operations, we repeatedly heard an assumption: The model is the intelligence, and workflows are just scaffolding built around the model.
But the more insurers we work with, the more convinced we are that the opposite is true.
In insurance, a lot of intelligence resides within the workflow itself. Two insurers can route a submission through an identical-looking path: submission, review, quote,


