How to Conduct Deep Research Using Claude’s Dynamic Workflows

特邀专栏作者

2026-06-09 03:00

This article is about 5098 words, reading the full article takes about 8 minutes

What truly constitutes deep research in the age of AI, and how to build a complementary collaborative relationship between humans and AI.

AI Summary

Expand

Core Thesis: Claude Code’s Dynamic Workflows, by embedding six structured orchestration patterns (such as routing, parallel processing, adversarial verification, etc.), upgrade the AI research process from "intelligent conversation" to an "automated research framework," effectively addressing core deficiencies in traditional AI-driven research—including goal drift, premature termination, and context pollution. However, completely replacing human deep research still requires continuous improvement in areas such as verification mechanisms, cross-disciplinary thinking, and extreme information condensation.
Key Elements:
1. The core of Dynamic Workflows lies in the AI automatically designing a workflow before executing a task. This workflow includes steps such as problem decomposition, credibility assessment, cross-validation filtering, and goal-oriented output, addressing the deficiencies of traditional skills that lack convergence and decision-making orientation.
2. The six modes cover: Routing (precise task allocation), Split & Merge (parallel acceleration), Adversarial Verification (eliminating self-evaluation bias), Generate & Filter (diversity selection for quality), Tournament (competitive ranking), and Loop (adaptive iteration), enabling complex research orchestration.
3. The Adversarial Verification mode structurally eliminates the AI’s tendency to exhibit "confirmation bias" to please users. It uses independent agents to refute conclusions based on verification, but this must rely on reproducible facts rather than opinions to avoid the verifier biasing the workflow.
4. Compared to the author’s self-developed deep-research system, the official workflow adds problem decomposition, information credibility assessment, vote-based cross-validation filtering, and output consistently anchored to the original objective. This significantly reduces redundant conversation rounds (from over a dozen down to 3-4 rounds).
5. AI still has three major limitations: in cutting-edge fields like blockchain technology, it defaults to relying on lagging official documentation rather than on-chain factual data; its capacity for deep cross-disciplinary thinking is insufficient, and mainstream thinking models struggle with entirely novel topics; solution validation requires balancing cost with mechanism trade-offs, which conflicts with general applicability.
6. Extreme information condensation depends on a precise understanding of the audience’s background. AI cannot automatically switch between "humanized, popular expression" and "concise, professional summaries," which remains a domain irreplaceable by human researchers.

Over the past three years, I have become completely reliant on using AI to assist with industry research. To this end, I have built a series of skills and supporting systems to handle the screening, summarizing, connecting, verifying, and archiving of information.

It wasn't until this week, after deeply experiencing the dynamic workflows of Claude Code, that I truly understood the meaning of the phrase, "Don't fight against the tides of your era."

It's time to reconsider: what constitutes deep research in the AI era, and how should I build a complementary collaborative relationship with AI?

1. Starting from the Pitfalls of Research

Conducting technical research is actually a process fraught with pitfalls (for both humans and AI). After all, from the start of research, you are bombarded with a vast amount of information. The more viewpoints you encounter, the more ambiguous the conclusions become. Therefore, it's crucial to constantly return to the original goal.

This has always been where AI falls short. From the perspective of attention and association, AI is more easily trapped by the immediate volume of information than humans, and it is weak when it comes to truly valuable cross-domain associations.

Of course, where AI excels is its execution power. It can act as an agent, layer by layer, to search, summarize, and synthesize information, completely avoiding the loss of details.

Although I haven't published much on my public WeChat account in the past six months, I have been comprehensively monitoring and researching almost all major battlegrounds in the industry. The input and output of this effort are supported by my own deep-research system.

Now, with the launch of the Dynamic Workflows feature in Claude Code last week, I want to challenge it to see if its default capabilities can completely surpass my own system.

2. What is Dynamic Workflows?

The core idea of Dynamic Workflows is: Before executing a task, the AI automatically designs the workflow to be used for that task, and only then starts the execution.

This is fundamentally different from the "plan mode" and "skills" we used before. Plan mode breaks down tasks into smaller pieces, but doesn't necessarily conform to a reasonable workflow. Acceptance criteria might only be added based on your prompt instructions (which is crucial for research). Similarly, it will only better pre-set some harness rules if you have prompts to guide it.

However, Dynamic Workflows automatically integrates verification logic, result convergence, and adversarial validation.

The trigger method is simple: just use `/deep-research` in CC, then provide some research templates and entry materials. If you want to use the Dynamic Workflows capability independently, you can use a prompt or simply say "ultracode". Be aware that token consumption is roughly dozens of times higher than usual.

3. Six Built-in Workflow Modes

Underlying Dynamic Workflows are six core orchestration modes summarized by the official documentation. This is why it is more powerful than regular conversations, agents, or skills.

In fact, behind these six modes, there are only two core questions: How to decompose the task? How to synthesize the results? The six modes are essentially different permutations and combinations of these two questions.

3.1 Routing Mode (Classify-And-Act)

First, an agent identifies the task type, then distributes the task to the most suitable specialized agent to handle it. The core logic is the routing selection logic, not parallelism or iteration. A task takes only one path, and other paths are not executed at all.

For example, I could first have three preset sub-agent roles: a strict data-verification analysis agent, a well-written output agent, and a challenging agent specifically for finding flaws. The routing layer would decide which agent is best suited for a current sub-task, rather than having one agent handle everything.

The value of this mode lies in its precision and efficiency. Each agent's prompt can be highly independent, free from interference by other goals, allowing for deep, vertical exploration. Token consumption is minimal, response speed is fastest, and role boundaries are very clear.

A significant drawback is that it handles tasks with ambiguous boundaries (e.g., "a task that is both a technical problem and an account problem") poorly.

3.2 Fan-out & Merge

This is also my most commonly used mode. The core logic is parallel execution + merging. The task is divided into N independent sub-tasks that run simultaneously, and after all are completed, the results are merged uniformly.

The advantages are speed and isolation. The total time is roughly equal to the slowest sub-task, not the sum of all sub-tasks. Each sub-task has an independent context, doesn't interfere with others, and noise from one sub-task won't pollute others.

The weakness is that the token cost is N times that of serial processing. The merging layer (Synthesize) itself is also challenging—fusing outputs from N paths with inconsistent structures is a design problem. Poor sub-task division can lead to omissions or duplicate coverage.

3.3 Adversarial Verification

The core logic is verification. For the same conclusion, multiple agents are asked to challenge it from a "refuting" perspective. The conclusion only passes if more than half of the votes are in favor.

The advantage is that since the Verifier doesn't know the Worker's thought process and only sees the results, it structurally eliminates the self-evaluation bias that occurs when "asking a model to check its own code."

This mode solves a long-standing problem for me: we often chat with AI informally, and AI tends to answer in line with your expectations, easily leading to "confirmation bias." Adversarial verification forces the AI to look for counterexamples, to verify based on data and experiments, rather than pandering to your ideas.

However, if the verifier makes a wrong judgment, it can mislead the Worker into catering to the Verifier. Therefore, it's best to base the verification on reproducible facts, not opinions.

Jokingly, if you ask the AI to find problems, it can find an endless number, so you need to limit the boundaries of the problems it looks for.

3.4 Generate & Filter

The core logic is diverge then converge. First, deliberately generate an excessive number of candidates, then use a rubric to eliminate down to the essence, outputting only results with high confidence.

Rather than having one agent output a "decent" answer, it's better to generate ten and then use a verification layer to filter. The advantage lies in diversity. Multiple Generators can use different strategies and prompts to produce solutions that humans would find hard to imagine. The filtering step ensures the final output quality is highly concentrated.

The weakness is that the quality of the Filter's rubric directly determines the final effect. A poorly designed rubric means the entire workflow is invalid.

Suitable scenarios include situations where the correct answer is not known in advance, the need to choose the best from multiple possibilities, and a clear requirement for diversity.

This is only superficially similar to Fan-out & Synthesize: both involve "multiple parallel paths → single output," which is why they are most easily confused.

The key difference lies in intent: in Fan-out, each path handles a different part of the task, the results are complementary, and all paths contribute to the merge. In Generate & Filter, each path handles the same task, the results are competitive, and most are discarded during merging. The former is like a "jigsaw puzzle," the latter is like a "beauty contest."

3.5 Tournament Mode

The core logic is competitive elimination. N agents each independently perform the same task. Through pairwise comparison in successive rounds, the best solution is ultimately selected.

I have done this manually before—running two or three versions of the same code change and having AI compare which is better. Now it can be integrated directly into the workflow.

The advantage lies in judgment stability. Pairwise comparisons ("Is A or B better?") are much more stable than absolute scoring ("Rate A from 1 to 10") because it eliminates the problem of rating scale drift. After multiple rounds of competition, the credibility of the final winner is high.

This is also superficially similar to Generate & Filter: both are about selecting the best from multiple candidates. The key difference is the selection mechanism: Tournament uses a pairwise judge for head-to-head comparisons, which is a "let the candidates compete" approach. It is more reliable when the rubric is difficult to quantify and judgment is inherently relative.

3.6 Loop Mode

The core logic is adaptive iteration: continuously try, on encountering resistance, gather error information, supplement the context, and try again until the acceptance criteria are met.

Essentially, it's about combating the randomness of AI: try several times, and you'll eventually stumble upon a better result. A more mature approach, however, combines this with adversarial verification, so each loop carries more information into the execution rather than relying purely on randomness.

The advantage is its ability to handle tasks with unknown workloads. The other five modes assume the task boundaries are fixed. "Loop Until Done" is the only mode capable of handling situations where "you don't know how many rounds are needed."

The weakness is the potential risk of losing control—a poorly designed stop condition can lead to an infinite loop. Each round's agent has a completely new context and cannot accumulate state across rounds (unless explicitly written to a file).

4. Battle: My Personal Skill vs. the Official Workflow

Before Dynamic Workflows came out, I had specifically designed my own deep-research system. The logic of my skill was roughly as follows:

Provide only a simple piece of information (e.g., a new feature launched by a project).
Have the AI search for all related materials: official documentation, source code, market sentiment.
Compress the information into meaningful summaries.
Have multiple agent roles perform adversarial analysis and generate a report.
Automatically deduplicate, as content overlap among multiple agents is high.

I used it for a while and found it quite useful. But it has a fundamental flaw: it lacks goal-oriented convergence.

Moreover, even with the deduplication step, it often deletes valuable information. Without deduplication, the skill would easily produce a document of over ten thousand words—comprehensive but failing to directly tell you "why this matters to you and what you should do."

But research serves "decision-making." This is why many skills stop at just the research itself; they achieve 80% but miss the crucial final 20%.

Consequently, after the AI completes the initial research, I still need to engage in ten more rounds of thinking and conversation to reach a satisfactory and thorough conclusion.

What Does the Official Dynamic Workflow Add?

Through several complex research task experiments this week, I found that Claude Code's built-in deep research workflow (note: not just a skill, but a module compiled and embedded into CC), compared to my personal skill, adds several key steps:

Problem Decomposition Layer: It doesn't start searching immediately. Instead, it first asks questions to break down my question into sub-questions: What exactly do you want to clarify? How does this relate to you? What dimensions are worth exploring in depth? I used to skip this step.
Credibility Assessment: It evaluates the falsifiability of each piece of information, similar to authority scoring in traditional SEO—is the source credible? How often is it cited? This is an aspect I hadn't thought to add before.
Cross-Elimination Instead of Average Merging: My old method averaged all conclusions, resulting in a large document. The Dynamic Workflow conducts a multi-agent vote on each conclusion, deleting those with insufficient votes, rather than simply merging them.
Goal-Oriented Output: The final report is not a pile of information but provides judgments and recommended actions centered around your original goal. The key is its pre-set ability to orchestrate multiple sub-agents. The reason my skill lacked this ultimate goal orientation is due to instruction weight decay after processing massive amounts of information.

What Problems Do These Mechanisms Solve?

They target several typical issues with AI performing long tasks:

Goal Drift: The state is good at the start of the task, but by the middle, it forgets what it's doing, only to find its rhythm again at the end—similar to a human zoning out in class. The longer the task, the more pronounced this is.

Premature Stopping: When encountering difficulties during execution, the AI thinks it's "done" and stops, but it hasn't actually met the acceptance criteria.

Context Pollution: For a single agent doing a complex task, the large amount of preceding prompt content can compress the space for subsequent execution. A better way is to keep the preceding prompt within a few thousand tokens and use multiple agents to share the context load.

Output Bias: The AI tends to answer in line with your expectations, and casual questioning is more likely to trigger this problem.

Dynamic Workflows structurally solve these four problems: automatically adding acceptance criteria prevents premature stopping; parallelization isolates context; adversarial verification cancels out output bias; decomposition and layered constraints force the AI to understand the goal first before acting.

5. Summary

Finally, as a longtime researcher, I am amazed by this new mechanism in CC. The six built-in modes—Routing, Fan-out & Merge, Adversarial Verification, Generate & Filter, Tournament, and Loop—cover the orchestration needs of most complex research tasks.

I no longer need to manually design agent orchestration or perform deduplication and cross-validation myself. These are all built into the workflow itself.

Moreover, it is particularly suitable for thinking about exploratory, open-ended questions where information is lacking. Because of its inherent multi-agent orchestration and task goal decomposition, it has once again improved in generality. Actually, three years ago, AI was already very good at solving very clearly defined small problems under strict constraints. But the true qualitative change in AI lies in this generality. Its competitor is no longer simple code generation; it is truly becoming an Agent, evolving from solving one rigid problem to adapting to any problem.

So, Dynamic Workflows is not about a "smarter single conversation." It's about structuring the research process itself.

What originally required me to initiate over a dozen separate conversations for research is now compressed to 3-4 times. Although the token consumption has increased dozens of times.

So why are 3-4 times still needed? I believe the root cause lies in the differences in these requirements.

First is the stringency of the verification mechanism. My research primarily focuses on new technologies on the blockchain. For many things, official documentation is outdated. More reliable reference points include open-source code, on-chain transactions, and other data. Currently, the AI defaults to official documentation as the primary source, rather than fact-based verification.

Second is deep, completely cross-domain thinking. While this can be partially addressed by pre-setting the workflow (e.g., predefining various dimensional sub-agents to think about the same problem), AI is still

Welcome to Join Odaily Official Community