DWF Report: AI Outperforms Humans in DeFi Yield Optimization, but Autonomous Trading Lags Behind by 5x

Guest Columnist

2026-04-17 11:00


Agent activity will only continue to accelerate, and the infrastructure being laid today will determine how the next phase of on-chain finance operates.

AI Summary


Core Viewpoint: AI Agent automated activity already accounts for nearly 20% of DeFi activity, outperforming humans in rule-based scenarios like yield optimization. However, in trading domains requiring complex decision-making, top human traders still significantly outperform top AI Agents.
Key Elements:
1. Activity Scale: Automated/Agent activity is estimated to account for over 19% of all on-chain activity, with more than 76% of stablecoin transfer volume generated by bots.
2. Yield Optimization Advantage: In scenarios like liquidity provision, agents such as Giza Tech's ARMA application can generate over 9.75% APY for USDC, outperforming ordinary lending protocols.
3. Trading Domain Lag: In human vs. Agent trading competitions, top human performers outperformed top Agents by more than 5x, indicating human dominance in complex trading.
4. Model Performance Variance: Agent trading competitions show that model selection (e.g., Grok 4.20 performed well), risk management (e.g., holding time, leverage control), and prompting strategies are key factors affecting performance.
5. Infrastructure Challenges: Achieving full autonomy (self-funding, execution) still requires breakthroughs, and risks such as strategy crowding, privacy-transparency trade-offs, and Sybil attacks exist.
6. Lack of Evaluation Framework: There is currently a lack of comprehensive Agent evaluation standards. Focus is needed on their performance under different market conditions, data source credibility, and security architecture.
7. Future Development Direction: Standards like ERC-8004 aim to establish on-chain reputation and collaboration, but building trustworthy infrastructure will be key for Agent mass adoption and market share acquisition.

Original Author: DWF Ventures

Compiled by: TechFlow

Introduction: AI Agents already account for nearly one-fifth of DeFi trading volume and indeed outperform humans in rule-defined scenarios like yield optimization. However, when it comes to autonomous trading, top-performing AI agents achieve less than one-fifth the performance of top human traders. This research breaks down the real-world performance of AI across different DeFi scenarios and is a must-read for anyone interested in automated trading.

Key Takeaways

Automation and agent activity currently account for approximately 19% of all on-chain activity, but true end-to-end autonomy has not yet been achieved.

In narrow, well-defined use cases like yield optimization, agents have demonstrated performance superior to both humans and bots. However, for multifaceted actions like trading, human performance remains superior to agents.

Among agents themselves, model selection and risk management have the greatest impact on trading performance.

As agents are adopted at scale, several risks concerning trust and execution exist, including Sybil attacks, strategy crowding, and privacy trade-offs.

Sustained Growth in Agent Activity

Agent activity has grown steadily over the past year, with both trading volume and transaction counts increasing. We've seen significant developments led by Coinbase's x402 protocol, with players like Visa, Stripe, and Google also joining in to launch their own standards. Most of the infrastructure being built today aims to serve two types of scenarios: channels between agents or agent invocations triggered by humans.

While stablecoin transactions are widely supported, the current infrastructure still relies on traditional payment gateways as the underlying layer, meaning it remains dependent on centralized counterparties. Therefore, the "full autonomy" endgame where agents can self-fund, self-execute, and continuously optimize based on changing conditions has not yet been realized.

Agents are not entirely new to DeFi. For years, automation has existed in on-chain protocols through bots, capturing MEV or achieving excess returns not possible without code. These systems perform very well under clearly defined parameters that do not change frequently or require additional supervision. However, markets have grown more complex over time. This is where we see the new generation of agents entering, with the on-chain world becoming an experimental ground for such activity over the past few months.

The Actual Performance of Agents

According to reports, agent activity is growing exponentially, with over 17,000 agents launched since 2025. The total volume of automation/agent activity is estimated to cover over 19% of all on-chain activity. This is not surprising, as it is estimated that over 76% of stablecoin transfer volume is generated by bots. This indicates significant room for growth in agent activity within DeFi.

There is a wide spectrum of agent autonomy, ranging from chatbot-like experiences requiring high human supervision to agents that can formulate strategies adapted to market conditions based on goal inputs. Compared to bots, agents have several key advantages, including the ability to respond to and execute on new information within milliseconds, and the capacity to extend coverage to thousands of markets while maintaining the same level of rigor.

Currently, most agents are still at the analyst-to-copilot level, as the majority remain in testing phases.

Yield Optimization: Agents Excel

Liquidity provision is an area where automation already occurs frequently, with agents holding a total TVL exceeding $39 million. This figure primarily measures assets deposited directly into agents by users but does not include capital routed through vaults.

Giza Tech is one of the largest protocols in this space, launching its first agent application, ARMA, at the end of last year, designed to enhance yield capture for major DeFi protocols. It has attracted over $19 million in assets under management and generated over $4 billion in agent trading volume. The high ratio of trading volume to total AUM indicates that agents frequently rebalance capital, enabling higher yield capture. Once capital is deposited into the contract, execution is automated, providing users with a simple one-click experience requiring almost no supervision.

ARMA's performance is measurably excellent, generating over 9.75% APY for USDC. Even after accounting for additional rebalancing fees and the agent's 10% performance fee, the yield still exceeds that of ordinary lending on Aave or Morpho. Nonetheless, scalability remains a key issue, as these agents have not yet been battle-tested to manage or scale to the size of major DeFi protocols.
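The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: it assumes the quoted 9.75% is the gross yield before fees, treats the "over" figures as lower bounds, and uses a hypothetical `rebalancing_cost` parameter because the report gives no number for rebalancing fees.

```python
# Figures quoted in the report (each is stated as "over", so treat as lower bounds).
AUM_USD = 19_000_000              # assets under management
AGENT_VOLUME_USD = 4_000_000_000  # cumulative agent trading volume
GROSS_APY = 0.0975                # USDC yield, assumed here to be gross of fees
PERFORMANCE_FEE = 0.10            # agent's performance fee on yield


def turnover_ratio(volume_usd: float, aum_usd: float) -> float:
    """Cumulative volume divided by AUM: how many times capital was moved."""
    return volume_usd / aum_usd


def net_apy(gross_apy: float, performance_fee: float,
            rebalancing_cost: float = 0.0) -> float:
    """Yield left after the performance fee and an annualized rebalancing cost.

    rebalancing_cost is a hypothetical input; the report does not quantify it.
    """
    return gross_apy * (1.0 - performance_fee) - rebalancing_cost


if __name__ == "__main__":
    print(f"Turnover: {turnover_ratio(AGENT_VOLUME_USD, AUM_USD):.0f}x of AUM")
    print(f"Net APY before rebalancing costs: {net_apy(GROSS_APY, PERFORMANCE_FEE):.3%}")
```

Under these assumptions, capital has turned over roughly 210 times, and the performance fee alone reduces the yield to about 8.78%; any rebalancing cost comes off that figure.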

Trading: Humans Hold a Significant Lead

However, for more complex actions like trading, the results are much more varied. Current trading models operate based on human-defined inputs and provide outputs according to preset rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, advancing them into a copilot role. As fully autonomous agents join the fray, the trading landscape will change dramatically.

Several trading competitions have been held between agents and between humans and agents, revealing significant variance among models. Trade XYZ hosted a human vs. agent trading competition for stocks listed on its platform. Each account started with $10,000, with no restrictions on leverage or trading frequency. The results were overwhelmingly in favor of humans, with top human performers outperforming top agents by over 5 times.

Meanwhile, Nof1 hosted an agent trading competition among models, pitting several models (Grok-4, GPT-5, Deepseek, Kimi, Qwen3, Claude, Gemini) against each other, testing different risk configurations from capital preservation to maximum leverage. The results revealed several factors that can help explain performance differences:

Holding Time: A strong correlation exists; models holding positions for an average of 2-3 hours significantly outperformed those flipping frequently.

Expected Value: This measures whether a model makes money on average per trade. Interestingly, only the top 3 models had a positive expected value, meaning most models lost money on the average trade, regardless of how often they won.

Leverage: Lower average leverage levels of 6-8x proved to perform better than models running over 10x leverage, as high levels accelerated losses.

Prompt Strategy: Monk Mode was by far the best-performing prompt strategy, while Situational Awareness performed the worst. Judging by the characteristics of each mode, focusing on risk management and relying on fewer external sources leads to better performance.

Base Model: Grok 4.20 significantly outperformed other models by over 22% across different prompt strategies and was the only model profitable on average.
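Two of the factors above, expected value and leverage, are easy to make concrete. The sketch below uses made-up trade results, not competition data, to show that win rate and expected value are different measures, and that higher leverage shrinks the adverse price move an account can absorb. The wipeout calculation is a simplification that ignores fees, funding, and maintenance margin.

```python
from statistics import mean


def win_rate(pnls: list[float]) -> float:
    """Fraction of trades that closed with a profit."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def expected_value(pnls: list[float]) -> float:
    """Average profit or loss per trade."""
    return mean(pnls)


def leveraged_return(price_move: float, leverage: float) -> float:
    """Return on margin for a given price move (fees and funding ignored)."""
    return price_move * leverage


def wipeout_move(leverage: float) -> float:
    """Adverse price move that erases the full margin (simplified)."""
    return 1.0 / leverage


if __name__ == "__main__":
    # Hypothetical trade logs in USD, for illustration only.
    frequent_small_wins = [50, 50, 50, -200]
    rare_large_wins = [-20, -20, -20, 100]

    for name, pnls in [("frequent small wins", frequent_small_wins),
                       ("rare large wins", rare_large_wins)]:
        print(f"{name}: win rate {win_rate(pnls):.0%}, EV {expected_value(pnls):+.2f}")

    for lev in (6, 8, 10):
        print(f"{lev}x: 2% adverse move -> {leveraged_return(-0.02, lev):.0%} on margin, "
              f"wipeout at {wipeout_move(lev):.1%}")
```

In this toy data, the first log wins 75% of its trades yet has a negative expected value, while the second wins only 25% and is still profitable on average.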

Other factors like long/short preference, trade size, and confidence scores either lacked sufficient data or showed no positive correlation with model performance. Overall, the results indicate that agents tend to perform better within clearly defined constraints, meaning human input remains highly necessary for target configuration.

How to Evaluate Agents

Given that agents are still in their early stages, there is currently no comprehensive evaluation framework. Historical performance is often used as the benchmark for evaluating agents, but it is shaped by underlying factors that are themselves stronger indicators of robust agent performance.

Performance Across Different Volatility Regimes: This includes disciplined loss control when conditions deteriorate, which indicates the agent's ability to identify off-chain factors that could affect trade profitability.

Transparency vs. Privacy: Both sides have their trade-offs. Transparent agents essentially have no strategic advantage if their trades can be actively copied. Private agents face the risk of insider extraction by creators who can easily front-run their own users.

Information Sources: The data sources an agent accesses are crucial for determining how it makes decisions. Ensuring sources are trustworthy and without single dependencies is vital.

Security: It is important to have smart contract audits and proper fund custody architecture to ensure fallback measures in black swan events.

The Next Steps for Agents

For agents to be adopted at scale, significant work remains on the infrastructure side. This boils down to key issues surrounding agent trust and execution. Autonomous agents operate without guardrails, and instances of poor fund management have already emerged.

ERC-8004 went live in January 2026, becoming the first on-chain registry enabling autonomous agents to discover each other, establish verifiable reputations, and collaborate securely. This is a key unlock for DeFi composability, as trust scores are embedded within the smart contracts themselves, allowing for permissionless activity between agents and protocols. It does not guarantee that agents will always behave non-maliciously, however, as vulnerabilities like reputation collusion and Sybil attacks can still occur. Significant gaps therefore remain to be filled in areas like insurance, security, and agent economic staking.

As agent activity expands within DeFi, strategy crowding becomes a structural risk. Yield farming is the clearest precedent, where returns compress as strategies become popular. The same dynamic could apply to agent trading. If a large number of agents are trained on similar data and optimize for similar objectives, they will converge on similar positions and similar exit signals.
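The yield farming precedent can be illustrated with a toy model. It assumes a fixed annual reward pool shared pro rata among all deposited capital, so the yield each participant earns falls as more capital crowds into the same strategy. All numbers are hypothetical, and real pools have variable rewards and fees.

```python
def pool_apy(annual_rewards_usd: float, deposited_usd: float) -> float:
    """APY when a fixed reward pool is split pro rata across deposits."""
    return annual_rewards_usd / deposited_usd


if __name__ == "__main__":
    ANNUAL_REWARDS_USD = 1_000_000  # hypothetical fixed reward budget

    # As more agents route capital into the same strategy, the yield compresses.
    for deposited in (10_000_000, 20_000_000, 50_000_000):
        apy = pool_apy(ANNUAL_REWARDS_USD, deposited)
        print(f"${deposited:,} deposited -> {apy:.1%} APY")
```

In this model, doubling the capital chasing the same rewards halves the yield, which is the compression the crowding argument describes.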

The CoinAlg paper published by Cornell University in January 2026 formalized one version of this problem. Transparent agents can be arbitraged because their trades are predictable and can be front-run. Private agents avoid this risk but introduce a different one: creators retain an information advantage over their own users and can extract value through the very opacity meant to protect them.

Agent activity will only continue to accelerate, and the infrastructure laid today will determine how the next phase of on-chain finance operates. As agent usage increases, agents will self-iterate and become more adept at adapting to user preferences. The main differentiator will therefore be trustworthy infrastructure, and those who build it will capture the largest market share.
