How Can AI Empower Smart Contract Security? Practical Sharing from General Models to a Triple-Audit Model

Friends of Planet Jun (星球君的朋友们)

Odaily Senior Author

2026-04-14 16:26


Building a comprehensive security assurance system for Web3 projects.

AI Summary


Core Viewpoint: Currently, AI has clear capability boundaries in smart contract auditing. It excels at scanning for known vulnerability patterns but struggles with deep-seated vulnerabilities that depend on cross-contract interactions and complex business logic. Therefore, Beosin has established a triple complementary audit model: "Skill-enhanced AI baseline checks + manual deep auditing + formal verification."
Key Elements:
1. When auditing standard token contracts, general AI models can identify code specification issues. However, due to a lack of business context, they often misclassify intended designs, such as the owner's minting authority in USDT-like contracts, as high-risk vulnerabilities.
2. When auditing complex DeFi protocols (e.g., IPC Protocol), AI has low coverage and a high false positive rate for deep-seated vulnerabilities that require understanding cross-component state paths (e.g., signature replay, specific state reentrancy).
3. By building a proprietary Skill knowledge base and injecting structured expert audit experience into the AI, Beosin has increased the high-risk vulnerability coverage rate for complex contracts from 11% to 44% in tests, while reducing the false positive rate from 55% to approximately 30%.
4. AI baseline checks can be combined with project whitepapers for consistency verification, effectively reducing false positives caused by unclear design intent and discovering deviations between code implementation and documented promises.
5. Manual auditing is responsible for protocol-level deep understanding and identifying novel attack vectors, while formal verification provides mathematical certainty for critical business logic. The synergy of these three components builds a complete security system.

Original Source: Beosin

In recent years, large language models (LLMs) like GPT-4, Claude, and Gemini have demonstrated strong code comprehension capabilities. They can effectively read smart contract languages such as Solidity, Rust, and Go, and identify classic vulnerabilities with distinct code patterns like reentrancy attacks and integer overflows. This has led the industry to ponder: Can LLMs assist or even replace manual contract auditing?

However, general-purpose models often lack deep understanding of specific project business logic. When auditing complex DeFi protocols, they tend to have high false positive rates and are prone to missing vulnerabilities that require understanding cross-contract interactions or economic models. Subsequently, the industry proposed incorporating a "Skill" mechanism—injecting specialized knowledge bases, detection rules, and business context related to smart contract security into the foundation of general LLMs. This provides the model with clearer judgment criteria during audits, moving beyond reliance on general capabilities alone to determine if code is problematic.

Even with Skill enhancement, AI auditing still has clear limitations. It excels at scanning for known vulnerability patterns and checking code standards, but it currently struggles with complex vulnerabilities that require a deep understanding of overall protocol design, cross-contract interaction logic, or economic models. Such issues still require experienced audit experts. For scenarios involving complex computational logic, formal verification is needed to provide stronger guarantees. Against this backdrop, Beosin has established a triple-audit model: Skill-enhanced AI baseline checks + manual deep auditing + formal verification, each with its own focus and each complementing the others.

1. The Boundaries of General AI Model Auditing Capabilities: Controlled Comparative Testing and Case Analysis

This article selects two types of contracts with significantly different complexity levels from our completed manual audit project library as test cases. The first type is simple contracts with relatively independent logic and clear functional boundaries. These projects typically represent the scenario where AI audit tools have the most abundant training data and, theoretically, the greatest advantage. The second type involves complex contracts with multi-contract interactions, complex state machines, or cross-protocol dependencies. This is precisely the high-risk scenario most frequently cited in industry discussions about "whether AI can replace manual auditing."

For comparison, we used the exact same codebase. First, we ran an independent AI audit, generating a report. Then, we meticulously aligned this report line-by-line with the manual audit report. The two report generation processes were completely independent—the manual auditors had no knowledge of the AI's results to avoid mutual influence. Finally, we analyzed the results from the following four dimensions:

Case A · Standard Token Contract (BSC-USDT / BEP20USDT.sol)

For the first test, we selected a standard BEP-20 token contract written in Solidity 0.5.16. Its logic is relatively independent with clear functional boundaries, involving no cross-contract interactions. The primary security risks are concentrated in common, known vulnerability patterns. This type of contract is theoretically the most advantageous scenario for AI auditing currently—training data contains many such standard token contracts, and rule-based vulnerability features are relatively obvious.

The AI output 6 alerts (2 High, 1 Medium, 3 Low/Suggestion), which is a considerable number from a quantity perspective. The Low and Suggestion items were mostly accurate, covering common code standard issues like outdated Solidity versions and state variable exposure methods, providing some reference value. However, both "High" alerts output by the AI constituted false positives. The AI flagged the owner's minting authority and centralized permissions as high-risk vulnerabilities. In reality, for centralized stablecoins (like USDT), the owner having minting authority is part of the intended design. Risk assessment should consider multi-signature controls, permission governance mechanisms, and contract upgrade strategies comprehensively. The reasonableness of such permission structures fundamentally depends on the project's business model, not the code itself. The AI lacks this contextual layer and can only make judgments based on pattern matching.

This test case shows that while AI can identify permission structures, it cannot judge their reasonableness within the business context. Therefore, it directly labeled the owner's minting authority in a USDT-like contract as a "high-risk vulnerability"—a classic example of a false positive detached from actual business logic. Such false positives can interfere with a project's judgment of real risks.

Case B · Complex Business Contract (IPC Protocol / 2025-02-recall)

The second test selected the IPC Protocol project from a public report on the Code4rena platform (report link: code4rena.com/reports/2025-02-recall). This project includes multiple interdependent core components like Gateway, SubnetActor, and a Diamond proxy pattern. Its security heavily relies on a deep understanding of the overall protocol architecture and cross-component interaction logic, which is a typical scenario for high-value attacks in the DeFi ecosystem. Below are the AI audit results:

For the complex contract, the AI audit produced 3 High and 6 Medium alerts, not lacking in output volume. However, a significant portion was judged as false positives by auditors—the AI made incorrect risk judgments on code snippets lacking context. Meanwhile, out of the 9 High-level vulnerabilities confirmed by auditors, the AI only fully covered 1. It discovered 2 others but significantly underrated their severity (actually High, reported as Medium by AI). The remaining 6 were completely missed. Among the 4 Medium-level vulnerabilities, the AI covered 1, with 3 completely missing.
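The coverage figures quoted for Case B follow directly from these counts. As a minimal sketch (the variable and function names are illustrative, and only the counts come from the comparison above), the arithmetic can be reproduced like this:

```python
# Counts as reported in the Case B comparison; names are illustrative only.
confirmed_high = 9        # High-level vulnerabilities confirmed by auditors
high_fully_covered = 1    # found by the AI with the correct severity
high_underrated = 2       # found by the AI but reported as Medium
high_missed = 6           # absent from the AI report

confirmed_medium = 4
medium_covered = 1

# Sanity check: the three High buckets must add up to the confirmed total.
assert high_fully_covered + high_underrated + high_missed == confirmed_high


def rate(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


high_coverage = rate(high_fully_covered, confirmed_high)
high_seen_at_any_severity = rate(high_fully_covered + high_underrated, confirmed_high)
medium_coverage = rate(medium_covered, confirmed_medium)

print(high_coverage, high_seen_at_any_severity, medium_coverage)
```

Counting only correctly rated findings gives the roughly 11% High-level coverage cited in this article. Counting the two underrated findings as well would raise the figure to about a third, which is why the severity criterion matters when comparing tools.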

The common characteristic of these vulnerabilities is: they all depend on complete reasoning about cross-component state transition paths within the protocol, not pattern matching on a single function. Taking H-01 (Signature Replay) from the manual audit report as an example, the exploit path requires understanding the design intent of multi-signature verification, how an attacker constructs duplicate signature sets, and how this action bypasses the weight threshold. H-06 (leave() function reentrancy attack) is similar: the vulnerability only exists during the subnet bootstrap critical state, requiring understanding the cross-dependency between staking flow, bootstrap trigger conditions, and external call timing. Similar deep logical vulnerabilities had no record in the AI's alert list.
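To make the H-01 pattern concrete, the following is a simplified Python model of a weighted multi-signature check, not the IPC Protocol's actual code. The validator names, weights, threshold, and the `is_valid_signature` stand-in are all assumptions made for illustration. It shows how a duplicated signature set can pass a weight threshold when signers are not de-duplicated:

```python
from typing import Dict, List, Tuple

# Hypothetical validator weights and quorum threshold (not from the IPC codebase).
WEIGHTS: Dict[str, int] = {"alice": 40, "bob": 35, "carol": 25}
THRESHOLD = 67


def is_valid_signature(signer: str, signature: str) -> bool:
    """Stand-in for real cryptographic verification."""
    return signature == f"sig-by-{signer}"


def total_weight_naive(sigs: List[Tuple[str, str]]) -> int:
    """Flawed: adds weight for every valid signature, including repeats."""
    return sum(
        WEIGHTS[signer]
        for signer, sig in sigs
        if signer in WEIGHTS and is_valid_signature(signer, sig)
    )


def total_weight_dedup(sigs: List[Tuple[str, str]]) -> int:
    """Counts each signer at most once."""
    seen = set()
    total = 0
    for signer, sig in sigs:
        if signer in seen or signer not in WEIGHTS:
            continue
        if is_valid_signature(signer, sig):
            seen.add(signer)
            total += WEIGHTS[signer]
    return total


# The attacker replays one validator's signature twice.
replayed = [("alice", "sig-by-alice")] * 2

naive_passes = total_weight_naive(replayed) >= THRESHOLD
dedup_passes = total_weight_dedup(replayed) >= THRESHOLD
print(naive_passes, dedup_passes)
```

Each function looks reasonable in isolation, which is the point: the flaw only becomes visible once the reviewer knows that the weight threshold is meant to represent distinct validators.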

This result shows that in complex contract auditing: AI's auditing capability lies in pattern recognition of local code, while protocol-level vulnerabilities may stem from misunderstandings of overall business logic. When a vulnerability's trigger conditions span multiple contracts, states, and call layers, AI's current reasoning ability cannot effectively cover them.

Considering both cases, AI auditing is not without value: it contributes meaningfully by covering known vulnerability patterns, checking code standards, and surfacing some findings from an independent perspective. However, its value boundary is very clear. It can serve as a baseline scan, but it cannot be used directly as a security conclusion. For complex protocols, relying solely on an AI report for security judgment will not only miss higher-risk vulnerabilities but also consume significant team screening time because of the many low-quality alerts. This is precisely the core reason Beosin established a proprietary Skill knowledge base and introduced the triple-audit model into the audit process.

2. Proprietary Skill Knowledge Base: An Engineering Path to Enhance AI Baseline Checks

To integrate AI auditing into the baseline check process, we must address its high rates of false positives and false negatives when auditing real DeFi protocols. Whether it's permission management, AMM liquidity mechanisms, cross-chain bridge message verification, or lending protocol liquidation logic, AI currently can only perform simple matching based on surface-level code features. It struggles to judge whether a piece of code is problematic by combining specific business scenarios and attack/defense logic. The core solution to this problem is to inject the experience accumulated by audit experts over years into the AI's judgment process in a structured way, giving it some business understanding capability.

However, it must be clarified that even with Skill enhancement, AI's role in auditing will not change. For complex issues involving multi-contract interactions, economic model analysis, and novel attack methods, manual auditing remains irreplaceable. The role of Skill is to elevate the quality of preliminary scanning to a truly useful level within the scope AI can handle (e.g., identifying common vulnerability patterns and understanding business logic to a limited extent). This provides more valuable preliminary results for manual audits, rather than generating a pile of invalid alerts requiring repeated scrutiny.

2.1 Refined from Audit Practice: The Construction Mechanism of Skill Rules

Beosin's Skill knowledge base originates from over 4000 completed manual smart contract audit projects. It is formed through extensive summarization, induction, and meticulous refinement and validation by audit experts. The formation of each rule completes the entire process from vulnerability discovery to rule implementation: after discovering a security issue in a real project, an auditor fully reconstructs the attack path, deeply analyzes the root cause, verifies the effectiveness of the fix, and finally consolidates this entire set of attack/defense knowledge into a rule entry with contextual judgment conditions, which is then incorporated into the Skill library for subsequent audit calls.

The following is a sample rule from the Skill library, containing a structured format across four dimensions: vulnerability pattern, attack path, root cause, and remediation suggestion:

[Beosin-AMM_Skill-1] Adding Liquidity Detection Bypassed via Transfer Order

Vulnerability Pattern: The contract determines whether an operation is adding liquidity by checking whether the WBNB balance in the Pair exceeds the reserve (balanceOf >= reserve + required). This detection relies on the assumption that WBNB arrives at the Pair before the token. However, the Router's addLiquidityETH function always transfers the ERC-20 token first, then the wrapped native asset (WETH in the Router's naming, which is WBNB on BNB Chain). Furthermore, the transfer order in the addLiquidity function is determined by the parameter order.

Attack Path: An attacker only needs to use addLiquidityETH (token transferred first by default) or call addLiquidity(Token, WBNB, ...) to make the Token arrive at the Pair before WBNB. During detection, WBNB hasn't arrived yet, so balanceOf == reserve, and the detection function returns false, completely bypassing the "no add liquidity" restriction.

Root Cause: The detection method based on Pair balance snapshots is inherently unreliable at the design level for distinguishing between swap and add liquidity operations. It is an architectural flaw, not an implementation bug.

Remediation Suggestion: Change to prohibiting direct transfers to the Pair from non-whitelisted addresses. All transactions must be completed through the contract's built-in functions, eliminating the fundamental flaw of balance snapshot detection at the architectural level.
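The bypass described in this rule can be reproduced with a small Python simulation. This is a sketch under stated assumptions: the `Pair` class, the function names, and the amounts are hypothetical, and the model only tracks the WBNB side of the Pair, which is all the flawed check looks at.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Pair:
    wbnb_reserve: int   # reserve recorded at the last sync
    wbnb_balance: int   # live WBNB balance held by the Pair


def looks_like_add_liquidity(pair: Pair, required_wbnb: int) -> bool:
    """The flawed check: balanceOf >= reserve + required."""
    return pair.wbnb_balance >= pair.wbnb_reserve + required_wbnb


def token_transfer_allowed(pair: Pair, required_wbnb: int) -> bool:
    """The token rejects transfers to the Pair that look like add-liquidity."""
    return not looks_like_add_liquidity(pair, required_wbnb)


def add_liquidity(order: Tuple[str, str], wbnb_amount: int,
                  required_wbnb: int) -> Optional[bool]:
    """Replay the two transfers in the given order and report whether
    the token transfer was allowed through."""
    pair = Pair(wbnb_reserve=1_000, wbnb_balance=1_000)
    allowed = None
    for asset in order:
        if asset == "WBNB":
            pair.wbnb_balance += wbnb_amount
        else:
            # The check runs inside the token's transfer hook.
            allowed = token_transfer_allowed(pair, required_wbnb)
    return allowed


# WBNB first: the check sees the extra balance and blocks the token transfer.
blocked_case = add_liquidity(("WBNB", "TOKEN"), wbnb_amount=50, required_wbnb=50)
# Token first (as addLiquidityETH does): balance == reserve, so the check passes.
bypass_case = add_liquidity(("TOKEN", "WBNB"), wbnb_amount=50, required_wbnb=50)
print(blocked_case, bypass_case)
```

The same funds reach the Pair in both runs; only the ordering differs, which is why the rule classifies this as an architectural flaw rather than an implementation bug.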

This rule is not a simple annotation of a single code pattern but a systematic breakdown of a class of attacks: how the trigger conditions are formed, the path an attacker takes to bypass detection, at which point the detection mechanism has an architectural flaw, and at which level remediation needs to intervene.

2.2 Coverage of the Knowledge Base

Beosin has currently formed specialized Skill vulnerability libraries covering mainstream Web3 technology stacks, including major categories like Solidity, Rust, Motoko, FunC, Go, and ZK. Their core content, as internal core assets, is not publicly disclosed. The directory structure is as follows:

Skills under each specialized library are managed separately by vulnerability type. Each rule contains a number, trigger conditions, attack path reconstruction, contextual judgment logic, and remediation suggestions. The entire Skill library continuously iterates with the emergence of new attack events and the accumulation of audit instances, ensuring it remains synchronized with the real on-chain threat environment.
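Beosin's actual schema is not public, so the following is only an illustrative sketch of how a rule with the fields listed above might be represented and grouped by vulnerability type. The class names, field names, and the wording of the sample entry (in particular the contextual judgment text) are assumptions, loosely paraphrased from the AMM example in section 2.1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List


@dataclass(frozen=True)
class SkillRule:
    rule_id: str               # the rule's number
    vulnerability_type: str    # used to group rules within a library
    trigger_conditions: str
    attack_path: str
    contextual_judgment: str   # when the rule should or should not fire
    remediation: str


class SkillLibrary:
    """Hypothetical container that keeps rules separated by vulnerability type."""

    def __init__(self) -> None:
        self._by_type: DefaultDict[str, List[SkillRule]] = defaultdict(list)

    def add(self, rule: SkillRule) -> None:
        self._by_type[rule.vulnerability_type].append(rule)

    def rules_for(self, vulnerability_type: str) -> List[SkillRule]:
        return list(self._by_type.get(vulnerability_type, []))


library = SkillLibrary()
library.add(SkillRule(
    rule_id="Beosin-AMM_Skill-1",
    vulnerability_type="AMM",
    trigger_conditions="Add-liquidity detection compares Pair WBNB balance to reserve",
    attack_path="Send the token before WBNB so balanceOf == reserve at check time",
    # Hypothetical wording: the source does not publish this field.
    contextual_judgment="Relevant only if the token restricts adding liquidity",
    remediation="Block direct transfers to the Pair from non-whitelisted addresses",
))

amm_rules = library.rules_for("AMM")
print([rule.rule_id for rule in amm_rules])
```

Keeping the contextual judgment as its own field reflects the article's argument that a rule is only useful when it records the conditions under which a pattern is actually a problem.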

2.3 Quality Comparison of Baseline Checks After Skill Intervention

To quantify the actual impact of the Skill library on baseline scan quality, we used the two test cases from Section 1 as benchmarks. We ran both the general AI and the Skill-enhanced AI on the same codebases and performed a line-by-line comparison of the results.

Case A · Standard Token Contract (BEP-20) Comparison Results:

Case B · Complex Business Contract (IPC Protocol) Comparison Results:

The comparison results show that after introducing Skill, the detection quality for both types of contracts improved significantly. In the standard token contract scenario, high-risk false positives were completely eliminated thanks to the added business context judgment capability. In the complex business contract scenario, the coverage rate for High-level vulnerabilities increased from 11% to 44%, the false positive rate dropped from about 55% to about 30%, and the accuracy of severity level judgments also improved markedly. A report of this quality can serve as a baseline check, helping project teams understand defects in their code early. Although these issues may not directly cause fund losses right away, fixing them still benefits subsequent project maintenance and upgrades.

However, the data also clearly exposes the inherent boundaries of AI capability: even after Skill enhancement, the coverage rate for High-level vulnerabilities in complex contracts only reached 44%. Deep-seated vulnerabilities that require cross-contract state path reasoning, economic incentive model analysis, or specific timing conditions to trigger still far exceed the capability range of AI baseline scanning. This is precisely the fundamental reason why, even after introducing Skill enhancement, we retain the complete manual audit stage in our audit process.

2.4 Whitepaper as Audit Input: Consistency Verification Between Code Implementation and Design Intent

In addition to the vulnerability feature library, we have added another important capability to our audit process: using the project's whitepaper as an additional input, enabling AI to verify the consistency between code implementation and whitepaper design.

Specifically, before code auditing begins, the AI systematically parses the project's whitepaper, technical specifications, and requirement documents. It extracts the role permission model, core business processes, trust boundary definitions, and expected behavior constraints to form a structured project context summary. Subsequently, throughout the entire code audit process, the AI continuously cross-references this context. This mechanism has yielded two valuable types of results in practical use:

First, for permission structures in the code that appear risky, if the whitepaper has clearly explained their design intent and constraints, the AI will adjust its judgment accordingly, effectively reducing such false positives.

Second, if there is a significant discrepancy between the code implementation and the promises in the whitepaper—for example, if a slippage protection mechanism claimed in the documentation is not implemented in the code, or if time window constraints for governance processes are not correctly enforced—the AI will issue an alert based on this. Such inconsistencies between code and documentation are easily overlooked in conventional code scans but often represent potential security risks. This also helps project teams avoid situations where the project's actual on-chain behavior diverges from their expectations.
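The two outcomes described above can be sketched as a simple cross-check between a context summary extracted from the whitepaper and the features observed in the code. This is an illustrative model only: the class names, the severity labels, and the example permission and mechanism names are assumptions, and the real extraction step would be performed by the AI rather than supplied by hand.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ProjectContext:
    """Hypothetical structured summary extracted from the whitepaper."""
    documented_permissions: Set[str]
    promised_mechanisms: Set[str]


@dataclass
class Finding:
    title: str
    severity: str


def review(context: ProjectContext,
           code_permissions: Set[str],
           code_mechanisms: Set[str]) -> List[Finding]:
    findings: List[Finding] = []
    # Outcome 1: privileged capabilities that the whitepaper explains are downgraded.
    for perm in sorted(code_permissions):
        severity = "Info" if perm in context.documented_permissions else "High"
        findings.append(Finding(f"Privileged capability: {perm}", severity))
    # Outcome 2: mechanisms promised in the documents but absent from the code.
    for mech in sorted(context.promised_mechanisms - code_mechanisms):
        findings.append(Finding(f"Documented but not implemented: {mech}", "Medium"))
    return findings


context = ProjectContext(
    documented_permissions={"owner_mint"},
    promised_mechanisms={"slippage_protection", "governance_timelock"},
)
findings = review(
    context,
    code_permissions={"owner_mint", "owner_pause"},
    code_mechanisms={"governance_timelock"},
)
for f in findings:
    print(f.severity, "-", f.title)
```

In this toy run the documented minting authority is reported as informational, the undocumented pause authority stays High, and the missing slippage protection is raised as a separate alert.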

3. The Triple-Audit Model: Collaborative Construction of a Complete Smart Contract Security Assurance System

Once a smart contract is deployed on-chain, the cost of any vulnerability is often irreversible. Beosin uses manual deep auditing + formal verification as the foundation for contract auditing, focusing on discovering and reporting issues that could already lead to fund losses or abnormal logic execution. Simultaneously, we have introduced enhanced AI baseline checks based on a proprietary Skill knowledge base to help clients discover code problems that are currently only defects and have not yet caused actual harm. Building on this, Beosin has constructed a triple-audit model: manual deep auditing + formal verification + enhanced AI baseline checks. Through layered collaboration among these three, we form a more comprehensive security assurance system.

3.1 Manual Deep Auditing and Formal Verification: The Core Pillars of Security Assurance

The core advantage of manual auditing lies in its deep understanding of overall protocol design and proactive analysis of potential risks from an attacker's perspective. Experienced audit experts are responsible for conducting comprehensive protocol-level audits of projects, including verification of cross-contract interaction logic, attack surface analysis for fund security, analysis of protocol logic under extreme market conditions, and identification and judgment of novel attack methods. This protocol-level understanding of attack and defense highly depends on long-term accumulation and practical experience within the Web3 ecosystem, which cannot be independently completed by tools at the current stage.

Building on this, Beosin transforms the judgment conclusions from manual audits into quantifiable mathematical guarantees using internal toolchains. For core business logic confirmed by audit experts—such as fund flow,
