Can ChatGPT do security audits? Coinbase says it's still too early

Lotist

Reader

2023-03-21 10:10

Do you dare to use DeFi reviewed by AI?

With the artificial intelligence boom sweeping the world, ChatGPT is undoubtedly the hottest topic on the Internet right now, and it has stirred plenty of debate in the blockchain field as well. From basic question answering to simple market analysis, and even designing quantitative strategies for trading crypto, ChatGPT's "superpowers" have reached every corner of the blockchain field.

Last week, GPT-4 was officially launched. According to its developer, OpenAI, GPT-4 "achieves human-level performance on a variety of professional and academic benchmarks". In practice, GPT-4 has scored highly on the SAT (a US college admissions test), and it can detect vulnerabilities in Ethereum smart contracts and even suggest ways to exploit them, which shows a bit of hacker thinking.

Conor Grogan, a director at Coinbase, soon confirmed this. He said on social media that he had pasted a live Ethereum smart contract into GPT-4, and the AI identified security vulnerabilities in an instant and even showed how they could be exploited. Grogan added that the contract had in fact been exploited by hackers in 2018. He also tried Euler's smart contract, but GPT-4 could not process it because the contract was too long. Grogan said plainly that AI will ultimately make smart contracts safer and easier to build.

After this tweet, ChatGPT's ability to detect security vulnerabilities became one of the hottest topics in the crypto community. Can ChatGPT really check decentralized applications for security vulnerabilities? How accurate is it? Are security firms panicking? To answer these questions, Coinbase quickly ran a dedicated study.

This Tuesday, Coinbase published on its official blog the results of an experiment comparing automated reviews, in which ChatGPT applied Coinbase's ERC-20 token review framework, with reviews performed by blockchain security engineers.

The purpose of the experiment was to determine the accuracy of ChatGPT's token security reviews by comparing their results with those of standard reviews performed by blockchain security engineers. In the experiment, the security engineers used internal tools to review each function in a token smart contract and output a risk score based on the risks tagged to those functions. To allow a comparison with the standard review, ChatGPT was also required to generate a risk score.

Before ChatGPT could generate risk scores using Coinbase's ERC-20 security review framework, Coinbase first had to give it instructions:

"Imagine you are a blockchain security engineer. Your task is to identify security risks in token smart contracts based on the risks associated with their functions. This is our framework [+ risk framework]. Are these risks present in the following smart contract? [+ smart contract code]." In this way, Coinbase could define its risk framework inside the ChatGPT prompt and ask whether any of those risks were present.
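The prompt structure described above can be sketched as a simple template. This is only an illustration: the function name `build_audit_prompt`, the placeholder framework text, and the toy contract are assumptions, and the wording paraphrases the article's description rather than reproducing Coinbase's actual prompt.

```python
def build_audit_prompt(risk_framework: str, contract_source: str) -> str:
    """Assemble a review prompt: role, task, risk framework, then the contract.

    Hypothetical helper; the wording paraphrases the article's description.
    """
    return "\n\n".join([
        "Imagine you are a blockchain security engineer.",
        "Your task is to identify security risks in token smart contracts "
        "based on the risks associated with their functions.",
        "This is our framework:\n" + risk_framework.strip(),
        "Are these risks present in the following smart contract?\n"
        + contract_source.strip(),
    ])


# Placeholder inputs, invented purely for illustration.
EXAMPLE_FRAMEWORK = "1. Owner can mint unlimited tokens: high risk"
EXAMPLE_CONTRACT = (
    "contract Token { function mint(address to, uint256 amount) public {} }"
)

prompt = build_audit_prompt(EXAMPLE_FRAMEWORK, EXAMPLE_CONTRACT)
print(prompt)
```

Keeping the framework and the contract as separate arguments means the same framework text can be reused across many contracts, which matches the idea of defining the risk framework once inside the prompt.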

So, how does ChatGPT perform?

In its experiment, Coinbase compared the risk scores for 20 smart contracts produced by ChatGPT with those produced by human security review. ChatGPT produced the same result as the human review 12 times. However, in 5 of the 8 mismatches, ChatGPT incorrectly labeled a high-risk asset as low-risk.

That may not look bad, but underestimating a risk score has more serious consequences than overestimating it. A high-risk token could be listed because its risk was inadvertently underestimated, which would seriously harm the interests of the exchange and its users.
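To put the reported figures in proportion, the short calculation below works out the agreement rate and the share of underestimates. It is a minimal sketch: the function name `summarize_comparison` and the dictionary keys are assumptions made for illustration, and only the three input numbers (20, 12, 5) come from the article.

```python
def summarize_comparison(total: int, matches: int, underestimated: int) -> dict:
    """Turn raw comparison counts into rates.

    total:          contracts compared
    matches:        contracts where both reviews gave the same result
    underestimated: mismatches where the model rated a high-risk asset as low-risk
    """
    if total <= 0 or not 0 <= matches <= total:
        raise ValueError("matches must be between 0 and a positive total")
    mismatches = total - matches
    if not 0 <= underestimated <= mismatches:
        raise ValueError("underestimated cannot exceed the number of mismatches")
    return {
        "mismatches": mismatches,
        "agreement_rate": matches / total,
        "underestimated_share_of_mismatches": (
            underestimated / mismatches if mismatches else 0.0
        ),
        "underestimated_share_of_all": underestimated / total,
    }


# Figures reported in the article: 20 contracts, 12 matches, 5 underestimates.
summary = summarize_comparison(total=20, matches=12, underestimated=5)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs the agreement rate is 60%, and the dangerous direction of error (high risk labeled as low risk) accounts for 62.5% of the mismatches, or 25% of all contracts reviewed.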

According to the report, ChatGPT shows only a shallow ability to assess smart contract risks quickly, and it does not meet the accuracy requirements of Coinbase's security review process:

First, ChatGPT cannot recognize when it lacks the context needed to perform a robust security analysis. This leads to coverage gaps where additional dependencies are not reviewed. To prevent such gaps, the scope of each ChatGPT review must be triaged beforehand.

Second, ChatGPT's output can be inconsistent: when the same question is put to it multiple times, it does not always give the same answer. ChatGPT also appears to be influenced by comments in the code, and it occasionally seems to rely on the comments rather than the function logic.

Finally, OpenAI continues to iterate on ChatGPT, which adds further output instability. Verbose prompts that used to produce consistent output may produce different output after a version change. Prompt maintenance and output quality control may be required to keep responses consistent and avoid operational failures.
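One way to observe the inconsistency described in the second point is to send the same prompt several times and measure how often the answers agree. The sketch below is an assumption-laden illustration, not Coinbase's method: `consistency_report` and `make_stub` are hypothetical helpers, and the stub stands in for a real model call so the example runs offline.

```python
from collections import Counter
from typing import Callable, Dict, List


def consistency_report(
    ask: Callable[[str], str], prompt: str, runs: int = 5
) -> Dict[str, object]:
    """Send the same prompt `runs` times and summarize how much the answers agree."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers: List[str] = [ask(prompt) for _ in range(runs)]
    counts = Counter(answers)
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        "answers": answers,
        "majority": majority_label,
        "agreement": majority_count / runs,  # share of runs giving the majority answer
        "consistent": len(counts) == 1,      # True only if every run agreed
    }


def make_stub(canned_answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: replays canned risk labels in order."""
    remaining = iter(canned_answers)
    return lambda _prompt: next(remaining)


# Canned labels invented for illustration; a real check would call the model instead.
stub = make_stub(["high", "high", "low", "high", "high"])
report = consistency_report(stub, "Is this token contract high-risk?", runs=5)
print(report)
```

In this stubbed run four of the five answers are "high", so the agreement is 0.8 and the result is flagged as not consistent. Rerunning the same check after a model version change would be one simple form of the output quality control mentioned above.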

In summary, Coinbase may be able to improve the accuracy of ChatGPT's token security reviews through further prompt engineering. For now, however, Coinbase cannot rely on ChatGPT alone to perform security reviews. It hopes to improve accuracy in the future and use ChatGPT as a secondary QA check, letting security engineers run additional control checks to catch any risks that may have been overlooked. The ChatGPT prompts will be saved for future use by engineers, and further improvements are planned.

As Coinbase's experiment shows, the accuracy of ChatGPT can be improved further by refining the prompt design. But for work as variable as security auditing, where the inputs cannot be standardized, ChatGPT alone cannot be guaranteed to make accurate judgments. Human intervention is still needed to refine ChatGPT's prompts in light of specific additional control checks.

In general, bringing AI into the blockchain industry undoubtedly makes it possible for start-ups in the space to build efficiently through human-AI collaboration: ChatGPT has an extensive knowledge base, humans supply the specific business logic and prompts, and developers can get more done in less time by using ChatGPT. In addition, given the high cost of smart contract audits by security engineers, ChatGPT offers timely and cost-effective audit assistance.

As blockchain developer Salman Arshad said at the ETHDubai conference, "ChatGPT and AI tools are a boon; they are not our enemy, nor are they designed to end a developer's career." For users, the benefits of ChatGPT's collaborative nature may outweigh the potential threat of automation replacing humans.

Finally, a question for you as a user: would you dare to use a DeFi protocol that has been audited by ChatGPT?