A research article recently published by two Salus Security experts sheds light on the capabilities of GPT-4, the large language model developed by OpenAI. The researchers examined GPT-4's ability to parse and audit smart contracts, revealing both its strengths and limitations.
To conduct their study, the Salus researchers used a dataset of 35 smart contracts drawn from the SolidiFI-benchmark vulnerability library. The collection contained 732 vulnerabilities spanning seven common vulnerability types, serving as the basis for evaluating GPT-4's proficiency in identifying potential security weaknesses.
The findings were intriguing. GPT-4 demonstrated notable accuracy in detecting true positives, meaning genuine vulnerabilities that would warrant further investigation outside a controlled testing environment. In fact, it achieved a precision rate exceeding 80% during the testing phase.
However, the model struggled with false negatives: genuine vulnerabilities it failed to flag. This weakness is quantified by a metric called the "recall rate," where a higher value indicates better performance. The Salus team's experiments revealed that GPT-4's recall rate was alarmingly low, reaching a mere 11%.
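For readers unfamiliar with these metrics, the contrast between high precision and low recall can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to illustrate the two formulas; they are not figures from the Salus study.

```python
def precision(tp: int, fp: int) -> float:
    # Precision: of the issues the model flagged, what fraction were real?
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Recall: of the real vulnerabilities, what fraction did the model flag?
    return tp / (tp + fn)

# Hypothetical counts for illustration only.
tp, fp, fn = 80, 20, 650

print(f"precision = {precision(tp, fp):.2f}")  # 0.80
print(f"recall    = {recall(tp, fn):.2f}")     # 0.11
```

A model can therefore be trustworthy when it raises an alarm (high precision) while still missing the vast majority of real issues (low recall), which is the pattern the researchers observed.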
Based on these observations, the researchers concluded that GPT-4's vulnerability detection capabilities are lacking, with its accuracy topping out at 33%. Consequently, they recommend relying on dedicated auditing tools and the tried-and-true expertise of human auditors to thoroughly assess smart contracts, at least until AI systems like GPT-4 advance enough to close the gap.