Google has reported that its AI model has, for the first time, identified a memory safety vulnerability in real-world software. The vulnerability was a stack buffer underflow in SQLite, and it was fixed before the vulnerable code reached an official release.
Big Sleep, an LLM-based tool for finding bugs, was developed in collaboration with DeepMind. It is described as an evolution of the earlier Naptime project, introduced in June.
SQLite, a widely used open-source database engine, contained a flaw that could allow attackers to crash the program or execute arbitrary code. The vulnerability stemmed from the use of -1 as an array index. A check for such values was present in debug builds of the program but absent from release builds.
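To make the "-1 as an array index" problem concrete, here is a minimal sketch in C. It is not SQLite's code: the names `ROWID_COLUMN`, `N_COLUMNS`, `mark_used_debug_checked` and `mark_used_checked` are hypothetical, and the scenario (a sentinel value of -1 reaching an array write) is an assumption made for illustration. The first function relies on an `assert`, which the C standard library removes when `NDEBUG` is defined, as is typical for release builds. The second keeps the bounds check in every build.

```c
#include <assert.h>

#define N_COLUMNS 3
#define ROWID_COLUMN (-1) /* hypothetical sentinel meaning "not a real column" */

/* Guarded only by assert(). When compiled with -DNDEBUG the check is
   removed, and column == -1 would write to usage[-1], that is, to
   memory just below the start of the array. */
static void mark_used_debug_checked(int *usage, int column)
{
    assert(column >= 0 && column < N_COLUMNS);
    usage[column] += 1;
}

/* Keeps the bounds check in every build. Returns 1 if the counter was
   updated and 0 if the index was rejected. */
static int mark_used_checked(int *usage, int column)
{
    if (column < 0 || column >= N_COLUMNS) {
        return 0;
    }
    usage[column] += 1;
    return 1;
}
```

Calling `mark_used_checked(usage, ROWID_COLUMN)` returns 0 and leaves the array untouched, whereas the assert-only variant would abort in a debug build and silently corrupt adjacent memory in a build compiled with `NDEBUG`.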
During the latest test, the team collected a number of recent commits from the SQLite repository and manually removed trivial changes to simplify the AI's analysis. The Gemini 1.5 Pro model then identified an issue related to the changes in commit [1976c3f7].
The vulnerability could potentially be exploited through a crafted database supplied by an attacker or through SQL injection. Google acknowledges, however, that the bug is difficult to exploit. Nevertheless, the company views the success of its AI as a breakthrough in vulnerability discovery.
Traditional methods such as fuzzing were unable to detect this particular problem. This makes it the first time an AI model has uncovered a previously unknown vulnerability in widely used software. Big Sleep identified the flaw in early October by analyzing changes in the source code, and the SQLite developers patched it the same day, preventing it from reaching an official release.
Google notes that although fuzzing has made significant strides, there is still a need for methods that can uncover vulnerabilities this approach misses. The company sees AI as a potential way to bridge this gap. Big Sleep is still a research project and is currently used to analyze small programs with known vulnerabilities. Google stresses that the findings are experimental at this stage.
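As a rough illustration of why some bugs evade this kind of testing, the sketch below feeds random inputs to a toy target that misbehaves only for one specific 4-byte prefix. Everything here is hypothetical: `triggers_bug`, `next_random` and `fuzz_random` are names invented for the example, the generator is a plain xorshift32, and the loop is purely random rather than coverage-guided as real fuzzers usually are. It is not a model of how SQLite is actually fuzzed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target: "fails" only when the input starts with the
   exact bytes 'S', 'Q', 'L', 0xFF. */
static int triggers_bug(const uint8_t *data, size_t len)
{
    return len >= 4 && data[0] == 'S' && data[1] == 'Q' &&
           data[2] == 'L' && data[3] == 0xFF;
}

/* Small deterministic generator (xorshift32) so runs are repeatable. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feeds `trials` random 8-byte inputs to the target and returns how
   many of them triggered the bug. */
static unsigned fuzz_random(unsigned trials, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u; /* xorshift state must be nonzero */
    unsigned hits = 0;
    for (unsigned t = 0; t < trials; t++) {
        uint8_t input[8];
        for (size_t i = 0; i < sizeof input; i++) {
            input[i] = (uint8_t)(next_random(&state) & 0xFFu);
        }
        hits += (unsigned)triggers_bug(input, sizeof input);
    }
    return hits;
}
```

With uniformly random bytes, a given input matches the 4-byte prefix with probability 1 in 2^32, so even millions of trials are unlikely to hit it, while someone who reads the code can construct the triggering input directly.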