The other day, the team of researchers Hiddenlayer Sai introduced “Shadowlogic” technique, which allows you to introduce hidden bookmarks in the model of machine learning. This method, which does not require the addition of code, is based on manipulations with computing graphs of models. It allows attackers to create such attacks on artificial intelligence, which are activated only upon receipt of a special trigger message, which makes them a serious and difficult to grow threat.
The tabs in the software, as a rule, provide attackers with access to the system, allowing you to steal data or carry out sabotage. However, in this case, the laying is introduced at the level of the model’s logic, which makes it possible to control the result of its work. These attacks are preserved even with the training of the model, which enhances their danger.
The essence of the new technique is that instead of modifying the weights and parameters of the model, attackers are manipulated by the computing graph – the model of the model, which determines the sequence of operations and data processing. This allows you to secretly introduce malicious behavior in a model of any type, from image classifiers to text processing systems.
An example of using the method is a modification of the Resnet model, widely used for image recognition. Researchers introduced a bookmark into it, which is activated when solid red pixels are found in the image.
Researchers assure that if desired, the trigger can be well disguised. So that he will cease to be visible to the human eye. As part of the study, when activating the trigger, the model changed the initial classification of the object. This demonstrates how easily such attacks can go unnoticed.
In addition to Resnet, the Shadowlogic method was successfully applied to other AI models, for example, Yolo used to detect objects on video, as well as language models such as Phi-3. The technique allows you to change their behavior depending on certain triggers, which makes it universal for a wide range of artificial intelligence systems.
One of the most alarming aspects of such bookmarks is their stability and independence from specific architectures. This opens the way for attacks on any systems using models with graphic structure from medicine to finance.
Researchers warn that the emergence of such vulnerabilities reduces confidence in AI. In conditions when models are increasingly integrated into critical infrastructure, the risk of hidden bookmarks can undermine their reliability and slow down the development of technology.