In the big data era, Vision Transformer (ViT) models pre-trained on extensive datasets have become a standard way to improve performance across a wide range of AI applications. Visual prompting (VP) allows such models to be adapted to specific tasks without full retraining, as illustrated below. However, the safety risks associated with VP remain largely unexplored.
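To make the adaptation idea concrete, here is a minimal sketch of visual prompt tuning in Python. The class and parameter names are illustrative assumptions, not the authors' code: a stand-in transformer encoder plays the role of the frozen pre-trained ViT, and only a few learnable prompt tokens plus a task head are updated.

```python
# Minimal sketch of visual prompt tuning: the pre-trained backbone stays frozen,
# and only a small set of learnable prompt tokens (plus a task head) is trained.
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, embed_dim=768, num_prompts=10, num_classes=10):
        super().__init__()
        # Stand-in for a pre-trained ViT encoder; in practice this would be
        # loaded from a checkpoint and kept frozen.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad = False  # no full retraining of the backbone

        # The only new, trainable parameters: prompt tokens and a task head.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):  # patch_tokens: (B, N, D) from the frozen patch embedding
        b = patch_tokens.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_tokens], dim=1)
        feats = self.backbone(tokens)
        return self.head(feats[:, 0])  # classify from the first prompt token

model = PromptedViT()
logits = model(torch.randn(2, 196, 768))  # 196 patch tokens per image
print([n for n, p in model.named_parameters() if p.requires_grad])  # only prompts and head
```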
Analysts from Tencent's security department and researchers from Tsinghua University, Zhejiang University, the Research Center for Artificial Intelligence, and the Peng Cheng Laboratory have identified a new threat to VP in cloud services. They discovered that attackers can use a special "switch" token to quickly flip the model between a normal and an infected mode.
The researchers have named this attack method the Switchable Attack against Pre-trained Models (SWARM). SWARM jointly optimizes the prompt tokens and the switch token so that the model behaves normally when the switch is absent but exhibits malicious behavior when it is activated.
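The sketch below illustrates only the token plumbing behind this switchable behavior, building on the prompted-ViT sketch above; the class and argument names are assumptions, and the joint optimization that makes the clean and infected modes behave as intended is the paper's contribution and is not reproduced here.

```python
# Hedged sketch of a switchable prompt: the same frozen backbone and clean
# prompt tokens are used in both modes, and appending one extra "switch" token
# flips the model into its backdoored behavior.
import torch
import torch.nn as nn

class SwitchablePromptedViT(nn.Module):
    def __init__(self, backbone, head, embed_dim=768, num_prompts=10):
        super().__init__()
        self.backbone = backbone          # frozen, pre-trained encoder
        self.head = head                  # task head
        self.clean_prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.switch_token = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)

    def forward(self, patch_tokens, switch_on=False):
        b = patch_tokens.size(0)
        tokens = [self.clean_prompts.expand(b, -1, -1), patch_tokens]
        if switch_on:
            # Appending the single switch token is all it takes to change modes.
            tokens.insert(1, self.switch_token.expand(b, -1, -1))
        feats = self.backbone(torch.cat(tokens, dim=1))
        return self.head(feats[:, 0])

# Stand-in frozen backbone and head for illustration.
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)
for p in backbone.parameters():
    p.requires_grad = False
model = SwitchablePromptedViT(backbone, nn.Linear(768, 10))

patches = torch.randn(2, 196, 768)
logits_clean = model(patches, switch_on=False)    # normal mode
logits_backdoor = model(patches, switch_on=True)  # infected mode (after attack training)
```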
Experiments demonstrate that SWARM is both effective and stealthy. Attackers can manipulate the model's behavior in cloud services without access to user data: the model functions correctly in normal mode yet executes the attack reliably once the infected mode is triggered.
Even against backdoor-mitigation techniques such as Neural Attention Distillation (NAD) and I-BAU, SWARM remains highly effective, retaining attack success rates of up to 96% and 97%, respectively, in most cases.
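For context on what such percentages mean, the snippet below shows a common way attack success rate (ASR) is computed for backdoor attacks: the fraction of triggered inputs that the model classifies as the attacker's target label. This is a generic illustration of the metric, assuming the switchable model sketched above, not the paper's evaluation code.

```python
# Generic attack-success-rate calculation for a switchable backdoor.
import torch

def attack_success_rate(model, triggered_inputs, target_label, switch_on=True):
    model.eval()
    with torch.no_grad():
        preds = model(triggered_inputs, switch_on=switch_on).argmax(dim=-1)
    return (preds == target_label).float().mean().item()

# asr = attack_success_rate(model, patches, target_label=0)
# print(f"ASR after defense: {asr:.1%}")
```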
The researchers highlight the threat posed by SWARM's ability to evade detection and mitigation measures, underscoring the urgency of stronger protection mechanisms. SWARM introduces a novel attack strategy and motivates further exploration in this area of security.
As a result, this research emphasizes the need to address the security implications of using visual prompts with pre-trained ViT models and calls for the development of robust defenses against such threats.