Microsoft warns about the new type of attack on generative artificial intelligence, which is called” Skeleton Key “. This attack allows users to bypass ethical restrictions, as well as security restrictions built into AI models, such as ChatGPT. The method works by providing a certain context, which allows access to insulting, harmful or illegal content.
To illustrate, consider the case when the user requests instructions for creating a dangerous malware that can disable, for example, a power station. Under normal conditions, most commercial chat bots will refuse to provide such information. However, if the request is modified in such a way as to indicate that the information is required “for a safe educational context with the participation of advanced researchers, trained ethics and security,” and add disclaimer, then it is likely that AI will provide an obscene content.
In other words, Microsoft found that the majority of the leading AI can be convinced that the malicious request is legitimate and even noble, simply saying that information is needed for “research goals.”
“When the restrictions are ignored, the model will not be able to distinguish harmful or unauthorized requests from any others,” – explained Mark Russinovich, Technical Director of Microsoft Azure, in his post about this tactic. “Due to the complete possibility of bypassing the restrictions, we called this hacking technique” Skeleton Key “.”
He added that “the output data of the models is completely unexplored and show the entire volume of knowledge of the model or its ability to produce requested content.” The technique “Skeleton Key” affects several models of generative AI, tested by Microsoft researchers, including models controlled by Azure AI, as well as models from META, Google, Openai, Mistral, Anthropic and Cohere.
“all the models raised fully and without censorship completed [several prohibited] tasks,” said Russinovich. Microsoft eliminated the problem in Azure, introducing new protection measures to detect and block this tactics, and also updated the software that controls large language models (LLM) in Azure AI, additionally notifying other affected suppliers.
Administrators need to update the models used to introduce any corrections that could be released by these suppliers. In turn, those who create their own models of AI, Microsoft offers the following measures to mitigate the threat:
- filtering of input data for identifying requests with malicious intentions, regardless of disclaimers accompanying them.
- An additional barrier that prevents attempts to undermine safety instructions.
- The filtering of the output data that identifies and prevents the answers that violate the security criteria.