AI Systems Attacked with ASCII Shots

A group of researchers from Washington, Illinois and Chicago universities revealed a new method for bypassing restrictions on the processing of hazardous content in the AI ​​CHATOBOTS, built on the basis of large language models (LLM). The attack is based on the fact that the language models gpt-3.5 , gpt-4 (Openai), gemini (Google), clause (Anthropic) and Llama2 (meta) successfully recognize and take into account the text designed in the form ASCII graphics . Thus, to bypass the filters of dangerous questions, it was enough to indicate the prohibited words in the form of an ASCII picture.






, in its effectiveness, the new attack method has noticeably surpassed other well -known ways of bypassing filters in chatbots. The highest quality of the ASCII graphics recognition is recorded in Gemini, GPT-4 and GPT-3.5 models, the level of successful bypassing of filters by verification queries (HPR, Helpful Rate) in which, when testing, it is estimated at 100%, 98%and 92%, an indicator of success Attacks (ASR, Attack Success Rate) in 76%, 32%and 76%, and the danger level of answers received (HS, Harmfulness Score) on a five -point scale in 4.42, 3.38 and 4.56 points, respectively.






Researchers also demonstrated that currently common methods for protecting against bypassing the PPL, Paraphase and Retokenization filters are not effective for blocking the ARTPROMPT attack. Moreover, the use of the Retokenization method even increased the success coefficient of attack.



/Reports, release notes, official announcements.