RAID: Most Black-Box Detectors Fail

Since the release of GPT-2 in 2019, technologies based on large language models (LLMs) have advanced significantly. Machines can now produce text so similar to human writing that even experienced readers often cannot tell it was generated by artificial intelligence. This raises serious questions about the risks of using such technologies.

LLMs are used to speed up writing and to boost creativity, but their power is not always put to good use. It is often abused and causes harm, which is already noticeable in many areas where information is consumed. The inability to determine reliably whether a text was written by a person or a machine amplifies this risk.

Today, both the academic community and commercial companies are working to improve methods for recognizing AI-generated text. The irony is that the same kind of machines are used for this: machine learning models can pick up subtle patterns in word choice and grammatical constructions that a person might miss.

Many commercial detectors claim to detect machine-generated text with 99% accuracy. But is that really so? Chris Callison-Burch, a professor of computer and information science, and Liam Dugan, a graduate student in his research group, set out to investigate. Their work was presented at the 62nd Annual Meeting of the Association for Computational Linguistics and published on the arXiv preprint server.

Callison-Burch notes that as technologies for detecting machine-generated text improve, so do methods for evading such detectors. This is a real arms race, and although the goal of building reliable detectors is important, the solutions available today have many limitations and vulnerabilities, the professor added.

To study these limitations and find ways to build more reliable detectors, the research group developed Robust AI Detector (RAID), a dataset of more than 10 million documents, including recipes, news articles, blog posts and much more, both AI-generated and written by people. RAID is the first standardized benchmark for testing how well detectors identify machine-generated text.
