White House Orders AI Stress Test by Hackers

The Def Con 31 hacker convention is hosting, for the first time, a competition to find errors and weaknesses in the large language models behind artificial intelligence (AI) systems. Major technology companies are taking part, including Meta, Google, OpenAI, Anthropic, Cohere, Microsoft, Nvidia and Stability AI.

The purpose of the competition, which is supported by the White House, is to “identify problems in AI systems” and “create an independent assessment.”

The competition works as follows: over two and a half days, 3,000 participants will each sit at one of 158 laptops and get 50 minutes to try to find flaws in eight AI language models. Participants will not be told which company's model they are working with, although experienced ones may be able to guess. Points are awarded for each successfully completed task, and the participant with the highest total wins. The prize is a powerful computer kit with a graphics processor, but in practice the bragging rights will matter more to the participants.

The tasks include, for example:

force the model to hallucinate or invent a fact about a political figure or a celebrity;
check the model's consistency and its performance in different languages;
identify bias or discrimination in the model's answers.

The White House believes that the event “will provide critical information about the influence of artificial intelligence models and will allow companies and developers to take steps to correct problems.” The rapid development of the technology has raised concerns about the spread of misinformation and the manipulation of information, especially ahead of next year's US presidential election.

Once the competition ends, the companies will be able to review the collected data and respond to any shortcomings identified. The results are expected to be published next year, allowing researchers and developers to continue work on improving and regulating artificial intelligence.
