Openai Introduced New Method Of Protecting AI From Incorrect Teams

Researchers from Openai developed new technology entitled “Hierarchy of Instructions”, which enhances the protection of II models from abuse and unauthorized commands. This method allows models to pay more attention to the initial instructions of the developer, ignoring incorrect user requests.

The first model using the new method is a recently launched light version of the GPT -4O Mini. The technique of the hierarchy of instructions helps models to follow the developer’s system messages, which significantly increases their safety and reduces the risk of using “intruding” commands.

The Openai research article explains that the existing large language models (LLM) are not able to distinguish between user commands and system instructions of the developers. The new method allows the system to give priority to system instructions and ignore malicious requests, for example, such as “forget all previous instructions”.

The new protection is especially important for future fully automated agents who can perform various tasks in the digital life of users. Such agents should be resistant to attacks in order to prevent leakage of confidential information.

Recently, Openai has criticized security and transparency. Internal letters of employees and the departure of key researchers emphasize the need to improve these aspects. The introduction of methods such as the hierarchy of instructions is an important step towards increasing users’ trust in AI and ensuring their safety.

with the improvement of the protection of AI models will be able to fulfill their functions more reliable, which makes their use safer and more effective in various fields.

/Reports, release notes, official announcements.