The Open Source Initiative (OSI), which reviews licenses for compliance with the Open Source criteria, has approved the Open Source AI Definition v1.0 (OSAID), a document that defines what counts as open AI. An AI system can be considered open if it grants the following:
- the ability to use the system for any purpose without having to obtain separate permission;
- the ability to study how the system works and inspect its components to understand how its results are produced;
- the ability to modify the system for any purpose, including changing its output;
- the ability to share both the original version and modified versions with others, without restricting the purposes of use.
To make modification possible, an open AI system must include:
- detailed information about the data used for training and about the training methodology. The information must be sufficient for a skilled developer to recreate an equivalent AI system independently, using the same or similar training data;
- the source code needed both to run the AI system and to train it. The code must also cover areas such as preprocessing, data validation and tokenization. A detailed description of the model architecture must be provided as well;
- the model parameters (weights), meaning a snapshot of the state that is ready for use after training, or the final optimized version of the model.
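As an informal illustration (not part of OSAID itself), the three required components listed above can be modelled as a simple checklist. This is a minimal sketch: the class, field and function names are hypothetical, and real conformance also depends on the licensing terms, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class ReleaseChecklist:
    """Hypothetical record of what an AI system release ships with."""
    data_information: bool       # training data and methodology described in detail
    training_and_run_code: bool  # source code to run and to train the system
    parameters: bool             # model weights


def missing_components(release: ReleaseChecklist) -> list[str]:
    """Return the names of the required components the release lacks."""
    required = {
        "data information": release.data_information,
        "code": release.training_and_run_code,
        "parameters": release.parameters,
    }
    return [name for name, present in required.items() if not present]


# A release that publishes only the weights lacks the other two components.
weights_only = ReleaseChecklist(
    data_information=False, training_and_run_code=False, parameters=True
)
print(missing_components(weights_only))
```

An empty result means only that all three components are present; it says nothing about whether the terms under which they are distributed are compatible with Open Source principles.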
Large language models recognized as meeting the prepared criteria: Pythia (Eleuther AI), OLMo (AI2), Amber (LLM360), CrystalCoder (LLM360), BLOOM (BigScience), StarCoder2 (BigCode, https://github.com/bigcode-project/starcoder2) and Falcon (TII).
Large language models recognized as not meeting the criteria for open AI systems, either because required components are missing or because their terms include requirements incompatible with Open Source principles: Llama2 (Meta), Grok (X/Twitter), Phi-2 (Microsoft) and Mixtral (Mistral).