Mistral’s Large 2 Surpasses Llama 3.1, Rivals GPT-4

Mistral has introduced its latest flagship model, Large 2, which is designed to compete with OpenAI and Meta* in code generation, mathematics, and reasoning. The release came just one day after Meta released its new open-source Llama 3.1 405B model.

With only 123 billion parameters, Large 2 surpasses Llama 3.1 405B in code generation and mathematics, and performs on par with the leading models GPT-4o and Claude 3 Opus. In particular, the pretrained version reaches an accuracy of 84.0% on the MMLU benchmark. According to Mistral, Large 2 produces more concise answers than other leading AI models, which are often overly verbose.

Comparison of Large 2 and Llama 3.1 performance on code generation and mathematics

One of the key goals in training the model was to minimize "hallucinations", that is, erroneous answers. The model was trained to respond more cautiously, acknowledging when it does not know something instead of inventing plausible but wrong answers.

It is important to note that Mistral's models, like most others, are not open in the traditional sense: a paid license is required for commercial use. Although the model is more open than, for example, GPT-4, only a few organizations in the world have the expertise and infrastructure needed to deploy models of this scale.
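To give a sense of the infrastructure involved, the following rough sketch estimates the memory needed just to hold the model weights. The parameter counts come from the article. The precisions and bytes-per-parameter values are assumptions for illustration, and the estimate ignores activations, the KV cache, and any serving overhead.

```python
# Back-of-the-envelope estimate of memory needed to store model weights.
# Parameter counts are taken from the article; the numeric precisions
# are illustrative assumptions, not deployment details from Mistral or Meta.
PARAMS = {
    "Large 2": 123e9,
    "Llama 3.1 405B": 405e9,
}

# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the weight storage size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for model, params in PARAMS.items():
        for precision, nbytes in BYTES_PER_PARAM.items():
            size = weight_memory_gb(params, nbytes)
            print(f"{model:>15} @ {precision}: {size:,.0f} GB")
```

Under these assumptions, the 16-bit weights of Large 2 alone come to about 246 GB, more than a single current accelerator can hold, so the model has to be sharded across several devices or quantized.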

What Mistral Large 2 lacks (as does Llama 3.1) is multimodal capability. In multimodal systems, which can process images and text at the same time, OpenAI is significantly ahead of its competitors, and some startups are actively working to introduce such features.

Accuracy on code generation (all models were tested using the same evaluation pipeline)

Large 2 can process up to 128,000 tokens per request, which is equivalent to roughly 300 pages of a book. The new model also offers improved multilingual support. Large 2 understands dozens of languages, including English, French, German, Spanish, Russian, and Chinese, as well as 80 programming languages, including Python, Java, C, C++, JavaScript, and Bash.
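The "about 300 pages" figure can be reproduced with a quick calculation. The sketch below is only an illustration: the words-per-token and words-per-page ratios are assumed rules of thumb for English text, not numbers published by Mistral, and real values vary with the tokenizer, language, and page layout.

```python
# Rough conversion from a token budget to book pages.
# Both ratios are assumed rules of thumb, not figures from Mistral.
WORDS_PER_TOKEN = 0.75  # assumption: common estimate for English text
WORDS_PER_PAGE = 320    # assumption: a fairly dense printed page


def tokens_to_pages(tokens: int) -> float:
    """Estimate how many book pages a given number of tokens covers."""
    words = tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_PAGE


if __name__ == "__main__":
    context_window = 128_000  # token limit stated in the article
    print(f"{context_window:,} tokens is about "
          f"{tokens_to_pages(context_window):.0f} pages")
```

With a looser page density of 250 words, the same window would correspond to closer to 380 pages, so the figure should be read as an order-of-magnitude guide.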

Accuracy on MultiPL-E (all models were tested using the same evaluation pipeline, except for the "paper" row)

Large 2 is available on Google Vertex AI, Amazon Bedrock, Azure AI Studio, and IBM watsonx.ai. The model is also offered on Mistral's own platform under the name "mistral-large-2407" and can be tested for free in Mistral's le Chat. The model weights are also available on Hugging Face.

* Meta and its products are designated as extremist, and their activities are prohibited in the territory of the Russian Federation.
