Mistral AI and NVIDIA presented Mistral NeMo

24.07.2024

On July 18, 2024, Mistral AI and NVIDIA announced the release of Mistral NeMo, an advanced language model developed through their joint efforts. This 12-billion-parameter model represents a significant step forward in artificial intelligence technology, combining Mistral AI's training-data expertise with NVIDIA's optimized hardware and software ecosystem.


The model was trained on the NVIDIA DGX Cloud AI platform using 3,072 NVIDIA H100 80GB Tensor Core GPUs, reflecting the scale of the infrastructure behind its development.

Mistral NeMo is designed for high performance across a variety of natural language processing tasks. It outperforms models of its size, such as Gemma 2 (9B) and Llama 3 (8B), in accuracy and efficiency. Its 128K-token context window lets it process long, complex inputs more coherently. The new Tekken tokenizer, based on Tiktoken, compresses source code and several major languages roughly 30% more efficiently, with even larger gains for Korean and Arabic.
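The "30% more efficient compression" claim means the same text is represented with about 30% fewer tokens. A minimal sketch of how such a saving is computed (the token counts below are illustrative, not measured):

```python
def token_savings(baseline_tokens: int, new_tokens: int) -> float:
    """Percentage of tokens saved relative to a baseline tokenizer."""
    return 100.0 * (baseline_tokens - new_tokens) / baseline_tokens

# Illustrative only: if a source file takes 1,000 tokens under an older
# tokenizer and ~700 under Tekken, that is the quoted ~30% saving.
print(token_savings(1000, 700))  # → 30.0
```

Fewer tokens per document means more text fits into the 128K context window and lower per-request cost, which is why tokenizer efficiency matters beyond raw benchmark scores.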

Mistral NeMo's weights are available on HuggingFace in base and instruct versions, and the model can be used with the mistral-inference and mistral-finetune tools. For enterprise deployment, Mistral NeMo is packaged as an NVIDIA NIM inference microservice available through ai.nvidia.com. Designed to run on a single NVIDIA L40S, GeForce RTX 4090, or RTX 4500 GPU, it brings powerful AI capabilities directly to business desktops, making it accessible to a wide range of organizations.


Mistral NeMo 12B delivers impressive performance for its size. According to benchmarks, it outperforms both Gemma 2 (9B) and Llama 3 (8B) in accuracy and efficiency. The model is priced competitively at $0.30 per 1 million input and output tokens, an advantageous position compared to larger models such as GPT-4 (32k context) and Mixtral 8x22B, which are significantly more expensive. The 128K context window and Tekken tokenization give Mistral NeMo an edge on long-form content and multilingual tasks, outperforming Llama 3's tokenizer in text compression for approximately 85% of all languages.
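The quoted $0.30 per million tokens makes cost estimates straightforward. A small sketch, assuming the same flat rate applies to both input and output tokens as stated above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_per_million: float = 0.30) -> float:
    """Cost in USD at a flat per-token price (the quoted $0.30 per 1M
    input and output tokens for Mistral NeMo)."""
    return (input_tokens + output_tokens) / 1_000_000 * price_per_million

# Filling the full 128K-token context plus a 2K-token completion:
print(round(request_cost(128_000, 2_000), 4))  # → 0.039
```

Even a maximal-context request costs under four cents at this rate, which is the practical meaning of the pricing advantage over larger models.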

The model can be applied to a wide range of tasks, including enterprise AI solutions, chatbots and conversational AI systems. Its multilingual capabilities are especially useful for global businesses and organizations with diverse language requirements. In addition, the model's high coding accuracy positions it as a valuable tool for software development and code generation. The combination of a large context window and advanced reasoning capabilities also makes Mistral NeMo well suited for complex text mining, summarization and research applications across a variety of industries.
