DeepSeek aims to build on its success with its new AI model

The V3-0324 update marks remarkable progress in several key areas. According to benchmarks published by the company, the model improves by nearly 20 points on the AIME (American Invitational Mathematics Examination) benchmark, from 39.6 to 59.4, and by 10 points on the LiveCodeBench coding test.

This progress is largely explained by the Mixture of Experts (MoE) architecture, a different approach from traditional dense AI models. Instead of activating all of its parameters for each task, DeepSeek-V3-0324 uses only about 37 billion of its 685 billion parameters to process a given request, dynamically calling on different parameter groups known as "experts".
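The selective activation described above can be sketched as top-k expert routing: a small "gate" scores the experts for each token, and only the best-scoring few actually run. This is a minimal illustrative sketch, not DeepSeek's actual implementation; all names (`moe_forward`, `gate_w`, `experts`) are hypothetical.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route a token vector x to its top-k experts only (hypothetical names).

    x       : (d,) token representation
    gate_w  : (d, n_experts) router weights
    experts : list of callables, each standing in for a feed-forward expert
    """
    logits = x @ gate_w
    top = np.argsort(logits)[-k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the selected experts only
    # Only k experts execute; the remaining parameters stay inactive for this token.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts, only 2 activated per token.
rng = np.random.default_rng(0)
d, n = 8, 4
gate_w = rng.normal(size=(d, n))
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n)]
out = moe_forward(rng.normal(size=d), gate_w, experts)
print(out.shape)  # (8,)
```

The ratio in the toy (2 of 4 experts) mirrors, at miniature scale, the roughly 37B-of-685B activation pattern the article describes: compute per token scales with the experts selected, not with the total parameter count.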

The model also incorporates two major innovations: Multi-Head Latent Attention (MLA), which improves context handling in long texts, and Multi-Token Prediction (MTP), which generates several tokens at once. This combination increases output speed by almost 80%, a considerable advantage over competing models, especially since the model remains open source.
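The speed gain from MTP comes from needing fewer forward passes to emit the same number of tokens. The sketch below shows only that decoding-step arithmetic with a toy stand-in for the model; it is not DeepSeek's MTP mechanism (which involves additional prediction heads trained jointly with the main model), and the function names are hypothetical.

```python
def decode_single(prompt, model, n_tokens):
    # Classic autoregressive decoding: one model call per generated token.
    out, calls = list(prompt), 0
    for _ in range(n_tokens):
        out.append(model(out)[0])
        calls += 1
    return out, calls

def decode_mtp(prompt, model, n_tokens, k=2):
    # Multi-token decoding: each model call proposes k tokens at once.
    out, calls = list(prompt), 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(model(out)[:k])
        calls += 1
    return out[:len(prompt) + n_tokens], calls

# Toy "model": always continues the sequence with the next two integers.
toy = lambda seq: [seq[-1] + 1, seq[-1] + 2]
_, single_calls = decode_single([0], toy, 8)
_, mtp_calls = decode_mtp([0], toy, 8, k=2)
print(single_calls, mtp_calls)  # 8 4
```

Halving the number of forward passes is exactly the kind of effect that, combined with MLA, underlies the near-80% throughput gain the article reports.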
