Alibaba Group Holding has elevated its AI game with the introduction of Qwen2-Math, a new suite of large language models (LLMs) specifically designed to excel in mathematical problem-solving. According to the e-commerce giant, these models outperform OpenAI’s GPT-4o and other leading LLMs in mathematics.
Announced during a developer update on GitHub, the Qwen2-Math models build upon the earlier Qwen2 series released in June. These new models are tailored to enhance arithmetic and complex problem-solving capabilities. The Qwen2-Math-72B-Instruct, the largest and most advanced of the three, has demonstrated superior performance on various maths benchmarks compared to models from OpenAI, Anthropic, Google, and Meta.
The Qwen2-Math series was evaluated on a range of benchmarks, including GSM8K, OlympiadBench, and questions drawn from China's gaokao examination. The models were evaluated on both English and Chinese test sets, although the current release supports English only; Alibaba plans to release bilingual and multilingual versions soon.
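For readers unfamiliar with how such benchmarks are scored: GSM8K reference solutions end with a final line of the form "#### &lt;answer&gt;", and a model's response is typically marked correct if its extracted final answer matches the reference. The sketch below is illustrative only, not Alibaba's actual evaluation harness:

```python
import re

def extract_answer(solution: str):
    """Pull the final numeric answer from a GSM8K-style solution,
    which conventionally ends with a line like '#### 42'."""
    match = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", solution)
    if match is None:
        return None
    # Strip thousands separators so '1,000' and '1000' compare equal.
    return match.group(1).replace(",", "")

def score(predictions, references):
    """Fraction of predictions whose extracted answer matches the reference."""
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if extract_answer(pred) is not None
        and extract_answer(pred) == extract_answer(ref)
    )
    return correct / len(references)
```

Real harnesses add details such as answer normalisation and few-shot prompting, but exact-match accuracy on the final answer is the core metric behind the GSM8K numbers cited in model comparisons.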
This advancement follows Alibaba's recent success with its Qwen2-72B-Instruct model, which secured a high position in global open-source model rankings. The rapid progress in Alibaba's AI capabilities highlights the growing competitiveness of Chinese AI models, narrowing the gap with their US counterparts.
In July, the Qwen2-72B was ranked 20th in an evaluation by LMSYS, a UC Berkeley-supported AI research organization, with OpenAI, Anthropic, and Google models dominating the top-10 slots.