Goodbye, GPT-5. Hello, Qwen.

Posted by qimuai · First-hand translation


Source: https://www.wired.com/story/expired-tired-wired-gpt-5/

Summary:

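On a drizzly and windswept afternoon this summer, the reporter visited Rokid, a smart-glasses startup in Hangzhou. Through the company's latest prototype, the engineers' Mandarin was translated into English in real time and displayed on a tiny translucent screen just above the reporter's right eye. The model behind it is Qwen (通义千问), an open-weight large language model developed by Alibaba.

Qwen does not yet beat leading models such as GPT-5 and Gemini 3 on some benchmarks, and it was not the first frontier open-weight model, but it is winning over developers worldwide because it is capable and easy to modify. According to HuggingFace, downloads of Chinese open models on its platform overtook downloads of US ones in July of this year. Data from OpenRouter shows that Qwen has become the second-most-popular open model in the world.

For Rokid's users, the model can identify products, give directions, draft messages, and more. Because it can be downloaded and fine-tuned, companies can adapt it to devices such as smart glasses and car dashboards. Before traveling to China, the reporter ran a small version locally on a laptop to practice Mandarin.

Over the past year, the rise of Qwen and other Chinese open models has coincided with stumbles by some American models: Meta's Llama 4 fell short of expectations at launch, and users complained that OpenAI's GPT-5 had a cold demeanor and made basic errors. Chinese teams, by contrast, keep publishing the details of their engineering. A Qwen paper on improving model training was named one of the best papers at NeurIPS this year, and hundreds of papers presented at the conference used Qwen.

This openness is driving adoption across industries: BYD has built the model into a dashboard assistant, and international companies including Airbnb and Nvidia are using it as well. The article's takeaway is that while US companies fixate on benchmark gains, Chinese open models are competing on real-world use. A key measure of an AI model, beyond how clever it is, is how widely it is used to build other things.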

English source:

On a drizzly and windswept afternoon this summer, I visited the headquarters of Rokid, a startup developing smart glasses in Hangzhou, China. As I chatted with engineers, their words were swiftly translated from Mandarin to English, and then transcribed onto a tiny translucent screen just above my right eye using one of the company’s new prototype devices.
Rokid’s high-tech spectacles use Qwen, an open-weight large language model developed by the Chinese ecommerce giant Alibaba.
Qwen—full name 通义千问 or Tōngyì Qiānwèn in Chinese—is not the best AI model around. OpenAI’s GPT-5, Google’s Gemini 3, and Anthropic’s Claude often score higher on benchmarks designed to gauge different dimensions of machine cleverness. Nor is Qwen the first truly cutting-edge open-weight model, that being Meta’s Llama, which was released by the social media giant in 2023.
Yet Qwen, and other Chinese models—from DeepSeek, Moonshot AI, Z.ai, and MiniMax—are increasingly popular because they are both very good and very easy to tinker with. According to HuggingFace, a company that provides access to AI models and code, downloads of open Chinese models on its platform surpassed downloads for US ones in July of this year. DeepSeek shook the world by releasing a cutting-edge large language model with much less compute than US rivals, but OpenRouter, a platform that routes queries to different AI models, says Qwen has rapidly risen in popularity through the year to become the second-most-popular open model in the world.
Qwen can do most things you’d want from an advanced AI model. For Rokid’s users, this might include identifying products snapped by a built-in camera, getting directions from a map, drafting messages, searching the web, and so on. Since Qwen can easily be downloaded and modified, Rokid hosts a version of the model, fine-tuned to suit its purposes. It is also possible to run a teensy version of Qwen on smartphones or other devices just in case the internet connection goes down.
Before going to China I installed a small version of Qwen on my MacBook Air and used it to practice some basic Mandarin. For many purposes, modestly sized open source models like Qwen are just as good as the behemoths that live inside big data centers.
The rise of Qwen and other Chinese open-weight models has coincided with stumbles for some famous American AI models in the last 12 months. When Meta unveiled Llama 4 in April 2025, the model’s performance was a disappointment, failing to reach the top of popular benchmarks like LM Arena. The slip left many developers looking for other open models to play with.
When OpenAI unveiled its latest model, GPT-5, in August, it also underwhelmed. Some users complained of an oddly cold demeanor while others spotted surprisingly simple errors. OpenAI released a less powerful open model called gpt-oss the same month, but Qwen and other Chinese models remain more popular because more work is put into building and updating them, and because details of their engineering are often published widely.
Hundreds of academic papers presented at NeurIPS, the premier AI conference, used Qwen. “A lot of scientists are using Qwen because it's the best open-weight model,” says Andy Konwinski, cofounder of the Laude Institute, a nonprofit established to advocate for open US models.
The openness adopted by Chinese AI companies, which sees them routinely publishing papers detailing new engineering and training tricks, stands in stark contrast to the increasingly closed ethos of big US companies, which seem afraid of giving away their intellectual property, Konwinski says. A paper from the Qwen team, detailing a way to enhance the intelligence of models during training, was named one of the best papers at NeurIPS this year.
Other big Chinese companies are using Qwen to prototype and build. A few days before visiting Rokid, I saw how BYD, China’s leading EV maker, has integrated the model into a new dashboard assistant. US firms are adopting Qwen too. Airbnb, Perplexity, and Nvidia are all using Qwen. Even Meta, once the pioneer of open models, is now said to be using Qwen to help build a new model.
Konwinski says US AI companies have become too focused on gaining a marginal edge on narrow benchmarks measuring things like mathematical or coding skills, at the expense of ensuring that their models have a big impact. “When benchmarks are not representative of real usage or problems being solved in the world, you end up in this tired, misaligned mode,” he says.
The rising prominence of Qwen and similar models does seem to suggest that a key measure for any AI model, beyond how clever it is, should be how widely it is used to build other stuff. By that benchmark, Qwen and other open Chinese models are ascendant.
