Stability AI Releases Audio Generation Model for Enterprises

Posted by qimuai on 2025-9-13 11:01. Views: 164. First-hand compilation.

Source: https://aibusiness.com/generative-ai/stability-ai-releases-audio-generation-model-for-enterprises

Summary:

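Stability AI has released Stable Audio 2.5, an enterprise-grade audio generation model. Inference takes less than two seconds on a GPU, and the model can generate tracks up to three minutes long. It supports text-to-audio, audio-to-audio and audio inpainting, which lets users apply AI tools to their own audio files.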

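Analysts note that enterprise audio models are still rare. Gartner analyst Arun Chandrasekaran said the technology is likely to find broad use in design, marketing and communication. Futurum Group analyst Bradley Shimmin said the audio inpainting feature could be valuable in contact centers and in voice assistant deployments.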

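The field also carries the risk of copyright litigation. Shimmin advised that model developers make proper licensing and compensation arrangements for training data and give commercial customers indemnification against infringement claims. Stability AI says the model was trained on a fully licensed data set and is commercially safe.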

The model is now available for commercial use through the Stability AI API and third-party platforms such as Replicate.

Translation:

With fast inference and features such as audio inpainting, the model aims to strengthen audio production across industries.

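Stability AI has introduced what it calls an enterprise-grade audio generation model. On Wednesday, the company behind Stable Diffusion released Stable Audio 2.5, an audio model designed for enterprise-grade sound production that helps customers create customizable, high-quality audio at scale.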

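Inference takes less than two seconds on a GPU, so the model can generate three-minute tracks within seconds. According to Stability AI, it also responds to prompts that include mood descriptors such as "uplifting." In addition to audio-to-audio and text-to-audio generation, the model supports audio inpainting, which lets users apply AI tools to their own audio files.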

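The release comes about 17 months after Stable Audio 2.0 launched in April 2024. Compared with other popular generative AI models that focus on speech, text or images, the Stable Audio models bring different functionality to the enterprise.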

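"We haven't seen a lot of music/audio models in the enterprise," said Gartner analyst Arun Chandrasekaran. "This is a very nascent and niche use case. There are not a lot of other providers that are doing that. In that sense, it's unique."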

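Futurum Group analyst Bradley Shimmin added that many model makers have been conservative about music generation because of the nature of the industry and the risk of infringement lawsuits, but Stable Audio 2.5 suggests that has changed. "Any company that's doing business, whether you're a software manufacturer, you're building some application, you're sales and support or anything that touches other humans, sound is critical," he said.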

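Chandrasekaran said the model will likely thrive in design, marketing and communication, where teams could benefit from audio or music models. Shimmin said the inpainting feature could also be useful in contact centers or help desk environments, for example where audio is built into a voice assistant, kiosks or sales devices. "They're hitting on some interesting areas that are both currently underutilized, like the generation and composition of music, to more enterprise-class capabilities around the basics," he said.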

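Chandrasekaran also said Stability AI is probably targeting SaaS applications that provide multimodal AI or audio production functionality. "The model that Stability AI is creating will become the underlying model [for these applications]," he said, adding that this resembles how Microsoft Copilot uses OpenAI. "From a developer perspective, or from a software engineering leader perspective, the fact that the OpenAI model is running underneath [Copilot] is beside the point," he continued. "I suspect that some of these companies may not always have a direct enterprise go-to-market. They might partner with other application providers that route these models to the enterprise."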

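Audio model makers should expect challenges if they do not put robust guardrails in place. AI vendors such as OpenAI and Anthropic currently face copyright lawsuits over the use of protected data to train their models. Anthropic tried last week to settle a lawsuit brought by book publishers, but a federal judge later paused the settlement.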

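Shimmin said Stability AI's partnership with the sound branding agency Amp suggests the company is aware of the legal risks around training data. "I can foresee companies like Stability and their partners making arrangements and partnerships with audio producers to ensure that they're compensated appropriately," he said. He added that customers also need indemnification against potential lawsuits and transparency into the data used to train the model.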

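Stability AI said the model is commercially safe and was trained on a fully licensed data set. Stable Audio 2.5 is available through the Stability AI API and platforms such as Replicate, ComfyUI and Fal.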

English source:

With rapid inference speeds and features such as audio inpainting, the model aims to bolster audio production in various industries.
Stability AI has introduced what it calls an enterprise-grade audio generation model.
On Wednesday, the Stable Diffusion maker launched Stable Audio 2.5, an audio model designed for enterprise-grade sound production to help customers create customizable, high-quality audio at scale.
The model has an inference speed of less than two seconds on a GPU, allowing it to generate three-minute-long tracks within seconds. It can also respond to prompts that feature descriptive moods such as "uplifting," according to Stability AI. Besides supporting audio-to-audio and text-to-audio functionality, the model allows for audio inpainting, meaning users can apply AI tools to their own audio files.
Stable Audio 2.5 comes about 17 months after Stability AI introduced Stable Audio 2.0 in April 2024.
Stable Audio models bring different functionality to the enterprise compared with other popular generative AI models that focus on speech, text or image.
"We haven't seen a lot of music/audio models in the enterprise," according to Arun Chandrasekaran, an analyst at Gartner. "This is a very nascent and niche use case. There are not a lot of other providers that are doing that. In that sense, it's unique."
Bradley Shimmin, an analyst at Futurum Group, added that many model makers have tended to be conservative in their focus on music generation because of the industry and the possibility of infringement, which could lead to lawsuits. But given Stable Audio 2.5, that appears to have changed.
"Any company that's doing business, whether you're a software manufacturer, you're building some application, you're sales and support or anything that touches other humans, sound is critical," he said.
Chandrasekaran said the model will likely thrive in industries such as design, marketing and communication, where teams could benefit from audio or music models.
Shimmin said the inpainting feature could also be useful in contact centers or help desk environments where audio is being incorporated as a voice assistant or in kiosks or sales devices.
"They're hitting on some interesting areas that are both currently underutilized, like the generation and composition of music, to more enterprise-class capabilities around the basics," he said.
Chandrasekaran also said it's likely that Stability AI is targeting SaaS applications that provide multimodal AI or audio production functionality.
"The model that Stability AI is creating will become the underlying model [for these applications]," he said. He added that this resembles how Microsoft Copilot uses OpenAI.
"From a developer perspective, or from a software engineering leader perspective, the fact that the OpenAI model is running underneath [Copilot] is beside the point," he continued. "I suspect that some of these companies may not always have a direct enterprise go-to-market. They might partner with other application providers that route these models to the enterprise."
Audio model makers should expect to face challenges if robust guardrails are not in place.
Currently, AI vendors such as OpenAI and Anthropic are facing copyright lawsuits for using protected data to train their models. Anthropic tried to settle a lawsuit brought by book publishers last week, but a federal judge later paused the settlement.
Shimmin said Stability AI is likely aware of copyright lawsuits and the caution needed when using training data given it partnered with a sound branding agency, Amp. Sound branding agencies like Amp help businesses develop their own unique sounds distinct to their brands.
"I can foresee companies like Stability and their partners making arrangements and partnerships with audio producers to ensure that they're compensated appropriately," Shimmin said. He added that it is also essential Stability customers using its models have some indemnification against potential lawsuits and transparency into the data used to train the model.
Stability AI said the model is commercially safe and trained on a fully licensed data set.
Stable Audio 2.5 is available through the Stability AI API and platforms such as Replicate, ComfyUI and Fal.