How Apple Is Improving AI Image Editors

Source: https://lifehacker.com/tech/apple-improving-ai-image-editors?utm_medium=RSS
Summary:
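Although Apple appears to trail OpenAI, Google, and others in the public AI race, its research team continues to advance the technology through foundational research. Last week, Apple researchers released an open image dataset called "Pico-Banana-400K," which contains 400,000 text-annotated images and is intended to improve the performance of AI image-editing models.
Compared with existing datasets, it emphasizes higher image quality and more diverse content. Notably, the research used Google's Nano Banana image-editing model to produce 35 types of edits, and used Gemini-2.5-Pro to evaluate edit quality. The dataset contains 258,000 single-edit samples, 56,000 preference pairs contrasting successful and failed edits, and 72,000 multi-turn edit sequences.
Testing showed that success rates vary considerably across edit functions: artistic style transfer succeeded 93% of the time, while changing the font or color of text succeeded only 58% of the time. Other functions, such as adding text (67%), zooming in (74%), and adding a vintage filter (91%), also performed differently.
Unlike Apple's customary closed ecosystem, the dataset is open to all researchers and developers. The release suggests that although Apple has yet to stand out in consumer AI products, it is actively contributing to the foundations of AI through open research.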
English source:
Apple might be dead last in the AI race—at least when you consider competition from companies like OpenAI, Google, and Meta—but that doesn't mean the company isn't working on the tech. In fact, it seems most of the work Apple does on AI is behind the scenes: While Apple Intelligence is, well, there, the company's researchers are working on other ways to improve AI models for everyone, not just Apple users. The latest project? Improving AI image editors based on text prompts.
In a paper published last week, researchers introduced Pico-Banana-400K, a dataset of 400,000 "text-guided" images selected to improve AI-based image editing. Apple believes its image dataset improves upon existing sets by including higher quality images with more diversity: The researchers found that existing datasets either use images produced by AI models, or are not varied enough, which can hinder efforts to improve the models.
Funnily enough, Pico-Banana-400K is designed to work with Nano Banana, Google's image editing model. Researchers say that, using Nano Banana, they can generate 35 different types of edits for the dataset, and that they tap Gemini-2.5-Pro to assess the quality of the edits and decide whether those edits should remain in the overall dataset.
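The generate-then-judge loop described above can be sketched roughly as follows. This is only an illustration, not the researchers' code: `EditSample`, `build_dataset`, `edit_fn`, `judge_fn`, and the `0.7` threshold are all hypothetical names and values, and the two stub functions stand in for an editing model such as Nano Banana and a judge such as Gemini-2.5-Pro without calling any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EditSample:
    source_image: str   # ID or path of the original image
    instruction: str    # text prompt describing the edit
    edited_image: str   # ID or path of the edited result
    score: float        # judge's quality score, assumed to lie in [0, 1]


def build_dataset(
    images: List[str],
    instructions: List[str],
    edit_fn: Callable[[str, str], str],
    judge_fn: Callable[[str, str, str], float],
    threshold: float = 0.7,  # assumed cutoff, not taken from the paper
) -> Tuple[List[EditSample], List[EditSample]]:
    """Apply every instruction to every image and split results by judge score."""
    kept: List[EditSample] = []
    rejected: List[EditSample] = []
    for image in images:
        for instruction in instructions:
            edited = edit_fn(image, instruction)
            score = judge_fn(image, instruction, edited)
            sample = EditSample(image, instruction, edited, score)
            (kept if score >= threshold else rejected).append(sample)
    return kept, rejected


# Stub "models" so the sketch runs without any external service.
def fake_edit(image: str, instruction: str) -> str:
    return f"{image}|{instruction}"


def fake_judge(image: str, instruction: str, edited: str) -> float:
    return 0.9 if "style" in instruction else 0.5


kept, rejected = build_dataset(
    ["cat.jpg"],
    ["apply anime style", "change font color"],
    fake_edit,
    fake_judge,
)
print(len(kept), len(rejected))
```

Keeping the rejected samples rather than discarding them is a design choice in this sketch; it leaves room to pair a failed edit with a successful one later.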
These 400,000 images include 258,000 samples of single edits (where Apple compares the original image to an edited one); 56,000 "preference pairs," which distinguish between failed and successful edit generations; and 72,000 "multi-turn sequences," which walk through two to five edits.
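As a quick sanity check on that breakdown, the short snippet below tallies the three subsets and their shares. The dictionary keys are illustrative labels, not names from the dataset. Note that the rounded figures quoted above add up to 386,000, so they are best read as approximate relative to the 400,000 in the dataset's name.

```python
# Subset sizes as quoted in the article (rounded figures).
subsets = {
    "single_edit": 258_000,
    "preference_pairs": 56_000,
    "multi_turn_sequences": 72_000,
}

total = sum(subsets.values())
shares = {name: count / total for name, count in subsets.items()}

for name, count in subsets.items():
    print(f"{name}: {count:,} ({shares[name]:.1%})")
print(f"total: {total:,}")
```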
Researchers note that different functions had different success rates in this dataset. Global edits and stylization are "easy," achieving the highest success rates; object semantics and scene context are "moderate;" while precise geometry, layout, and typography are "hard." The highest performing function, "strong artistic style transfer," which could include changing an image's style to "Van Gogh" or anime, has a 93% success rate. The lowest performing function, "change font style or color of visible text if there is text," only succeeded 58% of the time. Other tested functions include "add new text" (67% success rate), "zoom in" (74% success rate), and "add film grain or vintage filter" (91% success rate).
Unlike many of Apple's products, which are typically restricted to the company's own platforms, Pico-Banana-400K is open for all researchers and AI developers to use. It's cool to see Apple researchers contributing to open research like this, especially in an area Apple is generally behind in. Will we actually get an AI-powered Siri anytime soon? Unclear. But it is clear Apple is actively working on AI, perhaps just in its own way.