对不起,我要的是精准出图,而不是随机抽卡
在Gemini应用中获取最佳图像生成与编辑效果的技巧
概述
内容来源:https://blog.google/products/gemini/image-generation-prompting-tips/
Gemini现已在Gemini应用、AI Studio和Vertex AI中提供先进的图像生成与编辑功能。通过特定提示词可实现角色一致性、精准编辑和图像融合。在提示词中尝试使用主体、构图、动作、场景、风格和编辑指令以获得最佳效果。
摘要由Google AI生成。生成式AI处于实验阶段。
我们最新推出的Gemini图像生成与编辑更新包含角色一致性、对话式编辑等多项进阶功能。以下技巧助您充分发挥其潜力。
今日我们推出了先进的图像生成与编辑模型,可在Gemini应用、AI Studio和Vertex AI中使用。本次更新显著提升了角色一致性、精准对话式编辑以及将照片合成为全新作品的能力。为帮助您充分利用此更新,以下是为Gemini图像生成与编辑编写更有效提示词的技巧。
Gemini图像生成的核心能力
在深入使用前,了解Gemini的改进功能将帮助您探索更多应用场景:
- 角色设计一致性:在多次生成和编辑中保持角色或物体的外观特征
- 创意构图:将不同元素、主体和风格融合为统一图像
- 局部编辑:使用简单语言对图像特定部分进行精确修改
- 设计与外观适配:将某种风格、纹理或设计应用到其他概念中
- 逻辑推理:运用现实世界理解能力生成复杂场景或预测序列后续发展
构建有效提示词的六大要素
简单的一两句话输入即可在Gemini中获得不错的效果。但若要获得最佳效果并实现更精细的创作控制,建议在提示词中包含以下要素:
- 主体:图像中是谁或什么?需具体描述(例如:一个有着发光蓝色光学元件的坚毅机器人咖啡师;戴着小巫师帽的毛茸茸三花猫)
- 构图:画面如何构图?(例如:极端特写、广角镜头、低角度拍摄、肖像照)
- 动作:正在发生什么?(例如:冲泡咖啡、施展魔法、奔跑穿越田野的中间步伐)
- 场景:故事发生在何处?(例如:火星上的未来主义咖啡馆、杂乱炼金术师图书馆、金色时刻阳光普照的草地)
- 风格:整体美学风格?(例如:3D动画、黑色电影、水彩画、照片级真实感、1990年代产品摄影)
- 编辑指令:修改现有图像时需直接明确(例如:将男士领带改为绿色、移除背景中的汽车)
提示词示例:创意技巧展示
不同的提示策略可解锁从照片级编辑到奇幻新世界的各种创作。以下是五种可尝试的技巧及关键示例:
1. 保持角色外观一致性
Gemini能在不同姿势、光照和环境条件下保持人物或角色的相似度,甚至能将同一角色应用到新风格和场景中。以下示例展示同一会话中如何通过多个提示词使用同一角色:
- 提示词1:一个异想天开的插画,描绘发光的迷你蘑菇精灵。精灵头戴发光蘑菇帽,睁着好奇的大眼睛,身体由编织藤蔓组成
- 提示词2(同一会话中):现在展示同一精灵骑在友好的苔藓蜗牛背上,穿越阳光明媚的野花草地
通过在第一个提示词中建立具有具体细节的明确定义角色,您可以使用后续提示词将同一角色置于全新语境中。此处Gemini保留了角色的关键特征,如面部特征、独特外观和服装。
2. 实现精准定向变换
凭借更新的图像编辑功能,您可对照片进行快速精确的编辑。这非常适合从产品模型到完善个人照片的各种场景:
- 提示词1:现代极简客厅的高质量照片,含灰色沙发、浅木咖啡桌和大型盆栽
- 提示词2(编辑):将沙发颜色改为深海军蓝
- 提示词3(编辑):现在在咖啡桌上添加三本书叠放
这展示了Gemini在局部编辑方面的优势。通过直接对话式指令,您可修改图像中的特定元素,无需复杂软件或重新生成整个场景。
3. 创意构图融合概念
尝试将两个或多个创意融合为单张惊艳图像。提示Gemini创建两张图像,然后以富有想象力的方式结合其主体和环境:
- 提示词1:生成戴头盔穿全套宇航服的宇航员照片级图像
- 提示词2:雨林中杂草丛生的篮球场图片
- 提示词3(上传并合并):展示宇航员在此球场扣篮的画面
4. 适配与应用新风格
通过应用新风格、配色或纹理完全改变图像的情绪和美学风格,同时保持原始主体不变:
- 提示词1:城市街道停放经典摩托车的照片级图像
- 提示词2(编辑):对此图像应用建筑绘图风格
通过"风格迁移",Gemini理解核心主体(摩托车)及其形态,然后以要求的艺术风格完全重新渲染。这可应用于设计灵感、艺术探索等场景。
5. 运用逻辑推理进行复杂生成
给予Gemini简单概念,让其推理能力构建细节。这适用于需要理解现实世界关系或流程的内容创作:
- 提示词1:生成人物站立手持三层蛋糕的图像
- 提示词2(同一会话中):生成展示如果他们绊倒会发生什么的图像
模型可运用逻辑推理能力预测后续发展。它理解第一张图像的语境和物理原理——人物小心平衡蛋糕——然后模拟如绊倒等动作的合理后果,生成动态且具有语境感知的新图像。
当前限制说明
随着我们持续开发和微调模型,以下方面仍需改进:
- 风格化:虽然功能强大,但模型的风格化有时可能不一致或产生意外结果
- 文字渲染:模型偶尔可能拼错单词或难以处理复杂排版
- 角色特征:虽然模型擅长角色一致性,但并非总能完美实现。我们正在努力使这种一致性更加可靠
- 设置与保持宽高比:模型在保持宽高比方面存在困难——虽然可以提示所需尺寸,但输出可能不总是符合要求
我们正积极改进这些领域,并感谢您在我们共同构建下一代图像工具过程中展现的创造力。
创意可能性正待您采撷——我们迫不及待想看到您的创作!
特别感谢Greenfield团队高级生成工程师的创意贡献。
General summary
Gemini now offers state-of-the-art image generation and editing within the Gemini app, AI Studio, and Vertex AI. You can use specific prompts to achieve consistent characters, precise edits, and blended images. Try using subject, composition, action, location, style, and editing instructions in your prompts to get the best results.
Summaries were generated by Google AI. Generative AI is experimental.
Our latest update to image generation and editing in Gemini includes advancements in character consistency, conversational editing and more. Here are some tips to get the most out of it.
Earlier today, we launched a state-of-the-art image generation and editing model, available in the Gemini app, AI Studio and Vertex AI. This update introduces significant advancements in character consistency; precise, conversational editing; and the ability to combine photos into a completely new creation. To help you get the most out of this update, here are some tips for writing more effective prompts for image generation and editing in Gemini.
Key capabilities of image generation in Gemini
Before you dive in, it’s helpful to familiarize yourself with what’s been improved in Gemini, so you can consider which use cases to try with it:
- Consistent character design. Preserve a character or object's appearance across multiple generations and edits.
- Creative composition. Blend disparate elements, subjects and styles from multiple concepts into a single, unified image.
- Local edits. Make precise edits to specific parts of an image using simple language.
- Design and appearance adaptation. Apply a style, texture or design from one concept to another.
- Logic and reasoning. Use real-world understanding to generate complex scenes or predict the next step in a sequence.
6 elements of constructing effective prompts
You can get great results with Gemini from simple one or two-sentence inputs. However, to achieve the best results and unlock more nuanced creative control, consider including the following elements in your prompt:
- Subject: Who or what is in the image? Be specific. (e.g., a stoic robot barista with glowing blue optics; a fluffy calico cat wearing a tiny wizard hat).
- Composition: How is the shot framed? (e.g., extreme close-up, wide shot, low angle shot, portrait).
- Action: What is happening? (e.g., brewing a cup of coffee, casting a magical spell, mid-stride running through a field).
- Location: Where does the scene take place? (e.g., a futuristic cafe on Mars, a cluttered alchemist's library, a sun-drenched meadow at golden hour).
- Style: What is the overall aesthetic? (e.g., 3D animation, film noir, watercolor painting, photorealistic, 1990s product photography).
- Editing Instructions: For modifying an existing image, be direct and specific. (e.g., change the man's tie to green, remove the car in the background).
Prompting examples: A showcase of creative techniques
Different prompting strategies can unlock everything from photorealistic edits to fantastical new worlds. Here are five techniques to try, each with a key example.
1. Preserve characters’ appearances.
Gemini can maintain the likeness of a person or character across different poses, lighting and environments, and even apply the same character to new styles and surfaces. Here’s an example of how one character can be used across multiple prompts in the same session:
- Prompt 1: A whimsical illustration of a tiny, glowing mushroom sprite. The sprite has a large, bioluminescent mushroom cap for a hat, wide, curious eyes, and a body made of woven vines.
- Prompt 2 (in the same conversation): Now, show the same sprite riding on the back of a friendly, moss-covered snail through a sunny meadow full of colorful wildflowers.
By establishing a clearly defined character with specific details in the first prompt, you can use follow-up prompts to place that same character in entirely new contexts. Here, Gemini preserves key features of the character like facial features, distinctive appearance and clothing.
2. Make targeted transformations with precision.
With updated image editing capabilities, you can make quick, highly precise edits to your photos. This is perfect for everything from product mockups to perfecting personal pictures. Here’s an example:
- Prompt 1: A high-quality photo of a modern, minimalist living room with a grey sofa, a light wood coffee table, and a large potted plant.
- Prompt 2 (editing): Change the sofa's color to a deep navy blue.
- Prompt 3 (editing): Now, add a stack of three books to the coffee table.
This showcases Gemini’s strength in local edits. By using direct, conversational commands, you can modify specific elements within the image without needing complex software or re-generating the entire scene.
3. Blend concepts with creative composition.
Try fusing two or more ideas into a single, striking image. Prompt Gemini to create two images, and then combine their subjects and environments in imaginative ways:
- Prompt 1: Generate a photorealistic picture of an astronaut in a helmet and full suit.
- Prompt 2: A picture of an overgrown basketball court in the rainforest.
- Prompt 3 (upload both and combine): Show the astronaut dunking a basketball in this court.
4. Adapt and apply new styles.
Completely change the mood and aesthetic of an image by applying a new style, color palette or texture, all while keeping the original subject intact.
- Prompt 1: A photorealistic image of a classic motorcycle parked on a city street.
- Prompt 2 (editing): Apply the style of an architectural drawing to this image.
With “style transfer,” Gemini understands the core subject (the motorcycle) and its form, then re-renders it entirely in the requested artistic style. This can be used for design inspiration, artistic exploration, and more.
5. Use logic and reasoning for complex generation.
Give Gemini a simple concept and let its reasoning capabilities build out the details. This is useful for creating content that requires an understanding of real-world relationships or processes.
- Prompt 1: Generate an image of a person standing holding a 3 tiered cake.
- Prompt 2 (in the same session): Generate an image showing what would happen if they tripped.
The model can use its logic and reasoning capabilities to predict what comes next. It understands the context and physics of the first image — a person carefully balancing a cake — and can then simulate the plausible consequences of an action like tripping, resulting in a dynamic and context-aware new image.
A note on current limitations
As we continue to develop and fine-tune our models, there are still areas in need of improvement:
- Stylization: While powerful, the model’s stylization can sometimes be inconsistent or produce unexpected results.
- Text rendering: The model may occasionally misspell words or struggle with complex typography.
- Character features: While the model excels at character consistency, it may not always get it right. We're working to make this consistency even more reliable.
- Setting and maintaining aspect ratios: The model struggles with maintaining aspect ratios — while you can prompt for desired dimensions, the output may not always support your requests.
We’re actively working to improve these areas and appreciate your creativity as we build the next generation of image tools together.
The creative possibilities are ripe for your picking — we can’t wait to see what you come up with!
With special thanks to the Greenfield team of senior staff generative engineers for their creative contributions.