Google's official hands-on guide: how to get the best images out of nano-banana
How to prompt Gemini 2.5 Flash Image Generation for the best results
August 28, 2025
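Gemini 2.5 Flash Image is our latest, fastest, and most efficient natively multimodal model. What makes Gemini 2.5 Flash unique is its native multimodal architecture: it was trained from the ground up to process text and images in a single, unified step. This gives it powerful capabilities beyond simple image generation, such as conversational editing, multi-image composition, and logical reasoning about image content.
Here are the key things you can do:
- Text-to-image: Generate high-quality images from simple or complex text descriptions.
- Image + text-to-image (editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust colors.
- Multi-image to image (composition and style transfer): Use multiple input images to compose a new scene or transfer the style of one image to another.
- Iterative refinement: Progressively refine your image over multiple conversation turns, making small adjustments.
- Text rendering: Generate images that contain clear, well-placed text, ideal for logos, diagrams, and posters.
This guide teaches you how to write prompts and provide instructions that get better results from Gemini 2.5 Flash. It all starts with one fundamental principle:
Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a list of simple, disconnected words.
You can try these features with code from the official documentation, or start creating right away in Google AI Studio.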
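Creating images from text
The most common way to generate an image is to describe what you want to see.
1. Photorealistic scenes
For realistic images, think like a photographer. Mentioning camera angles, lens types, lighting, and fine details guides the model toward a photorealistic result.
Template:
A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format.
Example prompt:
A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, weathered wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. Soft, golden hour light streams through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation.
Example output:
A photorealistic close-up portrait of an elderly Japanese ceramicist...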
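2. Stylized illustrations and stickers
To create stickers, icons, or project assets, be explicit about the style, and remember to request a white background if you need one.
Template:
A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be white.
Example prompt:
A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It is munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.
Example output:
A kawaii-style sticker of a happy red panda...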
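3. Accurate text in images
Gemini 2.5 Flash Image can render text within images. State the exact text you want, describe the font style, and set the overall design.
Template:
Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme].
Example prompt:
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized coffee bean icon seamlessly integrated with the text. The color scheme is black and white.
Example output:
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'...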
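4. Product mockups and commercial photography
Create clean, professional product shots for e-commerce, advertising, or branding.
Template:
A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio].
Example prompt:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image.
Example output:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black...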
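5. Minimalist and negative space design
Create backgrounds for websites, presentations, or marketing materials on which you plan to overlay text.
Template:
A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio].
Example prompt:
A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image.
Example output:
A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame...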
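6. Sequential art (comic panel / storyboard)
Build compelling visual narratives panel by panel by focusing on clear scene descriptions. This is ideal for developing storyboards, comic strips, or any form of sequential art.
Template:
A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio].
Example prompt:
A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape.
Example output:
A single comic book panel in a gritty, noir art style...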
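Editing images with text
This is where the multimodality of Gemini 2.5 Flash Image truly shines. You can provide one or more images alongside your text prompt for editing, composition, and style transfer.
1. Image editing: adding and removing elements
Provide an image and simply describe the change you want. The model analyzes the original image's style, lighting, and perspective to make the edit look natural, and it maintains character consistency across a series of images.
Template:
Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate].
Example prompt:
Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it is sitting comfortably and matches the soft lighting of the photo.
Example input and output: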
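2. Inpainting: editing a specific area
You can conversationally tell Gemini 2.5 Flash Image to edit only one part of an image while leaving the rest completely untouched.
Template:
Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition.
Example prompt:
Using the provided image of a living room, change only the blue sofa to a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.
Example input and output: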
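3. Style transfer
Provide a photo and ask the model to recreate its content in a specific style or art movement.
Template:
Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements].
Example prompt:
Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.
Example input and output: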
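4. Advanced composition: combining multiple images
Provide multiple images as context to create an entirely new composite scene. This is well suited to product mockups and creative collages.
Template:
Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene].
Example prompt:
Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match an outdoor environment.
Example input and output: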
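Best practices
As you build, here are a few more tips for working with image generation:
- Be hyper-specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings."
- Fix character consistency drift: If a character's features begin to drift after many iterative edits, start a new conversation and provide a detailed description to retain consistency.
- Provide context and intent: Explain the purpose of the image. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo."
- Iterate and refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious."
- Use "semantic negative prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic."
- Aspect ratios: When editing, Gemini 2.5 Flash Image generally preserves the input image's aspect ratio. If it doesn't, be explicit in your prompt: "Update the input image... Do not change the input aspect ratio." If you upload multiple images with different aspect ratios, the model adopts the aspect ratio of the last image provided. If you need a specific ratio for a new image and prompting doesn't produce it, the best practice is to provide a reference image with the correct dimensions as part of your prompt.
- Control the camera: Use photographic and cinematic language to control the composition. Terms like "wide-angle shot", "macro shot", "low-angle perspective", "85mm portrait lens", and "Dutch angle" give you precise control over the final image.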
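Limitations
As we continue to develop and improve our models, we believe in being transparent about areas for improvement.
While Gemini 2.5 Flash Image is a powerful and versatile tool, highly nuanced requests may take some iteration before the result is right. You may find that generating complex typography or maintaining absolute consistency of character features across multiple images sometimes needs refinement through follow-up prompts.
We are actively working to improve these areas and appreciate your creativity as we build the next generation of image tools together.
You now have the foundational skills to create and edit great images with Gemini 2.5 Flash. The best way to improve is to practice. Here are some resources to help you along the way:
- Explore Gemini in Google AI Studio: The easiest way to start experimenting with the techniques in this guide is with our web-based tool.
- Read the official documentation: For developers who want to integrate Gemini 2.5 Flash's image generation capabilities into their own applications.
- Review pricing: Understand the costs associated with using Gemini 2.5 Flash image generation through the Gemini API in your projects.
- Try the image editing applet: Test AI-powered photo editing, apply creative filters, or make professional adjustments using simple text prompts.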
English original
AUG. 28, 2025
Gemini 2.5 Flash Image is our latest, fastest, and most efficient natively multimodal model. What makes Gemini 2.5 Flash unique is its native multimodal architecture. It was trained from the ground up to process text and images in a single, unified step. This allows for powerful capabilities beyond simple image generation, such as conversational editing, multi-image composition, and logical reasoning about image content.
Here are the key things you can do:
- Text-to-image: Generate high-quality images from simple or complex text descriptions.
- Image + text-to-image (editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust colors.
- Multi-image to image (composition & style transfer): Use multiple input images to compose a new scene or transfer the style from one image to another.
- Iterative refinement: Have a conversation to progressively refine your image over multiple turns, making small adjustments.
- Text rendering: Generate images that contain clear and well-placed text, ideal for logos, diagrams, and posters.
This guide will teach you how to write prompts and provide instructions that get better results from Gemini 2.5 Flash. It all starts with one fundamental principle:
Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a simple list of disconnected words.
You can try these with code from the official documentation or start creating right away in Google AI Studio.
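As a rough idea of what trying these from code could look like, here is a minimal sketch that assumes the `google-genai` Python SDK. The model identifier and the `GEMINI_API_KEY` environment variable are assumptions, and `collect_image_bytes` and `generate_images` are illustrative helper names, so check the official documentation for the current values before relying on any of it.

```python
import os
from types import SimpleNamespace

# Assumption: preview model identifier; check the official docs for the current name.
MODEL_NAME = "gemini-2.5-flash-image-preview"


def collect_image_bytes(parts):
    """Return raw bytes for every response part that carries inline image data."""
    images = []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            images.append(inline.data)
    return images


def generate_images(prompt):
    """Send a text prompt and return generated images as bytes (needs network access)."""
    from google import genai  # lazy import so the helper above works without the SDK

    # Assumption: the API key is stored in the GEMINI_API_KEY environment variable.
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(model=MODEL_NAME, contents=[prompt])
    return collect_image_bytes(response.candidates[0].content.parts)


# Offline check of the parsing helper with stand-in objects (no API call is made).
fake_parts = [
    SimpleNamespace(text="Here is your image.", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG...")),
]
print(len(collect_image_bytes(fake_parts)))  # prints 1
```

A response can mix text parts and image parts, which is why the helper filters on inline data instead of assuming the first part is the image.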
Creating images from text
The most common way to generate an image is by describing what you want to see.
1. Photorealistic scenes
For realistic images, think like a photographer. Mentioning camera angles, lens types, lighting, and fine details will guide the model toward a photorealistic result.
Template:
A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format.
Example prompt:
A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation.
Example output:
A photorealistic close-up portrait of an elderly Japanese ceramicist...
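If you generate many prompts of this kind, the template above can be filled in programmatically. The sketch below is one possible way to do that with plain string formatting; the field names and the `build_photorealistic_prompt` helper are illustrative and not part of any official API.

```python
# Placeholder names below are hypothetical; they mirror the bracketed slots in the template.
PHOTOREALISTIC_TEMPLATE = (
    "A photorealistic {shot_type} of {subject}, {action}, set in {environment}. "
    "The scene is illuminated by {lighting}, creating a {mood} atmosphere. "
    "Captured with {camera}, emphasizing {details}. "
    "The image should be in a {aspect_ratio} format."
)


def build_photorealistic_prompt(**fields):
    """Fill every slot of the template; str.format raises KeyError if one is missing."""
    return PHOTOREALISTIC_TEMPLATE.format(**fields)


prompt = build_photorealistic_prompt(
    shot_type="close-up portrait",
    subject="an elderly Japanese ceramicist",
    action="carefully inspecting a freshly glazed tea bowl",
    environment="his rustic, sun-drenched workshop",
    lighting="soft, golden hour light streaming through a window",
    mood="serene",
    camera="an 85mm portrait lens",
    details="the fine texture of the clay",
    aspect_ratio="vertical portrait",
)
print(prompt)
```

Keeping the prompt as full sentences, with the variable parts slotted in, follows the guide's advice to describe the scene instead of listing keywords.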
2. Stylized illustrations & stickers
To create stickers, icons, or assets for your projects, be explicit about the style and remember to request a white background if you need one.
Template:
A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be white.
Example prompt:
A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white.
Example output:
A kawaii-style sticker of a happy red panda...
3. Accurate text in images
Gemini 2.5 Flash Image can render text within images. Be clear about the exact text you want, describe the font style, and set the overall design.
Template:
Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme].
Example prompt:
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a coffee bean seamlessly integrated with the text. The color scheme is black and white.
Example output:
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'...
4. Product mockups & commercial photography
Create clean, professional product shots for e-commerce, advertising, or branding.
Template:
A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio].
Example prompt:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image.
Example output:
A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug...
5. Minimalist & negative space design
Create backgrounds for websites, presentations, or marketing materials where you plan to overlay text.
Template:
A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio].
Example prompt:
A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image.
Example output:
A minimalist composition featuring a single, delicate red maple leaf...
6. Sequential art (comic panel / storyboard)
Create compelling visual narratives panel by panel by focusing on clear scene descriptions. This is ideal for developing storyboards, comic strips, or any form of sequential art.
Template:
A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio].
Example prompt:
A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape.
Example output:
A single comic book panel in a gritty, noir art style...
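For a multi-panel storyboard, one approach is to generate one prompt per panel while reusing the same art style and mood text, so every panel is described consistently. The following is a small sketch of that idea; the `Panel` class and `panel_prompt` function are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    """One storyboard panel (hypothetical structure for illustration)."""
    foreground: str
    background: str
    caption: str


def panel_prompt(panel, art_style, mood, aspect_ratio="Landscape"):
    """Build a single-panel prompt that follows the comic panel template."""
    return (
        f"A single comic book panel in a {art_style} style. "
        f"In the foreground, {panel.foreground}. "
        f"In the background, {panel.background}. "
        f'The panel has a caption box with the text "{panel.caption}". '
        f"The lighting creates a {mood} mood. {aspect_ratio}."
    )


STYLE = "gritty, noir art"
MOOD = "dramatic, somber"

panels = [
    Panel(
        foreground="a detective in a trench coat stands under a flickering streetlamp",
        background="the neon sign of a desolate bar reflects in a puddle",
        caption="The city was a tough place to keep secrets.",
    ),
    Panel(
        foreground="the detective pushes open the bar door",
        background="rain streaks across the window behind him",
        caption="But someone always talks.",
    ),
]

# Reusing STYLE and MOOD keeps the wording identical across panels.
prompts = [panel_prompt(p, STYLE, MOOD) for p in panels]
for text in prompts:
    print(text)
```

The second panel's content is made up for the example; only the first panel comes from the guide.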
Editing images with text
This is where Gemini 2.5 Flash Image's multimodality truly shines. You can provide one or more images alongside your text prompts for editing, composition, and style transfer.
1. Image editing: Adding & removing elements
Provide an image and simply describe the change you want. The model will analyze the original image's style, lighting, and perspective to make the edit look natural and maintain character consistency across a series of images.
Template:
Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate].
Example prompt:
Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and matches the soft lighting of the photo.
Example input & output:
2. Inpainting: editing a specific area
You can conversationally tell Gemini 2.5 Flash Image to edit only one part of an image while leaving the rest completely untouched.
Template:
Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition.
Example prompt:
Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.
Example input & output:
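The key part of an inpainting prompt is the explicit "change only X, keep everything else" clause. Below is a minimal sketch of a helper that always appends such a clause; `inpaint_prompt` and its parameters are illustrative names, and the wording simply follows the template above.

```python
def inpaint_prompt(subject, target, replacement, keep=()):
    """Build an edit prompt that changes one element and pins down the rest."""
    prompt = (
        f"Using the provided image of {subject}, change only the {target} "
        f"to {replacement}. "
    )
    if keep:
        # Name the things most at risk of being altered by accident.
        prompt += f"Keep the rest of the image, including {' and '.join(keep)}, unchanged."
    else:
        prompt += (
            "Keep everything else in the image exactly the same, "
            "preserving the original style, lighting, and composition."
        )
    return prompt


sofa_prompt = inpaint_prompt(
    subject="a living room",
    target="blue sofa",
    replacement="a vintage, brown leather chesterfield sofa",
    keep=("the pillows on the sofa", "the lighting"),
)
print(sofa_prompt)
```

Listing specific items in `keep` mirrors the example prompt, which calls out the pillows and the lighting by name.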
3. Style transfer
Provide a photo and ask the model to recreate its content in a specific style or art movement.
Template:
Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements].
Example prompt:
Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.
Example input & output:
4. Advanced composition: Combining multiple images
Provide multiple images as context to create a brand new, composite scene. This is perfect for product mockups or creative collages.
Template:
Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene].
Example prompt:
Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match an outdoor environment.
Example input & output:
Best practices
As you build, here are a few more tips for working with image generation:
- Be hyper-specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings."
- Fix character consistency drifts: If you notice a character's features begin to drift after many iterative edits, you can restart a new conversation with a detailed description to retain consistency.
- Provide context and intent: Explain the purpose of the image. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo."
- Iterate and refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious."
- Use "semantic negative prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic."
- Aspect ratios: When editing, Gemini 2.5 Flash Image generally preserves the input image's aspect ratio. If it doesn't, be explicit in your prompt: "Update the input image... Do not change the input aspect ratio." If you upload multiple images with different aspect ratios, the model will adopt the aspect ratio of the last image provided. If you need a specific ratio for a new image and prompting doesn't produce it, the best practice is to provide a reference image with the correct dimensions as part of your prompt.
- Control the camera: Use photographic and cinematic language to control the composition. Terms like "wide-angle shot", "macro shot", "low-angle perspective", "85mm portrait lens", and "Dutch angle" give you precise control over the final image.
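Because the aspect-ratio tip says the model adopts the ratio of the last image provided, it can help to order your inputs deliberately. The sketch below reorders a list of images so that the one closest to a target ratio comes last. It works on plain `(name, width, height)` tuples, and `put_reference_last` is a hypothetical helper; how you read image dimensions and upload files is left out.

```python
from fractions import Fraction


def put_reference_last(images, target_ratio):
    """Move the image whose width/height ratio is closest to target_ratio to the end.

    images: list of (name, width, height) tuples.
    """
    def distance(image):
        _, width, height = image
        return abs(Fraction(width, height) - target_ratio)

    reference = min(images, key=distance)
    others = [image for image in images if image is not reference]
    return others + [reference]


images = [
    ("dress.png", 1024, 1024),     # 1:1
    ("model.png", 768, 1024),      # 3:4
    ("backdrop.png", 1920, 1080),  # 16:9
]

# We want a 3:4 portrait result, so the 3:4 image should be sent last.
ordered = put_reference_last(images, Fraction(3, 4))
print([name for name, _, _ in ordered])  # ['dress.png', 'backdrop.png', 'model.png']
```

If none of your inputs has the ratio you need, the guide's suggestion is to add a reference image with the correct dimensions, which would then go at the end of the list.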
Limitations
As we continue to develop and improve our models, we believe in being transparent about areas for improvement.
While Gemini 2.5 Flash Image is a powerful and versatile tool, achieving perfection on the first attempt with highly nuanced requests can require some iteration. You may find that generating complex typography or maintaining absolute consistency of character features across multiple images sometimes needs refinement through follow-up prompts.
We are actively working to improve these areas and appreciate your creativity as we build the next generation of image tools together.
You now have the foundational skills to help you create and edit incredible images with Gemini 2.5 Flash. The best way to improve is to practice. Here are some resources to help you on your journey:
- Explore Gemini in Google AI Studio: The easiest way to start experimenting with the techniques in this guide is with our web-based tool.
- Read the official documentation: For developers who want to integrate Gemini 2.5 Flash's image generation capabilities into their own applications.
- Review pricing: Understand the costs associated with using Gemini 2.5 Flash Image generation with the Gemini API for your projects.
- Try the Image Editing Applet: Test AI-powered photo editing, apply creative filters, or make professional adjustments using simple text prompts.
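Article title: Google's official hands-on guide: how to get the best images out of nano-banana
Article link: https://qimuai.cn/?post=467
All articles on this site are original; do not use them for any commercial purpose without authorization.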