Index
- AlternateInitConfig (message)
- BackgroundSwapProcessingConfig (message)
- ControlNetConfig (message)
- ControlNetConfig.ControlNetConditionConfig (message)
- EditConfig (message)
- EditConfig.BufferZone (message)
- EditConfigV6 (message)
- EditConfigV6.BufferZone (message)
- EditMode (enum)
- ExpansionConfig (message)
- GenSelfieConfig (message)
- ImageOutputOptions (message)
- MaskMode (message)
- OutpaintingProcessingConfig (message)
- OutputOptions (message)
- SemanticFilterConfig (message)
- TextEmbeddingPredictionParams (message)
- UpscaleConfig (message)
- VideoGenerationModelParams (message)
- VirtualTryOnModelParams (message)
- VisionEmbeddingModelParams (message)
- VisionGenerativeModelParams (message)
- VisionReasoningModelParams (message)
AlternateInitConfig
| Fields | |
|---|---|
| `enabled` | Whether to use AlternateInitConfig. |
| `max_inpainting_mask_area` | Maximum inpainting area below which to consider using AlternateInitConfig. |
BackgroundSwapProcessingConfig
BackgroundSwapConfig for imagen-3.0-capability-001
| Fields | |
|---|---|
| `blending_mode` | The blending mode for background swap. The value can be one of: * alpha-blending |
| `blending_factor` | The blending factor for background swap blending. Valid range: [0, 1]. Default value: 0. |
ControlNetConfig
| Fields | |
|---|---|
| `enable_control_net` | |
| `conditions[]` | Configurations for each condition. |
| `original_image_weight` | The weight for the original image. Valid range: [0, 1]. When set to 1.0, the output essentially copies the input image. When set to 0.0, the output does not respect the input image at all. |
ControlNetConditionConfig
| Fields | |
|---|---|
| `condition_name` | Currently supported conditions: * cannyEdges * depth |
| `condition_map_bytes_base64_encoded` | When the condition map is provided by the user, we will not compute the condition map on our side. |
| `condition_weight` | The guidance weight for the condition signal. Valid range: [0, 1]. The higher the weight, the more the model respects the ControlNet condition. The default value is 1.0 if unspecified. |
| `condition_max_t` | The strength of the ControlNet's effect on each diffusion step. Valid range: [0, 1]. |
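
For orientation, here is a minimal sketch (not an official sample) of sending ControlNet parameters to a model's `:predict` endpoint. The URL shape is the standard Vertex AI publisher-model predict route; the snake_case parameter names mirror this reference, but the REST surface may expect camelCase equivalents, and the project, token, and model version below are placeholders.

```python
# Hedged sketch: ControlNet parameters in a raw :predict request. Verify field
# casing and model name against the official docs before relying on this.
import requests

PROJECT = "my-project"          # placeholder project ID
REGION = "us-central1"
MODEL = "imagegeneration@006"   # placeholder model version
ACCESS_TOKEN = "..."            # e.g. output of `gcloud auth print-access-token`

parameters = {
    "sample_count": 1,
    "control_net_config": {
        "enable_control_net": True,
        "original_image_weight": 0.5,        # 1.0 ~ copy input; 0.0 ~ ignore input
        "conditions": [{
            "condition_name": "cannyEdges",  # or "depth"
            "condition_weight": 1.0,         # [0, 1]; default 1.0
            "condition_max_t": 0.9,          # [0, 1]; per-step effect strength
        }],
    },
}

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/{MODEL}:predict"
)
body = {"instances": [{"prompt": "a red sports car"}], "parameters": parameters}
resp = requests.post(url, json=body,
                     headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()
print(resp.json())
```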
EditConfig
| Fields | |
|---|---|
| `buffer_zones[]` | Buffer zones; if provided, must have length 2. |
| `base_guidance_scale[]` | Guidance scale: controls the strength of text guidance. If provided, must be a list of 4 integers representing values during the 4 stages of diffusion [fine-grained, ..., ..., coarse]. |
| `enable_clamping` | Whether to enable clamping mode, which: * Enables the rest of the configurations in EditConfig. * Better preserves the unmasked area. * Skips the model's internal dilation so the client can fully control it. |
| `base_steps` | Number of sampling steps. |
| `base_gamma` | Gamma: influences how much noise is added during sampling. |
| `sr1_steps` | Number of sampling steps for the sr1 stage. |
| `sr2_steps` | Number of sampling steps for the sr2 stage. |
| `semantic_filter_config` | NOTE: for experimental use, not production-ready. Semantic Filter Config. This config reduces object hallucination in inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in the editing config. |
| `experiment_use_servo_backend` | Experimental flag to use the servo backend. |
| `edit_mode` | The editing mode that describes the use case for editing. The value can be one of: * inpainting-remove * inpainting-insert * outpainting |
| `alternate_init_config` | Parameters for AlternateInitConfig. |
| `experimental_sr_version` | Experimental flag for the SR version. |
| `experimental_base_version` | Experimental flag for the base version. |
| `embedding_scale` | Parameter to control embedding scale. Range: [0, 1]. Default: 0.6. |
| `enable_border_replicate_padding` | Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding. |
| `enable_post_processing_blend` | Parameter to enable post-processing blending for masked editing. |
| `outpainting_config` | Outpainting processing config. |
| `bgswap_config` | Background swap processing config. |
BufferZone
| Fields | |
|---|---|
| `pixels` | The number of pixels for the mask to dilate. |
| `diffusion_t` | When during diffusion this pixel dilation takes effect; 1 = start, 0 = end. |
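
To make the list-length constraints concrete, here is a hedged sketch of an EditConfig payload for clamped inpainting removal, using only fields and ranges documented above; the specific values are illustrative.

```python
# Illustrative EditConfig payload; snake_case names as listed in this reference.
edit_config = {
    "edit_mode": "inpainting-remove",
    "enable_clamping": True,                # enables the rest of EditConfig
    "base_steps": 35,
    "base_guidance_scale": [9, 9, 12, 12],  # exactly 4 values: fine -> coarse
    "buffer_zones": [                       # exactly 2 entries when provided
        {"pixels": 8, "diffusion_t": 1.0},  # dilation active from the start
        {"pixels": 2, "diffusion_t": 0.3},  # reduced dilation near the end
    ],
    "embedding_scale": 0.6,                 # [0, 1]; default 0.6
}

# The documented length constraints:
assert len(edit_config["buffer_zones"]) == 2
assert len(edit_config["base_guidance_scale"]) == 4
```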
EditConfigV6
EditConfig for imagegeneration@006
| Fields | |
|---|---|
| `buffer_zones[]` | Buffer zones; if provided, must have length 2. |
| `edit_mode` | The editing mode that describes the use case for editing. The value can be one of: * inpainting-remove * inpainting-insert * outpainting * product-image |
| `mask_dilation` | Parameter to control mask dilation. Range: [0, 1]. Default: 0.03. |
| `guidance_scale` | Guidance scale: controls the strength of text guidance. |
| `product_position` | Product position: controls the product position in the returned product editing image. The value can be one of: * reposition - the default behavior in the GPS pipeline * fixed - keeps the product in the same position as in the input image. This assumes the input image is square. |
| `mask_mode` | Automatic mask generation configuration. |
| `base_steps` | Number of sampling steps for the base model. |
| `backend` | The backend to use for the model. The value can be one of: * experimental * prod |
| `semantic_filter_config` | Semantic Filter Config. This config reduces object hallucination in inpainted images. Users can set filter classes and filter entities to filter out generated images that hallucinate undesired objects in the inpainted area. This config is only enabled in the editing config. |
| `alternate_init_config` | Parameters for AlternateInitConfig. |
| `outpainting_config` | Outpainting config. |
BufferZone
BufferZone config.
| Fields | |
|---|---|
| `pixels` | The number of pixels for the mask to dilate. |
| `diffusion_t` | When during diffusion this pixel dilation takes effect; 1 = start, 0 = end. |
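
As a sketch, a product-image edit on imagegeneration@006 with an automatically generated background mask could combine the EditConfigV6 fields above like this (values are illustrative):

```python
# Illustrative EditConfigV6 payload for product-image editing.
edit_config_v6 = {
    "edit_mode": "product-image",
    "product_position": "fixed",   # keeps product in place; assumes square input
    "mask_mode": {"mask_type": "background"},  # auto-mask the background
    "mask_dilation": 0.03,         # [0, 1]; default 0.03
    "guidance_scale": 15,
    "base_steps": 35,
    "backend": "prod",
}
```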
EditMode
EditMode for imagen3capability.
| Enums | |
|---|---|
| `EDIT_MODE_DEFAULT` | Default editing mode. |
| `EDIT_MODE_INPAINT_REMOVAL` | Inpainting removal mode. Removes objects based on the given mask. |
| `EDIT_MODE_INPAINT_INSERTION` | Inpainting insertion mode. Inserts objects based on the given mask. |
| `EDIT_MODE_OUTPAINT` | Outpainting mode. Expands the image based on the given mask. |
| `EDIT_MODE_CONTROLLED_EDITING` | Controlled editing mode. Pass a sketch or face mesh image to control the editing. |
| `EDIT_MODE_STYLE` | Style editing mode. Pass a style image to define a generation style for the prompt. |
| `EDIT_MODE_BGSWAP` | Background swap mode. Pass a background image to swap the background of the image. |
| `EDIT_MODE_PRODUCT_IMAGE` | Product image mode. |
ExpansionConfig
ExpansionConfig fixes the one-side expansion issue by adding padding to the image and mask in the backend server and cropping it out in post-processing.
| Fields | |
|---|---|
| `top` | Number of pixels to expand the image and mask from the top. Value is an integer with a minimum of 0 and a maximum of 500. |
| `bottom` | Number of pixels to expand the image and mask from the bottom. Value is an integer with a minimum of 0 and a maximum of 500. |
| `left` | Number of pixels to expand the image and mask from the left. Value is an integer with a minimum of 0 and a maximum of 500. |
| `right` | Number of pixels to expand the image and mask from the right. Value is an integer with a minimum of 0 and a maximum of 500. |
GenSelfieConfig
| Fields | |
|---|---|
| `per_example_seeds[]` | Initialization seed per generated sample. |
| `identity_control` | Parameter for identity control. Valid range: [0, 1.0]. Default value: 0.9. |
| `structure_control` | Parameter for structure control. Valid range: [0, 1.0]. Default value: 1.0. |
| `experimental_base_version` | The version for the base model. |
| `skip_face_cropping` | Whether to skip detecting and cropping the face in the input image. Default value: false. |
| `sampling_steps` | Number of sampling steps. |
| `enable_sharpening` | Whether to enable image sharpening post-processing. |
| `detection_score_threshold` | The threshold for the face detection model. Images with a face detection score below this threshold will be rejected. |
| `face_selection_criteria` | The criteria for selecting the face for Gen Selfie. Accepted values: * LARGEST * MOST_CONFIDENT |
| `style` | The style for the generated image. Accepted values: * watercolor * hand-drawing * illustration * 3d-character |
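
A hedged sketch of a GenSelfieConfig payload using the documented defaults and enum values (the seeds are illustrative):

```python
# Illustrative GenSelfieConfig payload; fields and ranges from the table above.
gen_selfie_config = {
    "style": "watercolor",               # or hand-drawing / illustration / 3d-character
    "identity_control": 0.9,             # [0, 1.0]; default 0.9
    "structure_control": 1.0,            # [0, 1.0]; default 1.0
    "face_selection_criteria": "LARGEST",
    "per_example_seeds": [1, 2, 3, 4],   # one seed per generated sample
    "skip_face_cropping": False,
}
```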
ImageOutputOptions
| Fields | |
|---|---|
| `mime_type` | Currently supported: image/jpeg, image/png. Defaults to image/png. |
| `compression_quality` | Optional compression quality if encoding as image/jpeg. Valid range is any integer in [0, 100]. Defaults to 75. |
MaskMode
| Fields | |
|---|---|
| `mask_type` | The type of mask to generate from the provided input image. The value can be one of: * background * foreground * semantic |
| `classes[]` | The class IDs to generate masks for using the Semantic Segmenter model. Only numeric class IDs are supported. Not used if the mask_type value is not semantic. |
OutpaintingProcessingConfig
OutpaintingProcessingConfig for imagen-3.0-capability-001
| Fields | |
|---|---|
| `blending_mode` | The blending mode for outpainting. The value can be one of: * alpha-blending * pyramid-blending |
| `blending_factor` | The blending factor for outpainting blending. Valid range: [0, 1]. Default value: 0. |
| `enable_border_replicate_padding` | Parameter to enable recompute with BORDER_REPLICATE mode for outpainting image padding. |
| `expansion_config` | Fixes the one-side expansion issue by adding padding to the image and mask in the backend server and cropping it out in post-processing. |
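
As a sketch, an outpainting config that expands only the right edge might look like this; the expansion_config padding works around the documented one-side expansion issue, and the specific values are illustrative.

```python
# Illustrative OutpaintingProcessingConfig payload.
outpainting_config = {
    "blending_mode": "pyramid-blending",   # or alpha-blending
    "blending_factor": 0.1,                # [0, 1]; default 0
    "enable_border_replicate_padding": True,
    "expansion_config": {"top": 0, "bottom": 0, "left": 0, "right": 200},  # each <= 500
}
```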
OutputOptions
Configuration options for the output image.
| Fields | |
|---|---|
| `mime_type` | The MIME type of the output image. The following values are supported: image/jpeg, image/png. If not set, defaults to image/png. |
| `compression_quality` | Specifies the compression quality for JPEG images. Accepted values are in the range [0, 100]. If not set, defaults to 75. |
SemanticFilterConfig
| Fields | |
|---|---|
| `filter_classes[]` | Specifies object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked. |
| `filter_entities[]` | Specifies object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities. |
| `filter_classes_outpainting[]` | For the outpainting case. Specifies object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked. |
| `filter_entities_outpainting[]` | For the outpainting case. Specifies object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities. |
| `filter_classes_special_init[]` | For the special_init case. Specifies object class text names to filter. Any detected object in the masked region bearing any of the class names will be checked. |
| `filter_entities_special_init[]` | For the special_init case. Specifies object entity IDs to filter, similar to filter_classes. The final filter list is a union of the filter classes and filter entities. |
| `enable_semantic_filter` | Whether to enable semantic filtering mode, which enables the following parameters to apply a semantic filter to image editing results. |
| `intersect_ratio_threshold` | A threshold value deciding which detected boxes should be included in semantic filter checking. |
| `additional_sample_count` | Additional count of samples; expects a value between 0 and 4. |
| `semantic_filter_mode` | A string specifying the semantic filter experimental mode. This allows the semantic filter to change the default behavior for filtering generated images. |
| `detection_score_threshold` | A detection confidence score threshold deciding which detection boxes are considered valid detections for semantic filter checking. |
| `intersect_ratio_threshold_outpainting` | For the outpainting case. A threshold value deciding which detected boxes should be included in semantic filter checking. |
| `detection_score_threshold_outpainting` | For the outpainting case. A detection confidence score threshold deciding which detection boxes are considered valid detections for semantic filter checking. |
| `intersect_ratio_threshold_special_init` | For the special_init case. A threshold value deciding which detected boxes should be included in semantic filter checking. |
| `detection_score_threshold_special_init` | For the special_init case. A detection confidence score threshold deciding which detection boxes are considered valid detections for semantic filter checking. |
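
A hedged sketch of a SemanticFilterConfig that rejects inpainted samples hallucinating unwanted objects; the entity ID below is a hypothetical placeholder, and the thresholds are illustrative.

```python
# Illustrative SemanticFilterConfig payload.
semantic_filter_config = {
    "enable_semantic_filter": True,
    "filter_classes": ["person", "dog"],  # class text names to check
    "filter_entities": ["/m/0example"],   # hypothetical entity ID
    "intersect_ratio_threshold": 0.5,     # box overlap with the masked region
    "detection_score_threshold": 0.3,     # min confidence for a valid detection
    "additional_sample_count": 2,         # 0-4 extra samples
}
```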
TextEmbeddingPredictionParams
Prediction model parameters for Text Embedding.
| Fields | |
|---|---|
| `auto_truncate` | Whether to silently truncate inputs longer than the max sequence length. This behavior is enabled by default. If this option is set to false, oversized inputs lead to an INVALID_ARGUMENT error, similar to other text APIs. |
| `output_dimensionality` | An optional argument for the output embedding's dimensionality. This parameter is only supported by some models, and the supported value range is specific to the requested model. If this parameter is specified for a model that does not support it, or if the specified value is not supported by the model, the request fails with an INVALID_ARGUMENT error. |
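
A minimal sketch of a TextEmbeddingPredictionParams payload; per the table above, an unsupported output dimensionality fails with INVALID_ARGUMENT.

```python
# Illustrative TextEmbeddingPredictionParams payload.
parameters = {
    "auto_truncate": False,        # fail on oversized inputs instead of truncating
    "output_dimensionality": 256,  # only honored by models that support it
}
```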
UpscaleConfig
| Fields | |
|---|---|
| `enhance_input_image` | Whether to add an image-enhancing step before upscaling. It is expected to suppress noise and JPEG compression artifacts in the input image. Default value: false. |
| `enable_faster_upscaling` | NOTE: For experimental use, not production-ready. Whether to speed up upscaling. This option can't be used with high QPS since it lowers the availability of the upscaling API. |
| `upscale_factor` | The factor to which the image will be upscaled. If not specified, the upscale factor is determined from the longer side of the input image and sample_image_size. |
| `image_preservation_factor` | With a higher image preservation factor, the original image pixels are more respected and the output image is more similar to the input image. With a lower image preservation factor, the output image will differ more from the input image, but may have finer details and less noise. Only works with: * imagegeneration@003. Valid range: [0, 1.0]. Default value: 0.5. |
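
A hedged sketch of upscaling parameters. The "x2" factor format is an assumption to verify against the official docs, and image_preservation_factor applies only to imagegeneration@003 per the table above.

```python
# Illustrative upscaling parameters.
parameters = {
    "mode": "upscale",
    "upscale_config": {
        "upscale_factor": "x2",            # hypothetical value format -- verify
        "enhance_input_image": True,       # denoise / de-artifact before upscaling
        "image_preservation_factor": 0.5,  # [0, 1.0]; default 0.5
    },
}
```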
VideoGenerationModelParams
| Fields | |
|---|---|
| `sample_count` | The number of videos to generate. If not specified, 1 video is generated. |
| `storage_uri` | The Google Cloud Storage URI for saving the generated videos. The URI must start with gs://. |
| `fps` | The frame rate of the generated videos in frames per second (fps). This value can affect the smoothness of motion in the video. If not specified, a default value appropriate for the model is used. |
| `duration_seconds` | The target duration of the generated videos in seconds. The actual duration of the generated videos may vary slightly. If not specified, a default value appropriate for the model is used. |
| `seed` | Seed for random number generation. Providing the same seed with the same input parameters produces consistent video generation results. If not specified, a random seed is used, resulting in different videos each time. If prompt enhancement is enabled, the seed may not produce deterministic results, because the enhanced prompt can differ between requests. |
| `aspect_ratio` | The aspect ratio of the generated videos. Supported values: * 16:9 * 9:16 |
| `resolution` | The resolution of the generated videos. Supported values: * 720p * 1080p |
| `person_generation` | Controls whether videos of people can be generated, based on age appearance. Supported values: * allow_adult * dont_allow |
| `pubsub_topic` | The Cloud Pub/Sub topic to publish video generation progress to. If this field is specified, messages are published to the topic detailing the progress of video generation. The topic must be in the format projects/{project}/topics/{topic}. |
| `negative_prompt` | Things that shouldn't appear in the generated videos. For example: "low quality", "ugly", "deformed". |
| `enable_prompt_rewriting` | Deprecated: This field is deprecated and has no effect. Use enhance_prompt instead. |
| `enhance_prompt` | Whether to automatically enhance the prompt before generating videos. If true, the prompt is improved to generate higher quality videos. If prompt enhancement is enabled, providing a seed may not produce deterministic results. |
| `generate_audio` | Whether to generate audio along with the video. If true, an audio track is generated for the videos. Defaults to true. |
| `compression_quality` | The compression quality of the generated videos. A lower quality might result in a smaller file size, while a higher quality might result in a better-looking video. Supported values: * optimized * lossless |
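
A hedged sketch of a video generation parameters payload assembled from the fields above; the bucket and Pub/Sub topic are hypothetical placeholders.

```python
# Illustrative VideoGenerationModelParams payload.
parameters = {
    "sample_count": 2,
    "storage_uri": "gs://my-bucket/video-output/",  # must start with gs://
    "duration_seconds": 8,
    "aspect_ratio": "16:9",
    "resolution": "720p",
    "person_generation": "allow_adult",
    "negative_prompt": "low quality, deformed",
    "generate_audio": True,    # defaults to true
    "enhance_prompt": False,   # keep false if relying on seed for reproducibility
    "seed": 12345,
    "pubsub_topic": "projects/my-project/topics/video-progress",
}
```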
VirtualTryOnModelParams
Represents the parameters for a Virtual Try-On prediction request.
| Fields | |
|---|---|
| `output_options` | Options for configuring the output image format. |
| `sample_count` | The number of images to generate. Accepted values are in the range [1, 4]. If not set, defaults to 1. |
| `storage_uri` | The Google Cloud Storage location where the generated images are stored. |
| `seed` | The random seed for image generation. This avoids randomness in generating the output images. If a seed is not provided, a random one is used for each request. |
| `base_steps` | The number of diffusion steps to run. The higher the number of steps, the higher the quality of the generated image, but the greater the latency. If not set, a default value appropriate for the model is used. |
| `safety_setting` | Safety filter level for generated images. The filter blocks images that contain objectionable content. The following values are supported: * block_low_and_above * block_medium_and_above * block_only_high * block_none. If not set, defaults to block_medium_and_above. |
| `person_generation` | Controls whether or not faces or people are included in generated images. The following values are supported: * dont_allow * allow_adult * allow_all. If not set, defaults to allow_adult. |
| `add_watermark` | Whether to add a watermark to the generated images. If not set, defaults to true. |
| `enhance_prompt` | Whether to enhance the user-provided prompt internally for models that support it. If not set, a default value appropriate for the model is used. |
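
A hedged sketch of Virtual Try-On parameters; the bucket is a hypothetical placeholder, the base_steps value is illustrative, and sample_count must stay within the documented [1, 4] range.

```python
# Illustrative VirtualTryOnModelParams payload.
parameters = {
    "sample_count": 2,         # accepted range [1, 4]
    "base_steps": 32,          # illustrative; higher = better quality, more latency
    "safety_setting": "block_medium_and_above",
    "person_generation": "allow_adult",
    "add_watermark": True,
    "output_options": {"mime_type": "image/jpeg", "compression_quality": 85},
    "storage_uri": "gs://my-bucket/vto-output/",  # hypothetical bucket
}
```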
VisionEmbeddingModelParams
Parameter format for the large vision model embedding API.
This type has no fields.
VisionGenerativeModelParams
| Fields | |
|---|---|
| `sample_count` | Number of output images. |
| `sample_image_size` | The size of the output images. If empty, defaults to 1024 for Imagen 2 and 3 models and 1K for Imagen 4 models. Supported sizes: 64, 256, 512, 1024, 2048, and 4096 for Imagen 2 and 3 models; 1K and 2K (case-insensitive) for Imagen 4 models. |
| `storage_uri` | The GCS bucket where the generated images are saved. |
| `negative_prompt` | Optional field in addition to the text content. Negative prompts can be explicitly stated here to help generate the images. |
| `seed` | The RNG seed. If the RNG seed is exactly the same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed is used each time to produce a different result. |
| `mode` | The parameter specifying the editing mode. Currently supported: * interactive * upscale |
| `model` | Selects the underlying model to do the generation. Only the listed models are supported: * muse * imagen |
| `aspect_ratio` | Optional generation mode parameter that controls the aspect ratio. Supported ratios include: * 1:1 (default, square) * 5:4 (frame and print) * 3:2 (print photography) * 7:4 (TV screens and smartphone screens) * 4:3 (TV) * 16:9 (landscape) * 9:16 (portrait) |
| `guidance_scale` | Optional editing mode parameter that controls the strength of the prompt. Suggested values are: * 0-9 (low strength) * 10-20 (medium strength) * 21+ (high strength) |
| `enable_person_face_filter` | Whether to enable person/face RAI filtering. Defaults to false. |
| `disable_person_face` | |
| `safety_setting` | Safety settings applying varying restrictiveness to image generation. Case-insensitive. Levels are: * block_low_and_above * block_medium_and_above * block_only_high * block_none. Deprecated values are, respectively: * block_most * block_some * block_few * block_fewest |
| `rai_level` | |
| `enable_child_filter` | Whether to enable child RAI filtering. Defaults to true. This requires users to be allowlisted; otherwise, this value is ignored. |
| `disable_child` | |
| `person_generation` | Whether to allow generating images of people, restricted to specific ages. Supported values: * dont_allow (Deprecated. Use allow_none instead.) * allow_none * allow_adult * allow_all |
| `sample_image_style` | Optional. The predefined style for generated images. No style is applied if this field is empty or unspecified. Possible values: * photograph * digital_art * landscape * sketch * watercolor * cyberpunk * pop_art |
| `include_rai_reason` | Whether to include the reason why generated images are filtered. |
| `is_product_image` | Whether to use self background editing for product images. |
| `control_net_config` | Configurations for ControlNet conditions. |
| `image_output_options` | Output configuration. |
| `output_options` | |
| `upscale_config` | Configurations for the upscaling API. |
| `edit_config` | Configurations for the editing API (imagegeneration@{003, 004}). |
| `edit_config_v6` | Configurations for the editing API for imagegeneration@006. |
| `edit_mode` | Configurations for the edit mode in Imagen 3 capability. |
| `language` | The language the prompt is in. Supported values: * auto (autodetect language) * en (English) * ko (Korean) * ja (Japanese) * hi (Hindi) |
| `include_safety_attributes` | Whether to include the safety attribute scores for both input and output. |
| `model_variant` | The size variant of the model. Currently only supported in imagegeneration@004. Enum: * large * medium * v1_large * v1_1 * v1_1_turbo |
| `add_watermark` | Whether to add a SynthID watermark to generated images. Default value: false. |
| `gen_selfie_config` | Configurations for the GenSelfie API. |
| `show_rai_error_codes` | Show RAI error codes instead of messages. |
| `enhance_prompt` | Whether to use the new prompt rewriting logic. |
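
A hedged sketch of a text-to-image parameters payload combining several of the fields above; the values follow the documented ranges and enums but are otherwise illustrative.

```python
# Illustrative VisionGenerativeModelParams payload.
parameters = {
    "sample_count": 4,
    "sample_image_size": "1024",
    "aspect_ratio": "16:9",              # landscape
    "negative_prompt": "blurry, low contrast",
    "guidance_scale": 12,                # medium strength per the table above
    "language": "auto",
    "safety_setting": "block_medium_and_above",
    "person_generation": "allow_adult",
    "add_watermark": True,               # SynthID; default false per this reference
    "include_rai_reason": True,
    "output_options": {"mime_type": "image/png"},
}
```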
VisionReasoningModelParams
Parameter format for the large vision model.
| Fields | |
|---|---|
| `sample_count` | Number of output text responses. |
| `storage_uri` | The GCS bucket where the generated text responses are saved. |
| `seed` | The RNG seed. If the RNG seed is exactly the same for each request with unchanged inputs, the prediction results will be consistent. Otherwise, a random RNG seed is used each time to produce a different result. |
| `language` | The specific output text language. Supported languages are: * en (default) * de * fr * it * es |
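
Finally, a minimal sketch of a vision reasoning (captioning/VQA) parameters payload using the fields above:

```python
# Illustrative VisionReasoningModelParams payload.
parameters = {
    "sample_count": 3,   # number of text responses
    "language": "en",    # default
    "seed": 42,          # same seed + unchanged inputs => consistent results
}
```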