REST Resource: projects.locations.evaluationItems

Resource: EvaluationItem

EvaluationItem is a single evaluation request or result. The content of an EvaluationItem is immutable - it cannot be updated once created. EvaluationItems can be deleted when no longer needed.

Fields
name string

Identifier. The resource name of the EvaluationItem. Format: projects/{project}/locations/{location}/evaluationItems/{evaluationItem}

displayName string

Required. The display name of the EvaluationItem.

metadata value (Value format)

Optional. Metadata for the EvaluationItem.

labels map (key: string, value: string)

Optional. Labels for the EvaluationItem.

evaluationItemType enum (EvaluationItemType)

Required. The type of the EvaluationItem.

createTime string (Timestamp format)

Output only. Timestamp when this item was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

error object (Status)

Output only. Error for the evaluation item.

payload Union type
The request or response for the EvaluationItem. payload can be only one of the following:
evaluationRequest object (EvaluationRequest)

The request to evaluate.

evaluationResponse object (EvaluationResult)

Output only. The response from evaluation.

gcsUri string

The Cloud Storage object where the request or response is stored.

JSON representation
{
  "name": string,
  "displayName": string,
  "metadata": value,
  "labels": {
    string: string,
    ...
  },
  "evaluationItemType": enum (EvaluationItemType),
  "createTime": string,
  "error": {
    object (Status)
  },

  // payload
  "evaluationRequest": {
    object (EvaluationRequest)
  },
  "evaluationResponse": {
    object (EvaluationResult)
  },
  "gcsUri": string
  // Union type
}
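As a minimal sketch of the payload union described above, the following Python helper reports which payload field is set on an EvaluationItem that has been decoded into a dict. The helper name `payload_field` and the sample values are hypothetical, and treating an item with no payload field as acceptable (returning None) is an assumption; the reference only states that at most one of the three fields can be set.

```python
from typing import Optional

# The three members of the payload union, as listed in the field reference.
PAYLOAD_FIELDS = ("evaluationRequest", "evaluationResponse", "gcsUri")


def payload_field(item: dict) -> Optional[str]:
    """Return the name of the payload field that is set, or None if none is set."""
    present = [field for field in PAYLOAD_FIELDS if field in item]
    if len(present) > 1:
        raise ValueError(f"payload is a union; found several members: {present}")
    return present[0] if present else None


# Hypothetical request item; every value here is a placeholder.
item = {
    "displayName": "example-item",
    "evaluationItemType": "REQUEST",
    "evaluationRequest": {"prompt": {"text": "What is 2 + 2?"}},
}
print(payload_field(item))
```

A client could run a check like this before sending a create request, so that a union violation surfaces locally rather than as a server error.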

EvaluationRequest

A single evaluation request supporting input for both single-turn model generation and multi-turn agent execution traces.

Valid input modes:
1. Inference Mode: prompt is set (containing text or AgentData context).
2. Offline Eval Mode: prompt is unset, and candidateResponses contains agentData (the completed execution trace).

Validation Rule: Either prompt must be set, OR at least one of the candidateResponses must contain agentData.

Fields
prompt object (EvaluationPrompt)

Optional. The request/prompt to evaluate.

goldenResponse object (CandidateResponse)

Optional. The ideal response or ground truth.

rubrics map (key: string, value: object (RubricGroup))

Optional. Named groups of rubrics associated with this prompt. The key is a user-defined name for the rubric group.

candidateResponses[] object (CandidateResponse)

Optional. Responses from the model under test and from other baseline models for comparison.

JSON representation
{
  "prompt": {
    object (EvaluationPrompt)
  },
  "goldenResponse": {
    object (CandidateResponse)
  },
  "rubrics": {
    string: {
      object (RubricGroup)
    },
    ...
  },
  "candidateResponses": [
    {
      object (CandidateResponse)
    }
  ]
}
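The validation rule for EvaluationRequest (either prompt is set, or at least one candidate response contains agentData) can be sketched as a small client-side check. The function name `input_mode` and the returned mode labels are hypothetical, and the sketch assumes the request has been decoded into a plain dict; the service's own validation may be stricter.

```python
from typing import Optional


def input_mode(request: dict) -> Optional[str]:
    """Classify an EvaluationRequest dict, or return None if it breaks the rule."""
    # Inference Mode: prompt is set.
    if request.get("prompt") is not None:
        return "inference"
    # Offline Eval Mode: prompt is unset and some candidate carries agentData.
    candidates = request.get("candidateResponses") or []
    if any(c.get("agentData") is not None for c in candidates):
        return "offline_eval"
    # Neither condition holds, so the request is invalid under the rule.
    return None


print(input_mode({"prompt": {"text": "Summarize this article."}}))
print(input_mode({"candidateResponses": [{"candidate": "m1", "agentData": {}}]}))
print(input_mode({"candidateResponses": [{"candidate": "m1", "text": "hi"}]}))
```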

EvaluationPrompt

Prompt to be evaluated. This can represent a single-turn prompt or a multi-turn conversation for agent evaluations.

Fields
data Union type
Prompt can be in one of the following formats. data can be only one of the following:
text string

Text prompt.

value value (Value format)

Fields and values that can be used to populate the prompt template.

promptTemplateData object (PromptTemplateData)

Prompt template data.

agentData object (AgentData)

Optional. Represents the complete execution trace of a multi-turn conversation, which can involve single or multiple agents. This serves as the input context for agent scraping.

JSON representation
{

  // data
  "text": string,
  "value": value,
  "promptTemplateData": {
    object (PromptTemplateData)
  },
  "agentData": {
    object (AgentData)
  }
  // Union type
}

PromptTemplateData

Message to hold a prompt template and the values to populate the template.

Fields
values map (key: string, value: object (Content))

The values for fields in the prompt template.

JSON representation
{
  "values": {
    string: {
      object (Content)
    },
    ...
  }
}

AgentData

Represents data specific to multi-turn agent evaluations.

Fields
agents map (key: string, value: object (AgentConfig))

Optional. A map containing the static configurations for each agent in the system. Key: agentId (matches the author field in events). Value: The static configuration of the agent.

turns[] object (ConversationTurn)

Optional. A chronological list of conversation turns. Each turn represents a logical execution cycle (e.g., User Input -> Agent Response).

JSON representation
{
  "agents": {
    string: {
      object (AgentConfig)
    },
    ...
  },
  "turns": [
    {
      object (ConversationTurn)
    }
  ]
}
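Because the keys of the agents map are agent IDs that should match the author field of events, a trace can be checked for consistency before it is submitted. The sketch below is illustrative only: the function names are hypothetical, the Content shape (`parts` with `text`) is an assumption not defined in this section, and an unknown author is not necessarily an error, since author may also name a non-agent entity.

```python
def unknown_authors(agent_data: dict) -> set:
    """Return event authors that are neither "user" nor a key of the agents map."""
    known = set(agent_data.get("agents", {})) | {"user"}
    authors = {
        event["author"]
        for turn in agent_data.get("turns", [])
        for event in turn.get("events", [])
    }
    return authors - known


def mismatched_agent_ids(agent_data: dict) -> list:
    """Return map keys whose AgentConfig.agentId differs from the key."""
    return [
        key
        for key, config in agent_data.get("agents", {}).items()
        if config.get("agentId") != key
    ]


# Hypothetical two-agent trace; the Content shape is assumed for illustration.
agent_data = {
    "agents": {
        "router": {"agentId": "router", "subAgents": ["billing"]},
        "billing": {"agentId": "billing"},
    },
    "turns": [
        {
            "turnIndex": 0,
            "events": [
                {"author": "user", "content": {"parts": [{"text": "Refund?"}]}},
                {"author": "billing", "content": {"parts": [{"text": "Done."}]}},
            ],
        }
    ],
}
print(unknown_authors(agent_data), mismatched_agent_ids(agent_data))
```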

AgentConfig

Represents configuration for an Agent.

Fields
agentType string

Optional. The type or class of the agent (e.g., "LlmAgent", "RouterAgent", "ToolUseAgent"). Useful for the autorater to understand the expected behavior of the agent.

description string

Optional. A high-level description of the agent's role and responsibilities. Critical for evaluating if the agent is routing tasks correctly.

instruction string

Optional. Provides instructions for the LLM, guiding the agent's behavior. Can be static or dynamic. Dynamic instructions can contain placeholders like {variableName} that will be resolved at runtime using the AgentEvent.stateDelta field.

tools[] object (Tool)

Optional. The list of tools available to this agent.

subAgents[] string

Optional. The list of valid agent IDs that this agent can delegate to. This defines the directed edges in the multi-agent system graph topology.

agentId string

Required. Unique identifier of the agent. This ID is used to refer to this agent, e.g., in AgentEvent.author, or in the subAgents field. It must be unique within the agents map.

JSON representation
{
  "agentType": string,
  "description": string,
  "instruction": string,
  "tools": [
    {
      object (Tool)
    }
  ],
  "subAgents": [
    string
  ],
  "agentId": string
}
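To illustrate how a dynamic instruction with {variableName} placeholders might be resolved, the sketch below accumulates the stateDelta maps of a list of events and substitutes the resulting values. This is not the service's implementation: the placeholder grammar (word characters only), the rule that later deltas overwrite earlier ones, and the choice to leave unknown placeholders untouched are all assumptions, and both function names are hypothetical.

```python
import re

# Assumed placeholder grammar: {name}, where name is letters, digits or underscores.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def accumulate_state(events: list) -> dict:
    """Merge stateDelta maps in event order; later events overwrite earlier keys."""
    state = {}
    for event in events:
        state.update(event.get("stateDelta", {}))
    return state


def resolve_instruction(instruction: str, state: dict) -> str:
    """Replace {name} with state[name]; placeholders with no value are kept as-is."""
    def substitute(match):
        key = match.group(1)
        return str(state[key]) if key in state else match.group(0)

    return PLACEHOLDER.sub(substitute, instruction)


# Hypothetical events; only stateDelta matters for this sketch.
events = [
    {"author": "user", "stateDelta": {"userName": "Ada"}},
    {"author": "router", "stateDelta": {"language": "French"}},
]
state = accumulate_state(events)
print(resolve_instruction("Greet {userName} in {language}.", state))
```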

ConversationTurn

Represents a single turn/invocation in the conversation.

Fields
turnId string

Optional. A unique identifier for the turn. Useful for referencing specific turns across systems.

events[] object (AgentEvent)

Optional. The list of events that occurred during this turn.

turnIndex integer

Required. The 0-based index of the turn in the conversation sequence.

JSON representation
{
  "turnId": string,
  "events": [
    {
      object (AgentEvent)
    }
  ],
  "turnIndex": integer
}
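Since turnIndex is a 0-based position in the conversation, a client may want to order turns by it before processing a trace. The following sketch sorts turns and checks that the indices run 0, 1, 2, ... without gaps. The contiguity check is an assumption (the reference only says the index is 0-based), as is treating a missing turnIndex as 0 on the grounds that JSON output may omit zero values; the function name is hypothetical.

```python
def ordered_turns(turns: list) -> list:
    """Return turns sorted by turnIndex, checking indices run 0..n-1 with no gaps."""
    # Assumption: a missing turnIndex means 0, since zero values may be
    # omitted from JSON output.
    result = sorted(turns, key=lambda turn: turn.get("turnIndex", 0))
    indices = [turn.get("turnIndex", 0) for turn in result]
    if indices != list(range(len(result))):
        raise ValueError(f"turn indices are not contiguous from 0: {indices}")
    return result


# Hypothetical turns, deliberately out of order.
turns = [
    {"turnId": "t-b", "turnIndex": 1, "events": []},
    {"turnId": "t-a", "events": []},  # turnIndex omitted, treated as 0
]
print([turn["turnId"] for turn in ordered_turns(turns)])
```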

AgentEvent

Represents a single event in the execution trace.

Fields
eventTime string (Timestamp format)

Optional. The timestamp when the event occurred.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

stateDelta object (Struct format)

Optional. The change in the session state caused by this event. This is a key-value map of fields that were modified or added by the event.

activeTools[] object (Tool)

Optional. The list of tools that were active/available to the agent at the time of this event. This overrides the AgentConfig.tools if set.

author string

Required. The ID of the agent or entity that generated this event. Use "user" to denote events generated by the end-user.

content object (Content)

Required. The content of the event (e.g., text response, tool call, tool response).

JSON representation
{
  "eventTime": string,
  "stateDelta": {
    object
  },
  "activeTools": [
    {
      object (Tool)
    }
  ],
  "author": string,
  "content": {
    object (Content)
  }
}
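Two details of AgentEvent lend themselves to a short sketch: reading eventTime, and working out which tools applied to an event given that activeTools overrides AgentConfig.tools when set. The code below is a sketch under stated assumptions. `parse_timestamp` handles only the RFC 3339 forms shown in the examples above and truncates fractional digits beyond microseconds, because Python's datetime has no nanosecond precision. `effective_tools` treats an empty or missing activeTools list as "not set". Both function names and the placeholder Tool dicts are hypothetical.

```python
import re
from datetime import datetime, timezone

_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a UTC datetime (microsecond precision)."""
    match = _RFC3339.match(value)
    if match is None:
        raise ValueError(f"not a supported RFC 3339 timestamp: {value!r}")
    base, fraction, offset = match.groups()
    # Pad or truncate the fraction to exactly six digits for %f.
    micros = (fraction or "").ljust(6, "0")[:6]
    offset = "+0000" if offset == "Z" else offset.replace(":", "")
    parsed = datetime.strptime(f"{base}.{micros}{offset}", "%Y-%m-%dT%H:%M:%S.%f%z")
    return parsed.astimezone(timezone.utc)


def effective_tools(event: dict, agents: dict) -> list:
    """Tools in force for an event: activeTools if set, else the author's tools."""
    if event.get("activeTools"):
        return event["activeTools"]
    return agents.get(event["author"], {}).get("tools", [])


# Placeholder Tool objects; their real shape is not defined in this section.
agents = {"helper": {"agentId": "helper", "tools": [{"placeholder": "search"}]}}
event = {"author": "helper", "eventTime": "2014-10-02T15:01:23+05:30"}
print(parse_timestamp(event["eventTime"]).isoformat())
print(effective_tools(event, agents))
```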

CandidateResponse

Response from a model or agent.

Fields
candidate string

Required. The name of the candidate that produced the response.

events[] object (Content)

Optional. Intermediate events (such as tool calls and responses) that led to the final response.

data Union type
The response from the model or agent. data can be only one of the following:
text string

Text response.

value value (Value format)

Fields and values that can be used to populate the response template.

agentData object (AgentData)

Optional. Represents the complete execution trace of a multi-turn conversation, which can involve single or multiple agents. This field is used to provide the full output of an agent's run, including all turns and events, for direct evaluation.

error object (Status)

Output only. Error while scraping model or agent.

JSON representation
{
  "candidate": string,
  "events": [
    {
      object (Content)
    }
  ],

  // data
  "text": string,
  "value": value,
  "agentData": {
    object (AgentData)
  },
  "error": {
    object (Status)
  }
  // Union type
}

RubricGroup

A group of rubrics, used for grouping rubrics based on a metric or a version.

Fields
groupId string

Unique identifier for the group.

displayName string

Human-readable name for the group. This should be unique within a given context if used for display or selection. Examples: "Instruction Following V1", "Content Quality - Summarization Task".

rubrics[] object (Rubric)

Rubrics that are part of this group.

JSON representation
{
  "groupId": string,
  "displayName": string,
  "rubrics": [
    {
      object (Rubric)
    }
  ]
}

EvaluationResult

Evaluation result.

Fields
evaluationRequest string

Required. The request item that was evaluated. Format: projects/{project}/locations/{location}/evaluationItems/{evaluationItem}

evaluationRun string

Required. The evaluation run that was used to generate the result. Format: projects/{project}/locations/{location}/evaluationRuns/{evaluationRun}

request object (EvaluationRequest)

Required. The request that was evaluated.

metric string

Required. The metric that was evaluated.

candidateResults[] object (CandidateResult)

Optional. The results for the metric.

metadata value (Value format)

Optional. Metadata about the evaluation result.

JSON representation
{
  "evaluationRequest": string,
  "evaluationRun": string,
  "request": {
    object (EvaluationRequest)
  },
  "metric": string,
  "candidateResults": [
    {
      object (CandidateResult)
    }
  ],
  "metadata": value
}
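The evaluationRequest and evaluationRun fields hold resource names in the formats given above. As a minimal sketch, the names can be split into their components with regular expressions built from those formats. The helper `parse_name` and the sample IDs are hypothetical, and the patterns assume that no component contains a slash.

```python
import re

# Patterns derived from the documented resource name formats.
ITEM_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/evaluationItems/(?P<evaluationItem>[^/]+)$"
)
RUN_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/evaluationRuns/(?P<evaluationRun>[^/]+)$"
)


def parse_name(pattern, name: str) -> dict:
    """Split a resource name into its components, or raise ValueError."""
    match = pattern.match(name)
    if match is None:
        raise ValueError(f"unexpected resource name: {name!r}")
    return match.groupdict()


# Placeholder project, location and IDs.
print(parse_name(ITEM_NAME, "projects/p1/locations/us-central1/evaluationItems/42"))
print(parse_name(RUN_NAME, "projects/p1/locations/us-central1/evaluationRuns/7"))
```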

CandidateResult

Result for a single candidate.

Fields
candidate string

Required. The candidate that is being evaluated. The value is the same as the candidate name in the EvaluationRequest.

metric string

Required. The metric that was evaluated.

explanation string

Optional. The explanation for the metric.

rubricVerdicts[] object (RubricVerdict)

Optional. The rubric verdicts for the metric.

additionalResults value (Value format)

Optional. Additional results for the metric.

result Union type
The result for the metric. result can be only one of the following:
score number

Optional. The score for the metric.

JSON representation
{
  "candidate": string,
  "metric": string,
  "explanation": string,
  "rubricVerdicts": [
    {
      object (RubricVerdict)
    }
  ],
  "additionalResults": value,

  // result
  "score": number
  // Union type
}

RubricVerdict

Represents the verdict of an evaluation against a single rubric.

Fields
evaluatedRubric object (Rubric)

Required. The full rubric definition that was evaluated. Storing this ensures the verdict is self-contained and understandable, especially if the original rubric definition changes or was dynamically generated.

verdict boolean

Required. Outcome of the evaluation against the rubric, represented as a boolean. true indicates a "Pass", false indicates a "Fail".

reasoning string

Optional. Human-readable reasoning or explanation for the verdict. This can include specific examples or details from the evaluated content that justify the given verdict.

JSON representation
{
  "evaluatedRubric": {
    object (Rubric)
  },
  "verdict": boolean,
  "reasoning": string
}
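Because each RubricVerdict carries a boolean verdict (true for "Pass", false for "Fail"), the verdicts in a CandidateResult can be summarized on the client side. The sketch below computes the fraction of passing rubrics. This pass rate is an illustrative aggregate, not a statement of how the service derives score; the function name is hypothetical, and treating a missing verdict key as false is an assumption based on JSON output possibly omitting false values.

```python
from typing import Optional


def pass_rate(candidate_result: dict) -> Optional[float]:
    """Fraction of rubric verdicts that passed, or None if there are no verdicts."""
    verdicts = candidate_result.get("rubricVerdicts", [])
    if not verdicts:
        return None
    # Assumption: a verdict key omitted from the JSON counts as false ("Fail").
    passed = sum(1 for verdict in verdicts if verdict.get("verdict", False))
    return passed / len(verdicts)


# Hypothetical result; evaluatedRubric is left empty for brevity.
candidate_result = {
    "candidate": "model-a",
    "metric": "example_metric",
    "rubricVerdicts": [
        {"evaluatedRubric": {}, "verdict": True, "reasoning": "Covers all points."},
        {"evaluatedRubric": {}, "verdict": True},
        {"evaluatedRubric": {}, "verdict": False, "reasoning": "Tone is off."},
        {"evaluatedRubric": {}},  # verdict omitted, counted as a fail
    ],
}
print(pass_rate(candidate_result))
```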

EvaluationItemType

The type of the EvaluationItem.

Enums
EVALUATION_ITEM_TYPE_UNSPECIFIED The default value. This value is unused.
REQUEST The EvaluationItem is a request to evaluate.
RESULT The EvaluationItem is the result of evaluation.

Methods

create

Creates an Evaluation Item.

delete

Deletes an Evaluation Item.

get

Gets an Evaluation Item.

list

Lists Evaluation Items.