Resource: EvaluationItem
An EvaluationItem is a single evaluation request or result. Its content is immutable: it cannot be updated once created. EvaluationItems can be deleted when no longer needed.
name string
Identifier. The resource name of the EvaluationItem. Format: projects/{project}/locations/{location}/evaluationItems/{evaluationItem}
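The name format above can be built and parsed with a small helper. This is a minimal sketch, not part of any client library: the function names are hypothetical, and the rule that each ID segment is any run of non-slash characters is an assumption.

```python
import re

# Mirrors the documented format. Treating each ID segment as "anything except
# a slash" is an assumption, not a documented constraint.
_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/evaluationItems/(?P<evaluation_item>[^/]+)$"
)


def build_evaluation_item_name(project: str, location: str, evaluation_item: str) -> str:
    """Assemble a resource name from its three ID segments (hypothetical helper)."""
    return f"projects/{project}/locations/{location}/evaluationItems/{evaluation_item}"


def parse_evaluation_item_name(name: str) -> dict:
    """Split a resource name into its ID segments, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an EvaluationItem resource name: {name!r}")
    return match.groupdict()
```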
displayName string
Required. The display name of the EvaluationItem.
Optional. Metadata for the EvaluationItem.
labels map (key: string, value: string)
Optional. Labels for the EvaluationItem.
Required. The type of the EvaluationItem.
Output only. Timestamp when this item was created.
Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
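The three example timestamps above can be parsed with the Python standard library. The sketch below is one possible approach, with a hypothetical function name; it truncates fractional seconds to microseconds because `datetime` cannot hold nanoseconds, so the last three digits of a 9-digit fraction are lost.

```python
import re
from datetime import datetime, timezone

# Date, time, optional fractional seconds, then "Z" or a numeric offset.
_RFC3339_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[Tt](\d{2}:\d{2}:\d{2})(?:\.(\d+))?([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into an aware datetime normalized to UTC."""
    match = _RFC3339_RE.match(value)
    if match is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    date, time, fraction, offset = match.groups()
    # Pad or truncate the fraction to exactly 6 digits (microseconds).
    micros = (fraction or "")[:6].ljust(6, "0")
    if offset in ("Z", "z"):
        offset = "+00:00"
    parsed = datetime.fromisoformat(f"{date}T{time}.{micros}{offset}")
    return parsed.astimezone(timezone.utc)
```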
Output only. Error for the evaluation item.
payload Union type
payload can be only one of the following:
The request to evaluate.
Output only. The response from evaluation.
gcsUri string
The Cloud Storage object where the request or response is stored.
| JSON representation |
|---|
{ "name": string, "displayName": string, "metadata": value, "labels": { string: string, ... }, "evaluationItemType": enum ( |
EvaluationRequest
A single evaluation request supporting input for both single-turn model generation and multi-turn agent execution traces.
Valid input modes:
1. Inference Mode: prompt is set (containing text or AgentData context).
2. Offline Eval Mode: prompt is unset, and candidateResponses contains agentData (the completed execution trace).
Validation Rule: Either prompt must be set, OR at least one of the candidateResponses must contain agentData.
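The validation rule can be checked client-side before submitting a request. The following is a minimal sketch over the JSON form of an EvaluationRequest; the function name is hypothetical, and treating an empty or missing object as "unset" is an assumption about how the service interprets the payload.

```python
from typing import Any, Mapping


def is_valid_evaluation_request(request: Mapping[str, Any]) -> bool:
    """Return True if prompt is set, or any candidate response carries agentData.

    Assumption: a missing, null, or empty value counts as "unset".
    """
    if request.get("prompt"):
        return True  # Inference Mode
    candidates = request.get("candidateResponses") or []
    # Offline Eval Mode: at least one candidate holds a completed trace.
    return any(bool(candidate.get("agentData")) for candidate in candidates)
```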
Optional. The request/prompt to evaluate.
Optional. The ideal response or ground truth.
Optional. Named groups of rubrics associated with this prompt. The key is a user-defined name for the rubric group.
Optional. Responses from the model under test and from other baseline models for comparison.
| JSON representation |
|---|
{ "prompt": { object ( |
EvaluationPrompt
Prompt to be evaluated. This can represent a single-turn prompt or a multi-turn conversation for agent evaluations.
data Union type
data can be only one of the following:
text string
Text prompt.
Fields and values that can be used to populate the prompt template.
Prompt template data.
Optional. Represents the complete execution trace of a multi-turn conversation, which can involve single or multiple agents. This serves as the input context for agent scraping.
| JSON representation |
|---|
{ // data "text": string, "value": value, "promptTemplateData": { object ( |
PromptTemplateData
AgentData
Represents data specific to multi-turn agent evaluations.
Optional. A map containing the static configurations for each agent in the system. Key: agentId (matches the author field in events). Value: The static configuration of the agent.
Optional. A chronological list of conversation turns. Each turn represents a logical execution cycle (e.g., User Input -> Agent Response).
| JSON representation |
|---|
{ "agents": { string: { object ( |
AgentConfig
Represents configuration for an Agent.
agentType string
Optional. The type or class of the agent (e.g., "LlmAgent", "RouterAgent", "ToolUseAgent"). Useful for the autorater to understand the expected behavior of the agent.
description string
Optional. A high-level description of the agent's role and responsibilities. Critical for evaluating if the agent is routing tasks correctly.
instruction string
Optional. Provides instructions for the LLM, guiding the agent's behavior. Can be static or dynamic. Dynamic instructions can contain placeholders like {variableName} that will be resolved at runtime using the AgentEvent.state_delta field.
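To illustrate how a dynamic instruction might be resolved, the sketch below merges a sequence of state deltas and substitutes {variableName} placeholders from the result. The exact resolution semantics are not specified here, so everything in this block is an assumption: the helper names are hypothetical, later deltas are assumed to overwrite earlier ones, and unknown placeholders are left untouched.

```python
import re
from typing import Any, Dict, Iterable, Mapping

# Matches placeholders such as {variableName}.
_PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def apply_state_deltas(deltas: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Fold state deltas in order; later values win (assumed semantics)."""
    state: Dict[str, Any] = {}
    for delta in deltas:
        state.update(delta)
    return state


def resolve_instruction(template: str, state: Mapping[str, Any]) -> str:
    """Replace each known placeholder with its state value; keep unknown ones."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        return str(state[key]) if key in state else match.group(0)

    return _PLACEHOLDER_RE.sub(substitute, template)
```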
Optional. The list of tools available to this agent.
subAgents[] string
Optional. The list of valid agent IDs that this agent can delegate to. This defines the directed edges in the multi-agent system graph topology.
agentId string
Required. Unique identifier of the agent. This ID is used to refer to this agent, e.g., in AgentEvent.author, or in the subAgents field. It must be unique within the agents map.
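Because subAgents defines directed edges between entries of the agents map, the topology can be extracted and sanity-checked before evaluation. This is a minimal sketch with a hypothetical function name; requiring that each map key equals the entry's agentId, and that every delegation target exists in the map, are assumptions drawn from the field descriptions rather than documented server-side checks.

```python
from typing import Any, Dict, List, Mapping


def build_delegation_graph(agents: Mapping[str, Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Return {agentId: [subAgent ids]} after basic consistency checks."""
    graph: Dict[str, List[str]] = {}
    for key, config in agents.items():
        # Assumed rule: the map key is the agent's own agentId.
        if config.get("agentId") != key:
            raise ValueError(f"map key {key!r} does not match agentId {config.get('agentId')!r}")
        graph[key] = list(config.get("subAgents", []))
    for source, targets in graph.items():
        for target in targets:
            # Assumed rule: an agent may only delegate to a configured agent.
            if target not in graph:
                raise ValueError(f"{source!r} delegates to unknown agent {target!r}")
    return graph
```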
| JSON representation |
|---|
{
"agentType": string,
"description": string,
"instruction": string,
"tools": [
{
object ( |
ConversationTurn
Represents a single turn/invocation in the conversation.
turnId string
Optional. A unique identifier for the turn. Useful for referencing specific turns across systems.
Optional. The list of events that occurred during this turn.
turnIndex integer
Required. The 0-based index of the turn in the conversation sequence.
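Since turnIndex is 0-based, a client can use it to restore conversation order when turns arrive unsorted. The sketch below uses a hypothetical function name and additionally checks that the indices run 0, 1, 2, ... without gaps or duplicates; that contiguity requirement is an assumption, not something stated above.

```python
from typing import Any, List, Mapping, Sequence


def ordered_turns(turns: Sequence[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Sort turns by turnIndex and verify the indices are exactly 0..n-1."""
    ordered = sorted(turns, key=lambda turn: turn["turnIndex"])
    indices = [turn["turnIndex"] for turn in ordered]
    if indices != list(range(len(ordered))):
        raise ValueError(f"turnIndex values are not contiguous from 0: {indices}")
    return ordered
```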
| JSON representation |
|---|
{
"turnId": string,
"events": [
{
object ( |
AgentEvent
Represents a single event in the execution trace.
Optional. The timestamp when the event occurred.
Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
Optional. The change in the session state caused by this event. This is a key-value map of fields that were modified or added by the event.
Optional. The list of tools that were active/available to the agent at the time of this event. This overrides the AgentConfig.tools if set.
Required. The content of the event (e.g., text response, tool call, tool response).
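The override relationship between an event's tool list and AgentConfig.tools can be expressed as a simple fallback. This is a sketch only: the function name is hypothetical, it takes the two lists directly rather than assuming a JSON key for the event-level field, and it treats both a missing and an empty event list as "not set", which is an assumption.

```python
from typing import Any, List, Optional, Sequence


def effective_tools(
    event_tools: Optional[Sequence[Any]],
    config_tools: Optional[Sequence[Any]],
) -> List[Any]:
    """Return the tools in force for an event: the event's list if set, else the config's."""
    if event_tools:  # assumed: None or [] means the event did not set its own list
        return list(event_tools)
    return list(config_tools or [])
```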
CandidateResponse
Responses from a model or agent.
candidate string
Required. The name of the candidate that produced the response.
Optional. Intermediate events (such as tool calls and responses) that led to the final response.
data Union type
data can be only one of the following:
text string
Text response.
Fields and values that can be used to populate the response template.
Optional. Represents the complete execution trace of a multi-turn conversation, which can involve single or multiple agents. This field is used to provide the full output of an agent's run, including all turns and events, for direct evaluation.
Output only. Error while scraping model or agent.
RubricGroup
A group of rubrics, used for grouping rubrics based on a metric or a version.
groupId string
Unique identifier for the group.
displayName string
Human-readable name for the group. This should be unique within a given context if used for display or selection. Examples: "Instruction Following V1", "Content Quality - Summarization Task".
Rubrics that are part of this group.
| JSON representation |
|---|
{
"groupId": string,
"displayName": string,
"rubrics": [
{
object ( |
EvaluationResult
Evaluation result.
evaluationRequest string
Required. The request item that was evaluated. Format: projects/{project}/locations/{location}/evaluationItems/{evaluationItem}
evaluationRun string
Required. The evaluation run that was used to generate the result. Format: projects/{project}/locations/{location}/evaluationRuns/{evaluationRun}
Required. The request that was evaluated.
metric string
Required. The metric that was evaluated.
Optional. The results for the metric.
Optional. Metadata about the evaluation result.
| JSON representation |
|---|
{ "evaluationRequest": string, "evaluationRun": string, "request": { object ( |
CandidateResult
Result for a single candidate.
candidate string
Required. The candidate that is being evaluated. The value is the same as the candidate name in the EvaluationRequest.
metric string
Required. The metric that was evaluated.
explanation string
Optional. The explanation for the metric.
Optional. The rubric verdicts for the metric.
Optional. Additional results for the metric.
result Union type
result can be only one of the following:
score number
Optional. The score for the metric.
| JSON representation |
|---|
{
"candidate": string,
"metric": string,
"explanation": string,
"rubricVerdicts": [
{
object ( |
RubricVerdict
Represents the verdict of an evaluation against a single rubric.
Required. The full rubric definition that was evaluated. Storing this ensures the verdict is self-contained and understandable, especially if the original rubric definition changes or was dynamically generated.
verdict boolean
Required. Outcome of the evaluation against the rubric, represented as a boolean. true indicates a "Pass", false indicates a "Fail".
reasoning string
Optional. Human-readable reasoning or explanation for the verdict. This can include specific examples or details from the evaluated content that justify the given verdict.
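As an illustration of consuming the boolean verdict field, the sketch below summarizes a rubricVerdicts list as a pass rate. This aggregation is not defined by the API; the function name is hypothetical, and how the service itself derives a score from rubric verdicts is not described here.

```python
from typing import Any, Mapping, Optional, Sequence


def rubric_pass_rate(rubric_verdicts: Sequence[Mapping[str, Any]]) -> Optional[float]:
    """Fraction of verdicts that are True ("Pass"); None when there are no verdicts."""
    if not rubric_verdicts:
        return None
    passed = sum(1 for verdict in rubric_verdicts if verdict["verdict"] is True)
    return passed / len(rubric_verdicts)
```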
| JSON representation |
|---|
{
"evaluatedRubric": {
object ( |
EvaluationItemType
The type of the EvaluationItem.
| Enums | |
|---|---|
| EVALUATION_ITEM_TYPE_UNSPECIFIED | The default value. This value is unused. |
| REQUEST | The EvaluationItem is a request to evaluate. |
| RESULT | The EvaluationItem is the result of evaluation. |
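On the client side, the enum values can be mirrored so that request and result items are easy to tell apart. This is a minimal sketch, assuming the JSON payload carries the enum as its string name in the evaluationItemType field and that an absent field means the unspecified default; the class and function are illustrative, not part of a published client library.

```python
from enum import Enum
from typing import Any, Mapping


class EvaluationItemType(str, Enum):
    """Client-side mirror of the documented enum values."""

    EVALUATION_ITEM_TYPE_UNSPECIFIED = "EVALUATION_ITEM_TYPE_UNSPECIFIED"
    REQUEST = "REQUEST"
    RESULT = "RESULT"


def item_type_of(item: Mapping[str, Any]) -> EvaluationItemType:
    """Read evaluationItemType from a JSON-decoded item (assumed string-encoded)."""
    raw = item.get("evaluationItemType", EvaluationItemType.EVALUATION_ITEM_TYPE_UNSPECIFIED.value)
    return EvaluationItemType(raw)  # raises ValueError for an unknown name
```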
| Methods | |
|---|---|
| | Creates an Evaluation Item. |
| | Deletes an Evaluation Item. |
| | Gets an Evaluation Item. |
| | Lists Evaluation Items. |