REST Resource: projects.locations.apps.evaluationRuns

Resource: EvaluationRun

An evaluation run represents all the evaluation results from an evaluation execution.

JSON representation
{
  "name": string,
  "displayName": string,
  "evaluationResults": [
    string
  ],
  "createTime": string,
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "evaluations": [
    string
  ],
  "evaluationDataset": string,
  "evaluationType": enum (EvaluationRun.EvaluationType),
  "state": enum (EvaluationRun.EvaluationRunState),
  "progress": {
    object (EvaluationRun.Progress)
  },
  "config": {
    object (EvaluationConfig)
  },
  "error": {
    object (Status)
  },
  "errorInfo": {
    object (EvaluationErrorInfo)
  },
  "evaluationRunSummaries": {
    string: {
      object (EvaluationRun.EvaluationRunSummary)
    },
    ...
  },
  "latencyReport": {
    object (LatencyReport)
  },
  "runCount": integer,
  "personaRunConfigs": [
    {
      object (PersonaRunConfig)
    }
  ],
  "optimizationConfig": {
    object (OptimizationConfig)
  },
  "scheduledEvaluationRun": string,
  "goldenRunMethod": enum (GoldenRunMethod)
}
Fields
name

string

Identifier. The unique identifier of the evaluation run. Format: projects/{project}/locations/{location}/apps/{app}/evaluationRuns/{evaluationRun}
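As an illustration of the name format above, the following sketch splits an evaluation run name into its segments. The helper names `parse_evaluation_run_name` and `app_parent` are hypothetical and not part of any client library. The pattern assumes segment IDs contain no `/`.

```python
import re
from typing import Dict

# Pattern derived from the documented format string. Assumption: no segment
# contains a "/" character.
_RUN_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/apps/(?P<app>[^/]+)"
    r"/evaluationRuns/(?P<evaluationRun>[^/]+)$"
)


def parse_evaluation_run_name(name: str) -> Dict[str, str]:
    """Split an EvaluationRun resource name into its path segments."""
    match = _RUN_NAME.match(name)
    if match is None:
        raise ValueError(f"not an evaluationRuns resource name: {name!r}")
    return match.groupdict()


def app_parent(name: str) -> str:
    """Return the parent app resource name of an evaluation run (hypothetical helper)."""
    parts = parse_evaluation_run_name(name)
    return (
        f"projects/{parts['project']}"
        f"/locations/{parts['location']}"
        f"/apps/{parts['app']}"
    )
```

Validating the name up front gives a clear error for a malformed or wrong-type resource name before any request is built from it.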

displayName

string

Optional. User-defined display name of the evaluation run. default: " run - ".

evaluationResults[]

string

Output only. The evaluation results that are part of this run. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}/results/{result}

createTime

string (Timestamp format)

Output only. Timestamp when the evaluation run was created.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".
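A minimal sketch of reading such a timestamp in Python, using only the standard library. The function name `parse_timestamp` is hypothetical. Because `datetime` stores microseconds, fractional digits beyond the sixth are truncated here, which may or may not be acceptable for a given use.

```python
import re
from datetime import datetime

_FRACTION = re.compile(r"\.(\d+)")


def parse_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime."""
    text = value.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    # Normalize the fraction to exactly six digits; nanoseconds are truncated.
    text = _FRACTION.sub(
        lambda m: "." + m.group(1)[:6].ljust(6, "0"), text, count=1
    )
    return datetime.fromisoformat(text)
```

All three example strings above parse with this sketch, including the one with the "+05:30" offset.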

initiatedBy

string

Output only. The user who initiated the evaluation run.

appVersion

string

Output only. The app version to evaluate. Format: projects/{project}/locations/{location}/apps/{app}/versions/{version}

appVersionDisplayName

string

Output only. The display name of the appVersion that the evaluation ran against.

changelog

string

Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

changelogCreateTime

string (Timestamp format)

Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on latest/draft.

Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30".

evaluations[]

string

Output only. The evaluations that are part of this run. The list may contain evaluations of either type. This field is mutually exclusive with evaluationDataset. Format: projects/{project}/locations/{location}/apps/{app}/evaluations/{evaluation}

evaluationDataset

string

Output only. The evaluation dataset that this run is associated with. This field is mutually exclusive with evaluations. Format: projects/{project}/locations/{location}/apps/{app}/evaluationDatasets/{evaluationDataset}

evaluationType

enum (EvaluationRun.EvaluationType)

Output only. The type of the evaluations in this run.

state

enum (EvaluationRun.EvaluationRunState)

Output only. The state of the evaluation run.

progress

object (EvaluationRun.Progress)

Output only. The progress of the evaluation run.

config

object (EvaluationConfig)

Output only. The configuration used in the run.

error
(deprecated)

object (Status)

Output only. Deprecated: Use errorInfo instead. Errors encountered during execution.

errorInfo

object (EvaluationErrorInfo)

Output only. Error information for the evaluation run.

evaluationRunSummaries

map (key: string, value: object (EvaluationRun.EvaluationRunSummary))

Output only. Map of evaluation name to EvaluationRunSummary.

An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.

latencyReport

object (LatencyReport)

Output only. Latency report for the evaluation run.

runCount

integer

Output only. The number of times the evaluations inside the run were run.

personaRunConfigs[]

object (PersonaRunConfig)

Output only. The configuration to use for the run per persona.

optimizationConfig

object (OptimizationConfig)

Optional. Configuration for running the optimization step after the evaluation run. If not set, the optimization step will not be run.

scheduledEvaluationRun

string

Output only. The scheduled evaluation run resource name that created this evaluation run. This field is only set if the evaluation run was created by a scheduled evaluation run. Format: projects/{project}/locations/{location}/apps/{app}/scheduledEvaluationRuns/{scheduledEvaluationRun}

goldenRunMethod

enum (GoldenRunMethod)

Output only. The method used to run the evaluation.

EvaluationRun.EvaluationType

The type of the evaluations in this run. Additional values may be added in the future.

Enums
EVALUATION_TYPE_UNSPECIFIED Evaluation type is not specified.
GOLDEN Golden evaluation.
SCENARIO Scenario evaluation.
MIXED Indicates the run includes a mix of golden and scenario evaluations.

EvaluationRun.EvaluationRunState

The state of the evaluation run.

Enums
EVALUATION_RUN_STATE_UNSPECIFIED Evaluation run state is not specified.
RUNNING Evaluation run is running.
COMPLETED Evaluation run has completed.
ERROR The evaluation run has an error.
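
A client that waits for a run typically needs to know when to stop checking the state. The sketch below assumes that RUNNING is the only in-progress state and that COMPLETED and ERROR are final; the reference does not state this explicitly. `is_finished` is a hypothetical helper that operates on an EvaluationRun already decoded from JSON into a dict.

```python
# Assumption: COMPLETED and ERROR are final states; anything else (including
# an unspecified or unknown value) is treated as not finished.
TERMINAL_STATES = frozenset({"COMPLETED", "ERROR"})


def is_finished(run: dict) -> bool:
    """Return True if the run's state is one of the assumed terminal states."""
    state = run.get("state", "EVALUATION_RUN_STATE_UNSPECIFIED")
    return state in TERMINAL_STATES
```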

EvaluationRun.Progress

The progress of the evaluation run.

JSON representation
{
  "totalCount": integer,
  "failedCount": integer,
  "errorCount": integer,
  "completedCount": integer,
  "passedCount": integer
}
Fields
totalCount

integer

Output only. Total number of evaluation results in this run.

failedCount

integer

Output only. Number of completed evaluation results with an outcome of FAIL. (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is FAIL).

errorCount

integer

Output only. Number of evaluation results that failed to execute. (EvaluationResult.execution_state is ERROR).

completedCount

integer

Output only. Number of evaluation results that finished successfully. (EvaluationResult.execution_state is COMPLETED).

passedCount

integer

Output only. Number of completed evaluation results with an outcome of PASS. (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is PASS).
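
The counters above can be combined into a pending count and a pass rate. This is a sketch under two assumptions not confirmed by the reference: that every result not yet counted in completedCount or errorCount is still pending, and that a counter missing from the JSON means 0. `summarize_progress` is a hypothetical helper.

```python
from typing import Optional


def summarize_progress(progress: dict) -> dict:
    """Derive pending count and pass rate from an EvaluationRun.Progress dict."""
    total = int(progress.get("totalCount", 0))
    completed = int(progress.get("completedCount", 0))
    errored = int(progress.get("errorCount", 0))
    passed = int(progress.get("passedCount", 0))

    # Assumption: results that are neither completed nor errored are pending.
    pending = max(total - completed - errored, 0)

    # Pass rate over completed results only; undefined when nothing completed.
    pass_rate: Optional[float] = passed / completed if completed else None

    return {"pending": pending, "pass_rate": pass_rate}
```

The pass rate is computed over completed results rather than the total, so results that failed to execute do not count against it.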

EvaluationRun.EvaluationRunSummary

Contains the summary of passed and failed result counts for a specific evaluation in an evaluation run.

JSON representation
{
  "passedCount": integer,
  "failedCount": integer,
  "errorCount": integer
}
Fields
passedCount

integer

Output only. Number of passed results for the associated Evaluation in this run.

failedCount

integer

Output only. Number of failed results for the associated Evaluation in this run.

errorCount

integer

Output only. Number of error results for the associated Evaluation in this run.
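
Since evaluationRunSummaries maps each evaluation name to one of these summaries, a caller can scan the map for evaluations that had failed or error results. The sketch below works on an EvaluationRun decoded into a dict; `evaluations_needing_attention` is a hypothetical helper, and missing counters are assumed to mean 0.

```python
from typing import List, Tuple


def evaluations_needing_attention(run: dict) -> List[Tuple[str, int, int]]:
    """List (evaluation name, failedCount, errorCount) for non-clean evaluations."""
    summaries = run.get("evaluationRunSummaries", {})
    flagged = []
    for evaluation_name, summary in summaries.items():
        failed = int(summary.get("failedCount", 0))
        errors = int(summary.get("errorCount", 0))
        if failed or errors:
            flagged.append((evaluation_name, failed, errors))
    # Worst offenders first; ties are broken by name for stable output.
    return sorted(flagged, key=lambda item: (-(item[1] + item[2]), item[0]))
```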

LatencyReport

Latency report for the evaluation run.

JSON representation
{
  "toolLatencies": [
    {
      object (LatencyReport.ToolLatency)
    }
  ],
  "callbackLatencies": [
    {
      object (LatencyReport.CallbackLatency)
    }
  ],
  "guardrailLatencies": [
    {
      object (LatencyReport.GuardrailLatency)
    }
  ],
  "llmCallLatencies": [
    {
      object (LatencyReport.LlmCallLatency)
    }
  ],
  "sessionCount": integer
}
Fields
toolLatencies[]

object (LatencyReport.ToolLatency)

Output only. Unordered list. Latency metrics for each tool.

callbackLatencies[]

object (LatencyReport.CallbackLatency)

Output only. Unordered list. Latency metrics for each callback.

guardrailLatencies[]

object (LatencyReport.GuardrailLatency)

Output only. Unordered list. Latency metrics for each guardrail.

llmCallLatencies[]

object (LatencyReport.LlmCallLatency)

Output only. Unordered list. Latency metrics for each LLM call.

sessionCount

integer

Output only. The total number of sessions considered in the latency report.

LatencyReport.ToolLatency

Latency metrics for a single tool.

JSON representation
{
  "toolDisplayName": string,
  "latencyMetrics": {
    object (LatencyReport.LatencyMetrics)
  },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": {
    object (ToolsetTool)
  }
  // End of list of possible types for union field tool_identifier.
}
Fields
toolDisplayName

string

Output only. The display name of the tool.

latencyMetrics

object (LatencyReport.LatencyMetrics)

Output only. The latency metrics for the tool.

Union field tool_identifier. The identifier of the tool. tool_identifier can be only one of the following:
tool

string

Output only. Format: projects/{project}/locations/{location}/apps/{app}/tools/{tool}.

toolsetTool

object (ToolsetTool)

Output only. The toolset tool identifier.
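
Because tool_identifier is a union field, at most one of tool and toolsetTool appears in a ToolLatency object. A sketch of reading it from decoded JSON follows; `tool_identifier` is a hypothetical helper, and the ToolsetTool object is treated as an opaque dict because its fields are not described on this page.

```python
from typing import Any, Optional, Tuple


def tool_identifier(tool_latency: dict) -> Tuple[Optional[str], Any]:
    """Return (field name, value) for whichever union member is set."""
    has_tool = "tool" in tool_latency
    has_toolset_tool = "toolsetTool" in tool_latency
    if has_tool and has_toolset_tool:
        # The union allows only one member, so this indicates malformed input.
        raise ValueError("tool and toolsetTool are mutually exclusive")
    if has_tool:
        return ("tool", tool_latency["tool"])
    if has_toolset_tool:
        return ("toolsetTool", tool_latency["toolsetTool"])
    return (None, None)
```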

LatencyReport.LatencyMetrics

Latency metrics for a component.

JSON representation
{
  "p50Latency": string,
  "p90Latency": string,
  "p99Latency": string,
  "callCount": integer
}
Fields
p50Latency

string (Duration format)

Output only. The 50th percentile latency.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

p90Latency

string (Duration format)

Output only. The 90th percentile latency.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

p99Latency

string (Duration format)

Output only. The 99th percentile latency.

A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s".

callCount

integer

Output only. The number of times the resource was called.
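
The latency fields arrive as Duration strings such as "3.5s", so they need parsing before they can be compared or sorted. The sketch below converts them to `Decimal` seconds to keep all nine fractional digits. The helper names are hypothetical.

```python
import re
from decimal import Decimal
from typing import Dict

# Seconds with up to nine fractional digits, ending with "s".
_DURATION = re.compile(r"^(-?\d+(?:\.\d{1,9})?)s$")


def parse_duration_seconds(value: str) -> Decimal:
    """Convert a Duration string such as "3.5s" into seconds."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"not a Duration string: {value!r}")
    return Decimal(match.group(1))


def latency_seconds(metrics: dict) -> Dict[str, Decimal]:
    """Parse whichever percentile fields are present in a LatencyMetrics dict."""
    fields = ("p50Latency", "p90Latency", "p99Latency")
    return {
        field: parse_duration_seconds(metrics[field])
        for field in fields
        if field in metrics
    }
```

`Decimal` is used instead of `float` so that nanosecond-level values are not rounded; converting to `float` afterwards is fine when that precision is not needed.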

LatencyReport.CallbackLatency

Latency metrics for a single callback.

JSON representation
{
  "stage": string,
  "latencyMetrics": {
    object (LatencyReport.LatencyMetrics)
  }
}
Fields
stage

string

Output only. The stage of the callback.

latencyMetrics

object (LatencyReport.LatencyMetrics)

Output only. The latency metrics for the callback.

LatencyReport.GuardrailLatency

Latency metrics for a single guardrail.

JSON representation
{
  "guardrail": string,
  "guardrailDisplayName": string,
  "latencyMetrics": {
    object (LatencyReport.LatencyMetrics)
  }
}
Fields
guardrail

string

Output only. The name of the guardrail. Format: projects/{project}/locations/{location}/apps/{app}/guardrails/{guardrail}.

guardrailDisplayName

string

Output only. The display name of the guardrail.

latencyMetrics

object (LatencyReport.LatencyMetrics)

Output only. The latency metrics for the guardrail.

LatencyReport.LlmCallLatency

Latency metrics for a single LLM call.

JSON representation
{
  "model": string,
  "latencyMetrics": {
    object (LatencyReport.LatencyMetrics)
  }
}
Fields
model

string

Output only. The name of the model.

latencyMetrics

object (LatencyReport.LatencyMetrics)

Output only. The latency metrics for the LLM call.

Methods

delete

Deletes an evaluation run.

get

Gets details of the specified evaluation run.

list

Lists all evaluation runs in the given app.