- Resource: EvaluationRun
- EvaluationRun.EvaluationType
- EvaluationRun.EvaluationRunState
- EvaluationRun.Progress
- EvaluationRun.EvaluationRunSummary
- LatencyReport
- LatencyReport.ToolLatency
- LatencyReport.LatencyMetrics
- LatencyReport.CallbackLatency
- LatencyReport.GuardrailLatency
- LatencyReport.LlmCallLatency
- Methods
Resource: EvaluationRun
An evaluation run represents all the evaluation results from an evaluation execution.
JSON representation:

```
{
  "name": string,
  "displayName": string,
  "evaluationResults": [ string ],
  "createTime": string,
  "initiatedBy": string,
  "appVersion": string,
  "appVersionDisplayName": string,
  "changelog": string,
  "changelogCreateTime": string,
  "evaluations": [ string ],
  "evaluationDataset": string,
  "evaluationType": enum (EvaluationType),
  "state": enum (EvaluationRunState),
  "progress": { object (Progress) },
  "config": { object },
  "error": { object },
  "errorInfo": { object },
  "evaluationRunSummaries": { string: { object (EvaluationRunSummary) }, ... },
  "latencyReport": { object (LatencyReport) },
  "runCount": integer,
  "personaRunConfigs": [ { object } ],
  "optimizationConfig": { object },
  "scheduledEvaluationRun": string,
  "goldenRunMethod": enum
}
```
| Fields | |
|---|---|
| name | Identifier. The unique identifier of the evaluation run. Format: |
| displayName | Optional. User-defined display name of the evaluation run. default: " |
| evaluationResults[] | Output only. The evaluation results that are part of this run. Format: |
| createTime | Output only. Timestamp when the evaluation run was created. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| initiatedBy | Output only. The user who initiated the evaluation run. |
| appVersion | Output only. The app version to evaluate. Format: |
| appVersionDisplayName | Output only. The display name of the app version. |
| changelog | Output only. The changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on the latest/draft version. |
| changelogCreateTime | Output only. The create time of the changelog of the app version that the evaluation ran against. This is populated if the user runs the evaluation on the latest/draft version. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| evaluations[] | Output only. The evaluations that are part of this run. The list may contain evaluations of either type. This field is mutually exclusive with evaluationDataset. |
| evaluationDataset | Output only. The evaluation dataset that this run is associated with. This field is mutually exclusive with evaluations[]. |
| evaluationType | Output only. The type of the evaluations in this run. |
| state | Output only. The state of the evaluation run. |
| progress | Output only. The progress of the evaluation run. |
| config | Output only. The configuration used in the run. |
| error | Output only. Deprecated: use errorInfo instead. Errors encountered during execution. |
| errorInfo | Output only. Error information for the evaluation run. |
| evaluationRunSummaries | Output only. Map of evaluation name to EvaluationRunSummary. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }. |
| latencyReport | Output only. Latency report for the evaluation run. |
| runCount | Output only. The number of times the evaluations inside the run were executed. |
| personaRunConfigs[] | Output only. The per-persona configuration to use for the run. |
| optimizationConfig | Optional. Configuration for running the optimization step after the evaluation run. If not set, the optimization step is not run. |
| scheduledEvaluationRun | Output only. The scheduled evaluation run resource name that created this evaluation run. This field is only set if the evaluation run was created by a scheduled evaluation run. Format: |
| goldenRunMethod | Output only. The method used to run the evaluation. |
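Clients typically work with the decoded JSON of this resource. As a minimal sketch, assuming the run has already been fetched and decoded into a dict (the `summarize_run` helper is illustrative, not part of the API):

```python
def summarize_run(run: dict) -> dict:
    """Pull the commonly used fields out of a decoded EvaluationRun."""
    progress = run.get("progress", {})
    return {
        "name": run.get("name"),
        "state": run.get("state"),
        "evaluation_type": run.get("evaluationType"),
        # evaluations[] and evaluationDataset are mutually exclusive,
        # so at most one of these is populated for a given run.
        "source": run.get("evaluationDataset") or run.get("evaluations"),
        "passed": progress.get("passedCount", 0),
        "total": progress.get("totalCount", 0),
    }

run = {
    "name": "evaluationRuns/123",
    "state": "COMPLETED",
    "evaluationType": "GOLDEN",
    "evaluations": ["evaluations/1", "evaluations/2"],
    "progress": {"totalCount": 2, "passedCount": 2},
}
summary = summarize_run(run)  # summary["state"] == "COMPLETED"
```

Using `.get()` with defaults keeps the helper robust to output-only fields that are absent while a run is still being populated.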
EvaluationRun.EvaluationType
The type of the evaluations in this run. Additional values may be added in the future.
| Enums | |
|---|---|
| EVALUATION_TYPE_UNSPECIFIED | Evaluation type is not specified. |
| GOLDEN | Golden evaluation. |
| SCENARIO | Scenario evaluation. |
| MIXED | Indicates the run includes a mix of golden and scenario evaluations. |
EvaluationRun.EvaluationRunState
The state of the evaluation run.
| Enums | |
|---|---|
| EVALUATION_RUN_STATE_UNSPECIFIED | Evaluation run state is not specified. |
| RUNNING | Evaluation run is running. |
| COMPLETED | Evaluation run has completed. |
| ERROR | The evaluation run has an error. |
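COMPLETED and ERROR are the terminal states in this table, so a caller can poll `state` until the run leaves RUNNING. A small sketch of that check (the polling loop in the comment assumes a hypothetical `fetch_run` getter, not a documented method):

```python
TERMINAL_STATES = {"COMPLETED", "ERROR"}

def is_terminal(state: str) -> bool:
    """True once the evaluation run has finished, successfully or not."""
    return state in TERMINAL_STATES

# Hypothetical polling loop, where fetch_run(name) GETs the run resource:
# while not is_terminal(fetch_run(name)["state"]):
#     time.sleep(30)
```

Treating the unspecified value as non-terminal errs on the side of continuing to poll if new enum values appear.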
EvaluationRun.Progress
The progress of the evaluation run.
| JSON representation |
|---|
{ "totalCount": integer, "failedCount": integer, "errorCount": integer, "completedCount": integer, "passedCount": integer } |
| Fields | |
|---|---|
| totalCount | Output only. Total number of evaluation results in this run. |
| failedCount | Output only. Number of completed evaluation results with an outcome of FAIL (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is FAIL). |
| errorCount | Output only. Number of evaluation results that failed to execute (EvaluationResult.execution_state is ERROR). |
| completedCount | Output only. Number of evaluation results that finished successfully (EvaluationResult.execution_state is COMPLETED). |
| passedCount | Output only. Number of completed evaluation results with an outcome of PASS (EvaluationResult.execution_state is COMPLETED and EvaluationResult.evaluation_status is PASS). |
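These counts compose: completedCount and errorCount together cover the finished results, and, assuming every completed result is either PASS or FAIL as the definitions above suggest, passedCount + failedCount equals completedCount. A sketch of the derived ratios (the helper is illustrative):

```python
def progress_ratios(progress: dict) -> tuple[float, float]:
    """Return (fraction of results finished, pass rate among completed)."""
    total = progress["totalCount"]
    completed = progress["completedCount"]
    # Finished = completed (PASS or FAIL) plus results that errored out.
    finished = completed + progress["errorCount"]
    return (
        finished / total if total else 0.0,
        progress["passedCount"] / completed if completed else 0.0,
    )

progress = {"totalCount": 10, "completedCount": 6, "errorCount": 2,
            "passedCount": 4, "failedCount": 2}
done, pass_rate = progress_ratios(progress)  # 0.8, 0.666...
```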
EvaluationRun.EvaluationRunSummary
Contains the summary of passed and failed result counts for a specific evaluation in an evaluation run.
| JSON representation |
|---|
{ "passedCount": integer, "failedCount": integer, "errorCount": integer } |
| Fields | |
|---|---|
| passedCount | Output only. Number of passed results for the associated Evaluation in this run. |
| failedCount | Output only. Number of failed results for the associated Evaluation in this run. |
| errorCount | Output only. Number of error results for the associated Evaluation in this run. |
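Because evaluationRunSummaries keys each summary by evaluation name, run-level totals can be recovered by summing the map's values. A minimal sketch (the helper and sample names are illustrative):

```python
def total_counts(summaries: dict) -> dict:
    """Sum per-evaluation EvaluationRunSummary counts into run totals."""
    totals = {"passedCount": 0, "failedCount": 0, "errorCount": 0}
    for summary in summaries.values():
        for key in totals:
            totals[key] += summary.get(key, 0)
    return totals

summaries = {
    "evaluations/a": {"passedCount": 3, "failedCount": 1, "errorCount": 0},
    "evaluations/b": {"passedCount": 2, "failedCount": 0, "errorCount": 1},
}
totals = total_counts(summaries)
# totals == {'passedCount': 5, 'failedCount': 1, 'errorCount': 1}
```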
LatencyReport
Latency report for the evaluation run.
JSON representation:

```
{
  "toolLatencies": [ { object (ToolLatency) } ],
  "callbackLatencies": [ { object (CallbackLatency) } ],
  "guardrailLatencies": [ { object (GuardrailLatency) } ],
  "llmCallLatencies": [ { object (LlmCallLatency) } ],
  "sessionCount": integer
}
```
| Fields | |
|---|---|
| toolLatencies[] | Output only. Unordered list. Latency metrics for each tool. |
| callbackLatencies[] | Output only. Unordered list. Latency metrics for each callback. |
| guardrailLatencies[] | Output only. Unordered list. Latency metrics for each guardrail. |
| llmCallLatencies[] | Output only. Unordered list. Latency metrics for each LLM call. |
| sessionCount | Output only. The total number of sessions considered in the latency report. |
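All four lists wrap the same LatencyMetrics message and differ only in their label field, so a report can be walked uniformly. A sketch that flattens a decoded report into (list, label, callCount) rows, using the label fields defined in the sub-messages below (the helper itself is illustrative):

```python
# Which field names each latency entry, per the sub-message definitions.
LABEL_FIELDS = {
    "toolLatencies": "toolDisplayName",
    "callbackLatencies": "stage",
    "guardrailLatencies": "guardrailDisplayName",
    "llmCallLatencies": "model",
}

def flatten_report(report: dict) -> list[tuple[str, str, int]]:
    """Flatten the per-component latency lists into uniform rows."""
    rows = []
    for list_name, label_field in LABEL_FIELDS.items():
        for entry in report.get(list_name, []):
            metrics = entry.get("latencyMetrics", {})
            rows.append((list_name, entry.get(label_field, ""),
                         metrics.get("callCount", 0)))
    return rows

report = {
    "toolLatencies": [{"toolDisplayName": "search",
                       "latencyMetrics": {"callCount": 12}}],
    "llmCallLatencies": [{"model": "example-model",
                          "latencyMetrics": {"callCount": 40}}],
    "sessionCount": 8,
}
rows = flatten_report(report)
```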
LatencyReport.ToolLatency
Latency metrics for a single tool.
JSON representation:

```
{
  "toolDisplayName": string,
  "latencyMetrics": { object (LatencyMetrics) },

  // Union field tool_identifier can be only one of the following:
  "tool": string,
  "toolsetTool": { object }
  // End of list of possible types for union field tool_identifier.
}
```
| Fields | |
|---|---|
| toolDisplayName | Output only. The display name of the tool. |
| latencyMetrics | Output only. The latency metrics for the tool. |
| Union field tool_identifier. The identifier of the tool. tool_identifier can be only one of the following: | |
| tool | Output only. Format: |
| toolsetTool | Output only. The toolset tool identifier. |
LatencyReport.LatencyMetrics
Latency metrics for a component.
| JSON representation |
|---|
{ "p50Latency": string, "p90Latency": string, "p99Latency": string, "callCount": integer } |
| Fields | |
|---|---|
| p50Latency | Output only. The 50th percentile latency. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
| p90Latency | Output only. The 90th percentile latency. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
| p99Latency | Output only. The 99th percentile latency. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". |
| callCount | Output only. The number of times the resource was called. |
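The percentile fields arrive as Duration strings such as "3.5s", so they need parsing before any numeric comparison. A minimal sketch:

```python
def parse_duration(value: str) -> float:
    """Convert a JSON Duration string like "3.5s" into float seconds."""
    if not value.endswith("s"):
        raise ValueError(f"not a duration string: {value!r}")
    return float(value[:-1])

metrics = {"p50Latency": "0.250s", "p90Latency": "1.2s",
           "p99Latency": "3.5s", "callCount": 120}
p99_seconds = parse_duration(metrics["p99Latency"])  # 3.5
```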
LatencyReport.CallbackLatency
Latency metrics for a single callback.
JSON representation:

```
{
  "stage": string,
  "latencyMetrics": { object (LatencyMetrics) }
}
```
| Fields | |
|---|---|
| stage | Output only. The stage of the callback. |
| latencyMetrics | Output only. The latency metrics for the callback. |
LatencyReport.GuardrailLatency
Latency metrics for a single guardrail.
JSON representation:

```
{
  "guardrail": string,
  "guardrailDisplayName": string,
  "latencyMetrics": { object (LatencyMetrics) }
}
```
| Fields | |
|---|---|
| guardrail | Output only. The name of the guardrail. Format: |
| guardrailDisplayName | Output only. The display name of the guardrail. |
| latencyMetrics | Output only. The latency metrics for the guardrail. |
LatencyReport.LlmCallLatency
Latency metrics for a single LLM call.
JSON representation:

```
{
  "model": string,
  "latencyMetrics": { object (LatencyMetrics) }
}
```
| Fields | |
|---|---|
| model | Output only. The name of the model. |
| latencyMetrics | Output only. The latency metrics for the LLM call. |
| Methods | |
|---|---|
| delete | Deletes an evaluation run. |
| get | Gets details of the specified evaluation run. |
| list | Lists all evaluation runs in the given app. |
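List methods in this style are typically paginated, so a client accumulates runs across pages. In the sketch below, the response shape (`evaluationRuns`, `nextPageToken`) is an assumption modeled on standard Google-style list responses, and `fetch` stands in for the actual list call:

```python
def list_all_runs(fetch) -> list:
    """Accumulate evaluation runs across paginated list responses.

    fetch(page_token) stands in for the list method; the
    "evaluationRuns" / "nextPageToken" response fields are assumed.
    """
    runs, token = [], None
    while True:
        page = fetch(token)
        runs.extend(page.get("evaluationRuns", []))
        token = page.get("nextPageToken")
        if not token:
            return runs

# Stubbed two-page response for illustration:
pages = {
    None: {"evaluationRuns": [{"name": "evaluationRuns/1"}],
           "nextPageToken": "t1"},
    "t1": {"evaluationRuns": [{"name": "evaluationRuns/2"}]},
}
runs = list_all_runs(lambda token: pages[token])
```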