Feature · Evaluation suites

Turn prompt evaluation into a governed release-readiness workflow

SentinelAI gives teams governed evaluation suites that group prompt test cases, maintain approved baselines, compare regression posture against those baselines, and feed release decisions for runtime AI systems.

What this area covers

Evaluation suites help teams move from informal prompt testing to a governed regression workflow. Suites stay linked to AI systems and prompt records so release blocking, baseline evidence, and run outcomes can be reviewed in the same operating model as the rest of AI governance.

Related product areas

  • AI systems

    Track governed runtime systems that combine models, approved use cases, datasets, release state, and readiness into one operational record.

  • Prompt registry

    Govern versioned prompts, retrieval settings, linked AI systems, and evaluation posture from a dedicated prompt operations record.

  • Release governance

    Manage AI-system release records with approval state, rollback references, dependency snapshots, and invalidation handling.

  • Governance cases

    Coordinate alerts, findings, remediation, evidence posture, SLA deadlines, and closure outcomes in one shared case workspace.

  • Compliance workflows

    Operationalize evidence collection, control tracking, remediation, and framework mapping across AI systems.

  • LLM telemetry and monitoring

    Bring live assurance signals, telemetry connector management, trigger rules, and evidence-ready monitoring context into AI governance workflows.

Core capabilities

Built to support production governance work

Suite definitions tied to AI systems

Define evaluation suites against the AI system they protect instead of treating tests as disconnected artifacts with no operational owner.

Prompt-linked test case inheritance

Clone prompt test cases into the suite so evaluation runs start from governed prompt records rather than manually recreated scenarios.

Baseline and regression controls

Track approved baselines, minimum pass-rate targets, and regression thresholds to make evaluation posture easier to compare over time.
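
For illustration only, the sketch below (in Python, using hypothetical names such as SuiteThresholds, min_pass_rate, and regression_tolerance that are not SentinelAI's actual schema) shows how a minimum pass-rate target and a regression threshold could classify a run against an approved baseline.

    # Illustrative only; these names are not part of SentinelAI's API.
    from dataclasses import dataclass

    @dataclass
    class SuiteThresholds:
        min_pass_rate: float         # lowest acceptable pass rate for any run
        regression_tolerance: float  # allowed drop versus the approved baseline

    def regression_posture(current: float, baseline: float, t: SuiteThresholds) -> str:
        # Classify a run against the absolute floor and the approved baseline.
        if current < t.min_pass_rate:
            return "fail"        # below the minimum pass-rate target
        if baseline - current > t.regression_tolerance:
            return "regressed"   # above the floor, but slipped too far from baseline
        return "pass"

    print(regression_posture(0.93, 0.97, SuiteThresholds(0.90, 0.02)))  # regressed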

Release-blocking configuration

Mark suites as release blocking so approvals depend on passing release-linked runs and current baseline evidence where required.

Run evidence and review context

Preserve last-run timing, run outcomes, and suite posture so governance teams can inspect evaluation evidence without rebuilding the story manually.

Target users

  • Prompt and application teams formalizing regression checks for production AI systems
  • Governance and compliance leaders who need release decisions tied to evaluation evidence
  • Platform and release owners managing readiness gates before promotion
  • Risk and assurance reviewers comparing current runs against approved baselines

Governance value

  • Turns prompt evaluation into a durable, reviewable governance workflow instead of an informal developer task
  • Keeps release gates connected to current evidence and approved baselines
  • Improves traceability between prompt changes, suite outcomes, and release approvals
  • Helps teams explain why a release was blocked, approved, or sent back for remediation
  • Brings evaluation evidence closer to the AI system and release record it affects

How teams use it

A practical operating flow for this feature family

Step 1

Define suites and inherit cases

Create suites for an AI system, pull in prompt-linked test cases, and set the thresholds that define acceptable regression posture.
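
As a rough sketch of what a suite definition might capture, assuming hypothetical identifiers and field names (inherit_prompt_cases, claims-triage-regression, min_pass_rate, release_blocking) rather than SentinelAI's actual API:

    # Hypothetical sketch; identifiers and fields are illustrative only.
    def inherit_prompt_cases(prompt_id: str, version: str) -> list[str]:
        # Stand-in for cloning test cases from a governed prompt record.
        return [f"{prompt_id}@{version}#case-{i}" for i in range(1, 4)]

    suite = {
        "name": "claims-triage-regression",
        "ai_system": "claims-triage-assistant",   # the system this suite protects
        "test_cases": inherit_prompt_cases("claims-triage-prompt", version="v12"),
        "min_pass_rate": 0.95,                    # absolute floor for any run
        "regression_tolerance": 0.02,             # allowed drop versus the baseline
        "release_blocking": True,
    }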

Step 2

Establish a trusted baseline

Record the approved baseline run that future evaluations will be compared against, so trusted baseline evidence is in place before the suite becomes release-relevant.
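
A minimal sketch of what an approved baseline record might hold; the field names here are hypothetical, not SentinelAI's actual schema:

    # Hypothetical sketch: pin one approved run as the fixed comparison point.
    approved_baseline = {
        "suite": "claims-triage-regression",
        "run_id": "run-0427",               # illustrative identifier
        "pass_rate": 0.97,                  # value later runs are compared against
        "approved_by": "governance-review",
        "status": "approved",
    }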

Step 3

Use suite outcomes in release review

Carry suite evidence into release-governance decisions so approvals can reflect current evaluation performance and blocking posture.
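
To make the blocking behavior concrete, the hedged sketch below shows one way a release gate could combine the blocking flag, the release-linked run outcome, and baseline currency; the function release_gate and its parameters are illustrative, not SentinelAI's API.

    # Hypothetical release-gate sketch: a blocking suite needs a passing,
    # release-linked run and a current approved baseline before approval.
    def release_gate(release_blocking: bool,
                     run_linked_to_release: bool,
                     run_passed: bool,
                     baseline_current: bool) -> str:
        if not release_blocking:
            return "advisory"                        # recorded, but not gating
        if not (run_linked_to_release and baseline_current):
            return "blocked: missing or stale evidence"
        return "eligible" if run_passed else "blocked: suite failing"

    print(release_gate(True, True, False, True))     # blocked: suite failing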

Continue exploring

Explore how SentinelAI connects adjacent governance workflows