Eval Engine

14 built-in evaluators across three tiers: static rules, dynamic analysis, and LLM-as-judge. Run evals in real time, on a schedule, or against datasets.

Cloud-hosted

Evals run entirely on Ingate's managed infrastructure. There's nothing to install or self-host. Create evals through the API at api.ingateai.com; results appear in your dashboard.

Enterprise feature

The eval engine requires an Enterprise plan. Free plans include the proxy, logging, and the dashboard, but not evals.

Overview

The eval engine lets you define quality checks that run automatically against LLM responses. Each evaluator receives trace data and produces a score (0.0–1.0) with an explanation. Evaluators are organized into three tiers:

  • Static (9): deterministic rule checks with zero external dependencies
  • Dynamic (4): computed evaluations with programmatic logic
  • LLM Judge (1): another LLM rates the response quality

Static Evaluators

Deterministic rule checks with zero external dependencies:

Evaluator      Description
contains       Output contains a specific substring
not_contains   Output does not contain the specified substring
regex_match    Output matches a regular expression
is_json        Output is valid JSON
max_length     Character count within maximum
min_length     Character count meets minimum
latency_max    Request latency within threshold (milliseconds)
token_max      Token usage within threshold
status_code    HTTP status code (exact or range match)
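
To make the static tier concrete, here is a rough local approximation of three of these checks (contains, max_length, regex_match) as shell functions. This is only a sketch: the function names are invented for illustration, the binary 1.0/0.0 scoring is an assumption, and Ingate's actual implementation is not described in these docs.

```shell
# Local approximations of three static checks.
# Each prints 1.0 on pass and 0.0 on fail (assumed scoring, not confirmed by the docs).

check_contains() {      # usage: check_contains OUTPUT SUBSTRING
  case "$1" in
    *"$2"*) echo "1.0" ;;   # quoted "$2" is matched literally, not as a glob
    *)      echo "0.0" ;;
  esac
}

check_max_length() {    # usage: check_max_length OUTPUT MAX_CHARS
  if [ "${#1}" -le "$2" ]; then echo "1.0"; else echo "0.0"; fi
}

check_regex_match() {   # usage: check_regex_match OUTPUT ERE_PATTERN
  # grep matches line by line, so a multi-line output passes if any line matches.
  if printf '%s' "$1" | grep -Eq -- "$2"; then echo "1.0"; else echo "0.0"; fi
}
```

Because each check depends only on its inputs, the same output always produces the same score, which is what makes this tier suitable for evaluating every request.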

Dynamic Evaluators

Computed evaluations with programmatic logic:

Evaluator         Description
json_schema       Validates JSON structure, checking required keys and value types
similarity        Word overlap similarity against a reference string
reference_match   Compare against reference (exact, contains, or similarity)
webhook           Send trace to an external URL for custom evaluation
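
The docs describe similarity as word overlap but do not give the formula. One plausible reading, shown here purely as an assumption, is the fraction of distinct reference words that also appear in the output. The function name word_overlap and the tokenization (lowercase, split on non-alphanumeric characters) are illustrative choices, not Ingate's documented behavior.

```shell
# Hypothetical word-overlap score: distinct reference words found in the output,
# divided by the number of distinct reference words. Prints a value in 0.00-1.00.
word_overlap() {   # usage: word_overlap OUTPUT REFERENCE
  awk -v out="$1" -v ref="$2" 'BEGIN {
    # Collect the distinct lowercase words in the output.
    n = split(tolower(out), a, "[^a-z0-9]+")
    for (i = 1; i <= n; i++) if (a[i] != "") seen[a[i]] = 1
    # Count distinct reference words, and how many also occur in the output.
    m = split(tolower(ref), b, "[^a-z0-9]+")
    for (i = 1; i <= m; i++) {
      if (b[i] == "" || (b[i] in counted)) continue
      counted[b[i]] = 1
      total++
      if (b[i] in seen) hit++
    }
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", hit / total
  }'
}
```

For example, `word_overlap "The cat sat." "the cat ran"` finds two of the three reference words and prints 0.67. Note that awk interprets backslash escapes in values passed with -v, so inputs containing backslashes would need extra handling.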

LLM Judge

Uses another LLM to evaluate the response quality:

Evaluator   Description
llm_judge   Another LLM rates the response quality on a 0–10 scale
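
The judge rates on a 0–10 scale, while evaluator scores are reported as 0.0–1.0. The docs do not state how one maps onto the other. If you need to reproduce a comparable score yourself, a linear mapping (divide by 10) is one reasonable assumption, sketched below. The function name normalize_judge_score and the clamping of values above 10 are illustrative choices.

```shell
# Hypothetical mapping of a raw 0-10 judge rating onto the 0.0-1.0 score range.
# Linear scaling is an assumption; the docs do not specify the mapping.
normalize_judge_score() {   # usage: normalize_judge_score RAW_SCORE
  awk -v raw="$1" 'BEGIN {
    # Accept only a plain non-negative number such as 7 or 8.5.
    if (raw !~ /^[0-9]+(\.[0-9]+)?$/) { print "invalid"; exit 1 }
    s = raw + 0
    if (s > 10) s = 10        # clamp out-of-range replies from the judge model
    printf "%.2f\n", s / 10
  }'
}
```

Rejecting non-numeric replies matters because a judge model can ignore a "number only" instruction and return prose.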

Create an Eval

JSON Output Check

bash
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "json-output-check",
    "type": "static",
    "evaluator": "is_json",
    "config": { "field": "completion" },
    "enabled": true
  }'

Content Safety

bash
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "no-refusal",
    "type": "static",
    "evaluator": "not_contains",
    "config": {
      "field": "completion",
      "value": "I cannot help with that",
      "case_sensitive": false
    },
    "enabled": true
  }'

Webhook Evaluator

bash
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "custom-check",
    "type": "dynamic",
    "evaluator": "webhook",
    "config": {
      "url": "https://your-api.com/eval",
      "timeout_ms": 5000
    },
    "enabled": true
  }'

LLM Judge

bash
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "quality-judge",
    "type": "llm_judge",
    "evaluator": "llm_judge",
    "config": {
      "prompt_template": "Rate the quality of this response 0-10.\n\nUser: {{prompt}}\n\nAssistant: {{completion}}\n\nScore (number only):",
      "provider": "openai",
      "model": "gpt-4o-mini"
    },
    "enabled": true
  }'

Running Evals

Evals can be triggered three ways:

  • Real-time: evaluated automatically on every gateway request
  • Scheduled: periodic evaluation of new traces (configurable interval)
  • Manual: trigger a run via the API

bash
# Trigger manual run
curl -X POST https://api.ingateai.com/api/v1/evals/{id}/run \
  -H "X-Ingate-Key: sk-ingate-your-key"

# View results
curl https://api.ingateai.com/api/v1/evals/{id}/results \
  -H "X-Ingate-Key: sk-ingate-your-key"

# List all runs
curl https://api.ingateai.com/api/v1/evals/runs \
  -H "X-Ingate-Key: sk-ingate-your-key"

Running Evals Against Datasets

For systematic evaluation, run evals against a dataset version instead of live traffic. Each test case in the dataset is evaluated independently and results are aggregated per eval definition.

bash
# Run an eval against a dataset version
curl -X POST https://api.ingateai.com/api/v1/evals/{eval_id}/run \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "dataset_id": "ds_abc123",
    "version": 3
  }'

Dataset-backed eval runs return per-case scores alongside aggregate statistics (mean, median, pass rate). This is especially useful for regression testing: commit a dataset version, run your evals, and compare scores across releases.
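
If you want to recompute these aggregates locally, for instance to diff two releases in CI, the sketch below derives mean, median, and pass rate from a list of per-case scores. It assumes you have already extracted the scores one per line (the response format is not shown in these docs), and the pass threshold is a hypothetical parameter you supply. The function name summarize_scores is also illustrative.

```shell
# Summarize per-case scores read from stdin (one score per line).
# THRESHOLD is a hypothetical pass cutoff; the docs do not define one.
summarize_scores() {   # usage: summarize_scores THRESHOLD < scores.txt
  LC_ALL=C sort -n | awk -v t="$1" '
    { s[NR] = $1; sum += $1; if ($1 >= t) pass++ }
    END {
      if (NR == 0) { print "no scores"; exit 1 }
      # Input is sorted, so the median is the middle value (or the mean of the two middle values).
      median = (NR % 2) ? s[(NR + 1) / 2] : (s[NR / 2] + s[NR / 2 + 1]) / 2
      printf "mean=%.2f median=%.2f pass_rate=%.2f\n", sum / NR, median, pass / NR
    }'
}
```

Running it once per release against the same dataset version gives two summary lines that can be compared directly, since the immutable dataset keeps the test cases fixed.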

Datasets

See the Datasets docs for how to create datasets, import test cases from CSV/JSONL, and commit immutable versions.

Available Fields

Most evaluators inspect a specific field from the trace. Set the field key in your eval config to target one of:

Field           Description
prompt          The user's input message
completion      The LLM's response text
request_body    Full request payload (JSON)
response_body   Full response payload (JSON)
provider        Provider name (e.g. openai, anthropic)
model           Model identifier used for the request
path            Request path through the gateway
method          HTTP method (GET, POST, etc.)

Trace fields

Evaluators that don't specify a field default to completion. The latency_max and token_max evaluators use metadata fields automatically and don't require a field config.
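
As a small illustration of targeting a field, the helper below builds an eval definition using only the keys that appear in the examples on this page (name, type, evaluator, config.field, enabled). The function name eval_payload is hypothetical, and it writes the completion default explicitly when no field is given. It does no JSON escaping, so it assumes arguments without quotes or backslashes.

```shell
# Build an eval-definition payload; FIELD falls back to "completion",
# mirroring the documented default. Arguments are inserted verbatim (no escaping).
eval_payload() {   # usage: eval_payload NAME TYPE EVALUATOR [FIELD]
  printf '{"name":"%s","type":"%s","evaluator":"%s","config":{"field":"%s"},"enabled":true}\n' \
    "$1" "$2" "$3" "${4:-completion}"
}

# Example use (not executed here): pipe the payload to the create endpoint.
#   eval_payload body-is-json static is_json response_body |
#     curl -X POST https://api.ingateai.com/api/v1/evals \
#       -H "Content-Type: application/json" \
#       -H "X-Ingate-Key: sk-ingate-your-key" \
#       -d @-
```

Evaluators that need more configuration, such as not_contains with its value and case_sensitive keys, would need a fuller payload than this helper produces.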