Eval Engine
14 built-in evaluators across three tiers: static rules, dynamic analysis, and LLM-as-judge. Run evals in real-time, on a schedule, or against datasets.
Cloud-hosted
Evals run on api.ingateai.com, and results are available in your dashboard.
Enterprise feature
Overview
The eval engine lets you define quality checks that run automatically against LLM responses. Each evaluator receives trace data and produces a score (0.0–1.0) with an explanation. Evaluators are organized into three tiers:
- Static (9): deterministic rule checks with zero external dependencies
- Dynamic (4): computed evaluations with programmatic logic
- LLM Judge (1): another LLM rates the response quality
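The evaluator contract described above (trace data in, a 0.0 to 1.0 score plus an explanation out) can be sketched as follows. This is an illustrative model only: `EvalResult`, the dict-shaped trace, and the pass/fail mapping to 1.0/0.0 are assumptions, not the engine's actual types.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical result shape: a score in [0.0, 1.0] plus an explanation."""
    score: float
    explanation: str


def contains(trace: dict, config: dict) -> EvalResult:
    """Sketch of a static check: pass when the target field contains `value`."""
    # Assumption: the trace is a plain dict keyed by field name.
    text = str(trace.get(config.get("field", "completion"), ""))
    needle = config["value"]
    found = needle in text
    return EvalResult(
        score=1.0 if found else 0.0,  # assumed mapping of pass/fail to a score
        explanation=f"{needle!r} {'found' if found else 'not found'} in output",
    )


result = contains({"completion": "Order shipped on Monday."}, {"value": "shipped"})
print(result.score, result.explanation)
```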
Static Evaluators
Deterministic rule checks with zero external dependencies:
| Evaluator | Description |
|---|---|
| contains | Output contains a specific substring |
| not_contains | Output does not contain the specified substring |
| regex_match | Output matches a regular expression |
| is_json | Output is valid JSON |
| max_length | Character count within maximum |
| min_length | Character count meets minimum |
| latency_max | Request latency within threshold (milliseconds) |
| token_max | Token usage within threshold |
| status_code | HTTP status code (exact or range match) |
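To make the table concrete, here is a rough local approximation of a few of these checks. The function signatures are hypothetical, and details such as unanchored regex search and inclusive status ranges are assumptions rather than documented engine behavior.

```python
import json
import re


def is_json(text: str) -> bool:
    """True when the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return False


def not_contains(text: str, value: str, case_sensitive: bool = True) -> bool:
    """True when `value` does not occur in `text`."""
    if not case_sensitive:
        text, value = text.lower(), value.lower()
    return value not in text


def regex_match(text: str, pattern: str) -> bool:
    """Assumption: a match anywhere in the output counts (re.search)."""
    return re.search(pattern, text) is not None


def status_code_ok(code: int, expected) -> bool:
    """`expected` is an exact int or an inclusive (low, high) range (assumed)."""
    if isinstance(expected, int):
        return code == expected
    low, high = expected
    return low <= code <= high
```

For example, `not_contains("Sorry, I CANNOT help with that.", "I cannot help with that", case_sensitive=False)` returns `False`, because the case-insensitive comparison finds the refusal phrase.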
Dynamic Evaluators
Computed evaluations with programmatic logic:
| Evaluator | Description |
|---|---|
| json_schema | Validates JSON structure, checking required keys and value types |
| similarity | Word overlap similarity against a reference string |
| reference_match | Compare against reference (exact, contains, or similarity) |
| webhook | Send trace to an external URL for custom evaluation |
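The source does not say exactly how word overlap is computed. One plausible reading is Jaccard similarity over lowercased word sets, sketched below together with the three `reference_match` modes. The `mode` parameter name and the tokenization are assumptions made for illustration.

```python
import re


def word_overlap(output: str, reference: str) -> float:
    """Jaccard similarity over lowercased word sets (an assumed definition)."""
    a = set(re.findall(r"\w+", output.lower()))
    b = set(re.findall(r"\w+", reference.lower()))
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def reference_match(output: str, reference: str, mode: str = "exact") -> float:
    """Score the output against a reference using one of three modes."""
    if mode == "exact":
        return 1.0 if output.strip() == reference.strip() else 0.0
    if mode == "contains":
        return 1.0 if reference in output else 0.0
    if mode == "similarity":
        return word_overlap(output, reference)
    raise ValueError(f"unknown mode: {mode}")


print(word_overlap("the cat sat", "the cat ran"))  # 2 shared words of 4 distinct
```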
LLM Judge
Uses another LLM to evaluate the response quality:
| Evaluator | Description |
|---|---|
| llm_judge | Another LLM rates the response quality on a 0–10 scale |
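Since evaluators report scores between 0.0 and 1.0 while the judge rates on a 0–10 scale, some normalization has to happen in between. The source does not specify it, so the following sketch simply assumes the first number in the judge's reply is divided by 10. The helper name is hypothetical.

```python
import re


def normalize_judge_score(reply: str) -> float:
    """Extract the first number from a judge reply and map 0-10 onto 0.0-1.0."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no numeric score in judge reply: {reply!r}")
    raw = min(float(match.group()), 10.0)  # cap out-of-range replies at 10
    return raw / 10.0  # assumed linear mapping, not confirmed by the source


print(normalize_judge_score("8"))
```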
Create an Eval
JSON Output Check
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "json-output-check",
    "type": "static",
    "evaluator": "is_json",
    "config": { "field": "completion" },
    "enabled": true
  }'
Content Safety
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "no-refusal",
    "type": "static",
    "evaluator": "not_contains",
    "config": {
      "field": "completion",
      "value": "I cannot help with that",
      "case_sensitive": false
    },
    "enabled": true
  }'
Webhook Evaluator
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "custom-check",
    "type": "dynamic",
    "evaluator": "webhook",
    "config": {
      "url": "https://your-api.com/eval",
      "timeout_ms": 5000
    },
    "enabled": true
  }'
LLM Judge
curl -X POST https://api.ingateai.com/api/v1/evals \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "name": "quality-judge",
    "type": "llm_judge",
    "evaluator": "llm_judge",
    "config": {
      "prompt_template": "Rate the quality of this response 0-10.\n\nUser: {{prompt}}\n\nAssistant: {{completion}}\n\nScore (number only):",
      "provider": "openai",
      "model": "gpt-4o-mini"
    },
    "enabled": true
  }'
Running Evals
Evals can be triggered three ways:
- Real-time: evaluated automatically on every gateway request
- Scheduled: periodic evaluation of new traces (configurable interval)
- Manual: trigger a run via the API
# Trigger manual run
curl -X POST https://api.ingateai.com/api/v1/evals/{id}/run \
  -H "X-Ingate-Key: sk-ingate-your-key"
# View results
curl https://api.ingateai.com/api/v1/evals/{id}/results \
  -H "X-Ingate-Key: sk-ingate-your-key"
# List all runs
curl https://api.ingateai.com/api/v1/evals/runs \
  -H "X-Ingate-Key: sk-ingate-your-key"
Running Evals Against Datasets
For systematic evaluation, run evals against a dataset version instead of live traffic. Each test case in the dataset is evaluated independently and results are aggregated per eval definition.
# Run an eval against a dataset version
curl -X POST https://api.ingateai.com/api/v1/evals/{eval_id}/run \
  -H "Content-Type: application/json" \
  -H "X-Ingate-Key: sk-ingate-your-key" \
  -d '{
    "dataset_id": "ds_abc123",
    "version": 3
  }'
Dataset-backed eval runs return per-case scores alongside aggregate statistics (mean, median, pass rate). This is especially useful for regression testing: commit a dataset version, run your evals, and compare scores across releases.
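As a rough illustration of the aggregate statistics, the sketch below computes mean, median, and pass rate from a list of per-case scores and compares two releases. The `pass_threshold` of 0.5 and the sample scores are made up for the example; the source does not state how the engine defines a pass.

```python
from statistics import mean, median


def aggregate(scores: list[float], pass_threshold: float = 0.5) -> dict:
    """Summarize per-case scores; the pass threshold is a hypothetical cutoff."""
    if not scores:
        raise ValueError("no scores to aggregate")
    passed = sum(1 for s in scores if s >= pass_threshold)
    return {
        "mean": mean(scores),
        "median": median(scores),
        "pass_rate": passed / len(scores),
    }


# Illustrative scores for the same dataset version on two releases.
release_a = aggregate([1.0, 0.5, 0.0, 1.0])
release_b = aggregate([1.0, 1.0, 0.5, 1.0])
regressed = release_b["mean"] < release_a["mean"]
print(release_a, release_b, regressed)
```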
Available Fields
Most evaluators inspect a specific field from the trace. Set the field key in your eval config to target one of:
| Field | Description |
|---|---|
| prompt | The user's input message |
| completion | The LLM's response text |
| request_body | Full request payload (JSON) |
| response_body | Full response payload (JSON) |
| provider | Provider name (e.g. openai, anthropic) |
| model | Model identifier used for the request |
| path | Request path through the gateway |
| method | HTTP method (GET, POST, etc.) |
Trace fields
The field key defaults to completion. The latency_max and token_max evaluators read metadata fields automatically and don't require a field config.
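The field-selection rules can be summarized in a small helper. This is a sketch of the documented behavior, not the engine's code: `resolve_field` is a hypothetical name, and rejecting unknown field names is an assumption.

```python
# Field names taken from the table above.
TRACE_FIELDS = {
    "prompt", "completion", "request_body", "response_body",
    "provider", "model", "path", "method",
}
# Evaluators that read trace metadata instead of a configured field.
METADATA_EVALUATORS = {"latency_max", "token_max"}


def resolve_field(evaluator: str, config: dict):
    """Return the trace field an evaluator inspects, or None for metadata checks."""
    if evaluator in METADATA_EVALUATORS:
        return None
    field = config.get("field", "completion")  # documented default
    if field not in TRACE_FIELDS:
        # Assumption: unknown field names are treated as a config error.
        raise ValueError(f"unknown trace field: {field!r}")
    return field


print(resolve_field("is_json", {}))
print(resolve_field("contains", {"field": "prompt", "value": "hello"}))
print(resolve_field("latency_max", {"threshold": 2000}))
```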