Enterprise
Response Caching
Exact-match caching for deterministic LLM requests. Reduce latency and cost for repeated identical queries.
Enterprise feature
Response caching is available on Enterprise plans. Enable it from the Caching section of your dashboard.
How It Works
The cache middleware intercepts POST requests in the proxy chain. If the request is deterministic (temperature = 0) and has been seen before, the cached response is returned immediately with no upstream provider call is made.
Cache Logic
- Request arrives at the proxy chain
- Middleware reads the body and checks if
temperatureis0 - Non-deterministic requests bypass the cache entirely
- A cache key is computed from: provider + path + request body hash
- If the key exists → cache HIT: return cached response
- If the key is missing → cache MISS: forward to provider, cache successful response
Cache Headers
Every proxied response includes a cache status header:
| Header | Value | Description |
|---|---|---|
X-Cache | HIT | Response served from cache |
X-Cache | MISS | Response from upstream, now cached |
TTL & Eviction
Cached responses have a default TTL of 10 minutes. After expiry, the next identical request triggers a fresh upstream call and refreshes the cache.
What Gets Cached
- Only
POSTrequests (LLM APIs use POST) - Only when
temperatureis0(deterministic output) - Only successful responses (2xx status codes)
- Cache key includes provider name, request path, and full request body
Cost savings
Caching is especially effective for classification, extraction, and structured-output workloads where the same input is frequently repeated. A cache hit costs zero tokens and returns in milliseconds instead of seconds.