Response Caching

Exact-match caching for deterministic LLM requests. Reduce latency and cost for repeated identical queries.

Enterprise feature

Response caching is available on Enterprise plans. Enable it from the Caching section of your dashboard.

How It Works

The cache middleware intercepts POST requests in the proxy chain. If a request is deterministic (temperature = 0) and an identical request has been seen before, the cached response is returned immediately and no upstream provider call is made.

Cache Logic

  1. Request arrives at the proxy chain
  2. Middleware reads the body and checks if temperature is 0
  3. Non-deterministic requests bypass the cache entirely
  4. A cache key is computed from: provider + path + request body hash
  5. If the key exists → cache HIT: return cached response
  6. If the key is missing → cache MISS: forward to provider, then cache the response if it is successful
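
The steps above can be sketched roughly as follows. This is an illustrative sketch, not the proxy's actual implementation: the names `cache_key`, `is_deterministic`, `handle`, and `forward` are hypothetical, and SHA-256 over the raw body bytes is an assumed choice of hash. It also assumes that a request with no `temperature` field is treated as non-deterministic, and it returns `None` as the cache status for bypassed requests because this page does not say what they receive.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory store: cache key -> (status code, response body).
_cache: Dict[str, Tuple[int, bytes]] = {}


def cache_key(provider: str, path: str, body: bytes) -> str:
    """Build a key from provider + path + a hash of the raw body (SHA-256 assumed)."""
    body_hash = hashlib.sha256(body).hexdigest()
    return f"{provider}:{path}:{body_hash}"


def is_deterministic(body: bytes) -> bool:
    """True only when the JSON body explicitly sets temperature to 0 (assumption)."""
    try:
        payload = json.loads(body)
    except ValueError:  # invalid JSON or invalid UTF-8
        return False
    if not isinstance(payload, dict):
        return False
    temperature = payload.get("temperature")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        return False
    return temperature == 0


def handle(
    method: str,
    provider: str,
    path: str,
    body: bytes,
    forward: Callable[[bytes], Tuple[int, bytes]],
) -> Tuple[int, bytes, Optional[str]]:
    """Return (status, response body, cache status); cache status is None on bypass."""
    if method != "POST" or not is_deterministic(body):
        status, response = forward(body)  # bypass: the cache is never consulted
        return status, response, None

    key = cache_key(provider, path, body)
    if key in _cache:
        status, response = _cache[key]
        return status, response, "HIT"

    status, response = forward(body)
    if 200 <= status < 300:  # only successful responses are stored
        _cache[key] = (status, response)
    return status, response, "MISS"
```

Because the key in this sketch hashes the body bytes as sent, two requests that differ only in whitespace or JSON key order would produce different keys. Whether the proxy normalizes bodies before hashing is not stated here, so sending byte-identical bodies is the safe way to get hits.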

Cache Headers

Every proxied response includes a cache status header:

Header    Value   Description
X-Cache   HIT     Response served from cache
X-Cache   MISS    Response from upstream, now cached
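
On the client side, the header can be inspected to confirm whether a response came from the cache. The helper below is a small sketch: `cache_status` is a hypothetical name, and it takes a plain mapping of header names to values so it works with whatever HTTP client you use. It compares names case-insensitively because HTTP header field names are case-insensitive.

```python
from typing import Mapping, Optional


def cache_status(headers: Mapping[str, str]) -> Optional[str]:
    """Return "HIT" or "MISS" from the X-Cache header, or None if absent or unrecognized."""
    for name, value in headers.items():
        if name.lower() == "x-cache":
            status = value.strip().upper()
            return status if status in ("HIT", "MISS") else None
    return None
```

Logging this value alongside request latency is one way to check that repeated requests are actually being served from the cache.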

TTL & Eviction

Cached responses have a default TTL of 10 minutes. After expiry, the next identical request triggers a fresh upstream call and refreshes the cache.
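
The expiry behavior can be illustrated with a minimal TTL cache. This is a sketch under assumptions, not the proxy's implementation: the `TTLCache` class and its methods are hypothetical, entries expire a fixed time after they are written (a hit does not extend the lifetime), and expired entries are dropped lazily when they are next read.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_TTL_SECONDS = 10 * 60  # the 10-minute default described above


class TTLCache:
    """Hypothetical cache whose entries expire a fixed time after being written."""

    def __init__(
        self,
        ttl_seconds: float = DEFAULT_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller goes upstream and re-populates it.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: bytes) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

With this model, a request that arrives after the TTL has elapsed sees a miss, goes upstream, and calls `set` again, which starts a new 10-minute window.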

What Gets Cached

  • Only POST requests (LLM APIs use POST)
  • Only when temperature is 0 (deterministic output)
  • Only successful responses (2xx status codes)
  • Cache key includes provider name, request path, and full request body

Cost savings

Caching is especially effective for classification, extraction, and structured-output workloads where the same input is frequently repeated. A cache hit costs zero tokens and returns in milliseconds instead of seconds.
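
As a rough way to reason about savings: if a fraction of requests is served from the cache, only the remainder reaches the provider. The function below is a back-of-the-envelope sketch, not a billing formula. `estimated_upstream_cost` is a hypothetical name, and it assumes a flat average cost per upstream request.

```python
def estimated_upstream_cost(
    requests: int, hit_rate: float, cost_per_request: float
) -> float:
    """Estimate provider spend when a fraction `hit_rate` of requests are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Cache hits make no upstream call, so only misses are billed.
    return requests * (1.0 - hit_rate) * cost_per_request
```

For example, with purely hypothetical numbers, 10,000 requests at a 40% hit rate and an average of 0.002 per request would leave 6,000 billed requests, or about 12 in the same currency units, compared with 20 without caching.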