Proxy Caching
Proxy caching puts a managed response cache in front of your HTTP container. Repeat requests for the same URL are served from the cache instead of reaching your container, so origin load drops, tail latency improves, and bursty read traffic stops eating your compute budget.
When you enable it, Bahriya stands up a three-node managed cache cluster in every region the container runs in. The platform's edge checks the cache before forwarding a request. Hits come back in milliseconds and never touch your container. Misses go through, the response is stored, and the next caller benefits.
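The hit/miss flow described above works as a read-through cache. The sketch below is a toy, in-process model of it, not the platform's implementation. A Python dict stands in for the managed memcached cluster, `fake_container` is a hypothetical origin, and expiry is left out entirely.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: (status code, body) stands in for a full HTTP response.
Response = Tuple[int, bytes]


class EdgeCache:
    def __init__(self, origin: Callable[[str], Response]) -> None:
        self.origin = origin                   # stands in for your container
        self.store: Dict[str, Response] = {}   # stands in for managed memcached
        self.origin_calls = 0

    def get(self, url: str) -> Response:
        # Hit: answer from the cache without touching the container.
        if url in self.store:
            return self.store[url]
        # Miss: forward to the container and keep the response for the next caller.
        self.origin_calls += 1
        response = self.origin(url)
        self.store[url] = response
        return response


def fake_container(url: str) -> Response:
    return 200, f"rendered {url}".encode()


edge = EdgeCache(fake_container)
for _ in range(1000):
    edge.get("/products?page=1")
print(edge.origin_calls)  # 1
```

Only the first of the 1,000 identical requests reaches the container; the other 999 are served from the cache.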
When to use it
Proxy caching is a strong fit for read-heavy public APIs, catalogue and listing endpoints, and anything whose upstream work is expensive (joins, third-party calls, LLM inference). It also works as a cheap anti-scrape layer.
When NOT to use it
- Highly dynamic responses that change on almost every request (live prices, leaderboards, real-time inventory).
- Per-user content — if every response depends on the caller, the cache key fans out to one entry per user and the hit rate collapses. Reach for application-level memcached instead.
- Write endpoints — POST, PUT, PATCH, DELETE are never cached.
- Single-digit-second freshness requirements — even a short TTL means some users will see data that is up to one TTL old.
Sizing the cache
Pick total cache size from 256 MB to 8 GB per region, in 256 MB steps.
A rough rule:
working_set = avg_response_size_kb * unique_cacheable_urls
target_size = working_set * 1.5 # headroom for metadata + churn
Worked examples:
- 5 KB JSON, 50,000 unique URLs → 250 MB working set → start at 512 MB.
- 20 KB JSON, 200,000 URLs → 4 GB → start at 6 GB.
- 80 KB thumbnails, 30,000 URLs → 2.4 GB → start at 4 GB.
If you can't estimate it, start at 1 GB, watch the hit rate after a day of traffic, and resize.
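The rule of thumb can be turned into a small calculator. This is a sketch under two assumptions: it converts KB to MB by dividing by 1000, as the worked examples do, and it rounds up to the next 256 MB step within the 256 MB to 8 GB range (treating 1 GB as 1024 MB so that whole-GB sizes land on 256 MB steps). `recommended_cache_mb` is a hypothetical helper name.

```python
import math

STEP_MB = 256
MIN_MB = 256
MAX_MB = 8 * 1024  # 8 GB per region


def recommended_cache_mb(avg_response_kb: float, unique_urls: int) -> int:
    """Apply the rough sizing rule and round up to an allowed cache size."""
    working_set_mb = avg_response_kb * unique_urls / 1000  # KB -> MB, as in the examples
    target_mb = working_set_mb * 1.5                       # headroom for metadata + churn
    steps = math.ceil(target_mb / STEP_MB)                 # round up to a 256 MB step
    return max(MIN_MB, min(MAX_MB, steps * STEP_MB))       # clamp to the allowed range


print(recommended_cache_mb(5, 50_000))    # 512
print(recommended_cache_mb(20, 200_000))  # 6144 (6 GB)
print(recommended_cache_mb(80, 30_000))   # 3840 (3.75 GB)
```

The function returns the smallest step that fits, so the thumbnail case comes out at 3840 MB rather than the rounder 4 GB chosen above. If the result is capped at 8192 MB, the working set is larger than one region's cache can hold, so expect a lower hit rate.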
Max item size
Defaults to 1 MB, can go up to 128 MB. Increase it if you cache large JSON, big SVGs, or rendered HTML. Anything bigger than the limit is passed through uncached. Pick the smallest value that fits what you actually want to cache — larger limits raise memory ceilings and increase fragmentation.
TTLs
- Cache TTL (default 300 s) — how long a response is considered fresh.
- Storage TTL (optional, must be ≥ cache TTL) — how long the response stays stored in total. Between the cache TTL and the storage TTL the entry is stale but still kept, so it can serve as a fallback if your container is briefly unreachable, without paying for a cold re-fetch.
For most APIs, 60–600 s is the right range. Pages that change only a few times a day sit comfortably at 3600 s or more.
If Honour Cache-Control is on (the default), your container can override the platform TTL per response with standard Cache-Control headers. Turn it off if you want platform settings to win unconditionally.
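Exactly how Cache-Control overrides combine with the platform TTL is not spelled out here. The sketch below assumes the platform follows the standard shared-cache rules from RFC 9111: `s-maxage` wins over `max-age`, and `private` or `no-store` responses are not stored. `effective_ttl` is a hypothetical name, and treating `no-cache` as "do not store" is a simplification because the sketch has no revalidation step.

```python
from typing import Dict, Optional


def effective_ttl(cache_control: Optional[str], platform_ttl: int,
                  honour_cache_control: bool = True) -> Optional[int]:
    """Return the TTL (seconds) to store a response for, or None to skip caching.

    Assumed behaviour modelled on RFC 9111 shared-cache semantics; the
    platform's real rules may differ.
    """
    if not honour_cache_control or not cache_control:
        return platform_ttl
    directives: Dict[str, str] = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # A shared cache must not store private or no-store responses; no-cache is
    # skipped too because this sketch cannot revalidate.
    if {"no-store", "private", "no-cache"} & directives.keys():
        return None
    # s-maxage targets shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        value = directives.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return platform_ttl


print(effective_ttl("public, max-age=60", 300))                          # 60
print(effective_ttl("max-age=60, s-maxage=600", 300))                    # 600
print(effective_ttl("private, max-age=60", 300))                         # None
print(effective_ttl("max-age=60", 300, honour_cache_control=False))      # 300
```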
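The optional X-Proxy-Cache-Memcached-Force: true header lets a client bypass freshness and force a stored response, even a stale one. It is off by default. Enable it only on containers whose callers are all trusted, since any caller that sends the header gets stored responses regardless of freshness.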
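The lookup below models how cache TTL, storage TTL and the force header could interact for a single request. It assumes storage TTL counts from when the response was stored (which the "must be ≥ cache TTL" rule suggests), and that a forced request serves any entry that has not yet passed its storage TTL. `Entry`, `lookup` and the returned labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    stored_at: float  # seconds, when the response was cached
    body: bytes


def lookup(entry: Optional[Entry], now: float, cache_ttl: int,
           storage_ttl: Optional[int], origin_up: bool, force: bool = False) -> str:
    """Decide how the edge answers one request (hypothetical model)."""
    if entry is None:
        return "miss"            # nothing stored: go to the container
    age = now - entry.stored_at
    keep_for = storage_ttl if storage_ttl is not None else cache_ttl
    if age >= keep_for:
        return "miss"            # past storage TTL: treated as gone
    if age < cache_ttl:
        return "hit"             # still fresh
    if force:
        return "stale-forced"    # client sent X-Proxy-Cache-Memcached-Force: true
    if not origin_up:
        return "stale-fallback"  # container unreachable: serve the stored copy
    return "refetch"             # stale and container is up: fetch a fresh copy
```

With cache TTL 300 s and storage TTL 7200 s, a 400-second-old entry is refetched while the container is healthy but served as a fallback while it is unreachable. Without a storage TTL, the same entry is simply a miss.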
Cache key, methods, status codes, content types
- Methods default to GET, HEAD. You can add OPTIONS. Write methods are rejected.
- Status codes default to 200, 301, 404. Common additions: 410, 204.
- Content types default to text/plain, application/json. Add text/html or image/* as needed.
- Cache key defaults to method + path + query. You can extend it with specific headers (Accept-Language), an explicit query-param allow-list, or JSON body fields for POST-style search endpoints. Keep the key narrow: every dimension multiplies your entry count.
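The defaults above amount to two checks: whether a response may be stored, and which key it is stored under. The sketch is hypothetical. The function names, the case-insensitive header matching and the choice to sort query parameters are assumptions, and JSON body fields are left out. `MAX_ITEM_BYTES` mirrors the 1 MB default max item size. The key is hashed because memcached keys are limited to 250 bytes and cannot contain spaces or control characters.

```python
import hashlib
import json
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

# Defaults from the list above; names are illustrative only.
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 301, 404}
CACHEABLE_TYPES = {"text/plain", "application/json"}
MAX_ITEM_BYTES = 1 * 1024 * 1024  # 1 MB default max item size


def is_cacheable(method: str, status: int, content_type: str, body_len: int) -> bool:
    media_type = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    return (method.upper() in CACHEABLE_METHODS
            and status in CACHEABLE_STATUSES
            and media_type in CACHEABLE_TYPES
            and body_len <= MAX_ITEM_BYTES)  # larger responses pass through uncached


def cache_key(method: str, url: str, headers: Dict[str, str],
              vary_headers: Iterable[str] = (),
              query_allow_list: Optional[Iterable[str]] = None) -> str:
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if query_allow_list is not None:
        allowed = set(query_allow_list)
        params = [(k, v) for k, v in params if k in allowed]
    # Sort so ?a=1&b=2 and ?b=2&a=1 share one entry.
    query = urlencode(sorted(params))
    lowered = {k.lower(): v for k, v in headers.items()}
    varied = [(h.lower(), lowered.get(h.lower(), "")) for h in vary_headers]
    raw = json.dumps([method.upper(), parts.path, query, varied])
    # Hash to stay within memcached's key-length and character rules.
    return hashlib.sha256(raw.encode()).hexdigest()
```

Two requests that differ only in parameter order, or only in a parameter outside the allow-list, share one entry. Adding Accept-Language to `vary_headers` creates one entry per distinct language value, which is the multiplication the text warns about.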
Variant: before or after rate limit
You pick where the cache sits relative to the container's rate limiter.
- Before rate limit — hits don't consume rate-limit tokens. Use this for public read endpoints: a scraper hitting the same URL 10,000 times in a minute gets 10,000 cached responses and your real users keep their full token budget.
- After rate limit — every request, hit or miss, counts. Use this when the rate limit exists to enforce per-user fairness on an authenticated API, not to protect the origin.
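The two variants differ only in where the cache check sits in the request path. The sketch assumes one shared token budget protecting the container and a bucket without refill. The platform's real limiter (its scope, refill rate and responses) is not described here, so `TokenBucket`, `handle` and `simulate` are hypothetical. 429 is the standard "Too Many Requests" status.

```python
from typing import Dict, Tuple


class TokenBucket:
    """Fixed budget of tokens; refill is omitted to keep the sketch short."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens

    def take(self) -> bool:
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def handle(url: str, cache: Dict[str, str], bucket: TokenBucket, cache_first: bool) -> str:
    # "Before rate limit": a hit returns without touching the bucket.
    if cache_first and url in cache:
        return "200 (cache hit, no token spent)"
    if not bucket.take():
        return "429"
    if url in cache:
        return "200 (cache hit)"     # "after rate limit": the hit still cost a token
    cache[url] = f"body of {url}"    # miss: the container renders the response
    return "200 (from container)"


def simulate(cache_first: bool, budget: int = 100,
             scraper_requests: int = 10_000) -> Tuple[int, str]:
    bucket, cache = TokenBucket(budget), {}
    for _ in range(scraper_requests):
        handle("/popular", cache, bucket, cache_first)
    tokens_left = bucket.tokens
    return tokens_left, handle("/real-user", cache, bucket, cache_first)


print(simulate(cache_first=True))   # (99, '200 (from container)')
print(simulate(cache_first=False))  # (0, '429')
```

With the cache in front, the scraper spends a single token on its first miss and the real user is still served. With the cache behind the limiter, the scraper drains the budget and the real user gets 429.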
High availability
The cache cluster runs three nodes per region. If one fails, the cluster keeps serving; requests for keys that lived on the lost node miss until real traffic re-populates them on the remaining nodes. Expect a temporary dip in hit rate, not an outage. If the whole cache is unavailable, the platform falls back to your container origin transparently, so cache loss never causes request failures.
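The text does not say how the managed cluster assigns keys to nodes. Rendezvous (highest-random-weight) hashing is one common scheme, and the consistent-hash rings used by memcached clients behave similarly. The sketch uses it to show why losing one of three nodes costs roughly a third of the keys rather than all of them. Node names and keys are made up.

```python
import hashlib
from typing import List


def owner(key: str, nodes: List[str]) -> str:
    """Rendezvous hashing: the node with the highest score for this key owns it."""
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


keys = [f"/products/{i}" for i in range(9_000)]
before = {k: owner(k, ["node-a", "node-b", "node-c"]) for k in keys}
after = {k: owner(k, ["node-a", "node-b"]) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on node-c move; every other key keeps its node and its cached entry.
assert all(before[k] == "node-c" for k in moved)
print(f"{len(moved) / len(keys):.0%} of keys now miss")  # about one key in three
```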
The managed cache appears in your memcached list as read-only ("Managed by container {name}"). You manage it through the container's settings, not directly.
Pricing
| Component | Standard | Premium |
|---|---|---|
| Proxy cache surcharge | $5.00 / region / month | $7.50 / region / month |
| Managed memcached (cache memory) | $10 / GB / month | $15 / GB / month |
The surcharge is flat per region. The memcached charge scales with the cache size you pick.
Worked example
1 GB standard cache in 2 regions:
- Managed memcached: 1 GB × 2 × $10 = $20
- Proxy cache surcharge: 2 × $5 = $10
- Total: $30 / month
4 GB premium cache in 4 regions:
- Managed memcached: 4 GB × 4 × $15 = $240
- Proxy cache surcharge: 4 × $7.50 = $30
- Total: $270 / month
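Both worked examples follow one formula: cache size × regions × per-GB price, plus regions × flat surcharge. A minimal calculator using `Decimal` to avoid floating-point rounding on dollar amounts is shown below. `monthly_cost` and `PRICES` are illustrative names, and billing for sizes that are not whole GB is not covered by the table.

```python
from decimal import Decimal

# Prices from the table above, per region per month.
PRICES = {
    "standard": {"surcharge": Decimal("5.00"), "per_gb": Decimal("10")},
    "premium": {"surcharge": Decimal("7.50"), "per_gb": Decimal("15")},
}


def monthly_cost(cache_gb: Decimal, regions: int, tier: str = "standard") -> Decimal:
    price = PRICES[tier]
    memcached = cache_gb * regions * price["per_gb"]  # scales with cache size
    surcharge = regions * price["surcharge"]           # flat per region
    return memcached + surcharge


print(monthly_cost(Decimal(1), 2))             # 30.00
print(monthly_cost(Decimal(4), 4, "premium"))  # 270.00
```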
Common configurations
Public read API, anti-scrape focus — 1 GB, cache TTL 60 s, methods GET, HEAD, status 200, 404, vary on Accept-Language, variant before rate limit.
Authenticated tenant API, per-user fairness — 2 GB, cache TTL 300 s, vary on Authorization (or a tenant header), variant after rate limit.
Catalogue / product listing — 4 GB, cache TTL 1800 s, storage TTL 7200 s, honour Cache-Control on, content types application/json, text/html, variant before rate limit.
All of these can be changed without rebuilding your container image — the cache layer picks up the new configuration on the next deployment cycle.
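The three configurations can be written down as data and checked against the limits stated on this page: size in 256 MB steps up to 8 GB, storage TTL ≥ cache TTL, max item size up to 128 MB, and no write methods. The field names and the `validate` method are hypothetical; they are not the platform's settings API.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


@dataclass
class ProxyCacheConfig:
    # Hypothetical field names mirroring the settings described on this page.
    size_mb: int
    cache_ttl_s: int = 300
    storage_ttl_s: Optional[int] = None
    max_item_mb: int = 1
    methods: List[str] = field(default_factory=lambda: ["GET", "HEAD"])
    statuses: List[int] = field(default_factory=lambda: [200, 301, 404])
    content_types: List[str] = field(default_factory=lambda: ["text/plain", "application/json"])
    vary_headers: List[str] = field(default_factory=list)
    honour_cache_control: bool = True
    before_rate_limit: bool = True

    def validate(self) -> List[str]:
        errors = []
        if not 256 <= self.size_mb <= 8192 or self.size_mb % 256:
            errors.append("size must be 256 MB to 8 GB in 256 MB steps")
        if self.storage_ttl_s is not None and self.storage_ttl_s < self.cache_ttl_s:
            errors.append("storage TTL must be >= cache TTL")
        if self.max_item_mb > 128:
            errors.append("max item size must be at most 128 MB")
        if WRITE_METHODS & {m.upper() for m in self.methods}:
            errors.append("write methods cannot be cached")
        return errors


public_read_api = ProxyCacheConfig(size_mb=1024, cache_ttl_s=60, statuses=[200, 404],
                                   vary_headers=["Accept-Language"])
tenant_api = ProxyCacheConfig(size_mb=2048, cache_ttl_s=300, vary_headers=["Authorization"],
                              before_rate_limit=False)
catalogue = ProxyCacheConfig(size_mb=4096, cache_ttl_s=1800, storage_ttl_s=7200,
                             content_types=["application/json", "text/html"])

for config in (public_read_api, tenant_api, catalogue):
    print(config.validate())  # []
```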