Schema Cache
Schemas are stored in an in-memory cache keyed by version. If a version is specified in the request metadata, Permify looks it up in the in-memory cache; on a miss, it queries the database for that version and stores the result in the cache. If no version is given in the metadata, versions are assumed to be alphanumeric and sorted in that order, so Permify requests the head (latest) version and checks whether it exists in the memory cache. The size of this cache can be set through the Permify configuration. Here is an example configuration:

```yaml
service:
```

Data Cache
Permify applies the MVCC (Multi-Version Concurrency Control) pattern for Postgres, creating a separate database snapshot for each write and delete operation. This both enhances performance and provides a consistent cache. An example of a cache key is:

check_{tenant_id}_{schema_version}:{snapshot_token}:{check_request}
Permify hashes each request and searches the cache for the resulting key. If it cannot find it, it runs the check engine and writes the result to the cache, so subsequent identical requests are served directly from the cache.
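As an illustrative sketch of the key-then-hash lookup described above (Python rather than Permify's Go; the function names and the in-process dict are hypothetical stand-ins, not Permify's actual internals):

```python
import hashlib

# Process-local cache: hashed key -> permission check result.
cache: dict[str, bool] = {}

def cache_key(tenant_id: str, schema_version: str,
              snapshot_token: str, check_request: str) -> str:
    # Mirrors the key layout from the text:
    # check_{tenant_id}_{schema_version}:{snapshot_token}:{check_request}
    raw = f"check_{tenant_id}_{schema_version}:{snapshot_token}:{check_request}"
    # Hash the assembled key so every cache entry has a fixed, compact size.
    return hashlib.sha256(raw.encode()).hexdigest()

def check(tenant_id, schema_version, snapshot_token, check_request, engine):
    key = cache_key(tenant_id, schema_version, snapshot_token, check_request)
    if key in cache:                # cache hit: reuse the prior result
        return cache[key]
    result = engine(check_request)  # cache miss: run the check engine
    cache[key] = result            # write through for future identical requests
    return result
```

Because the snapshot token and schema version are part of the key, a cached result is only ever reused for the exact same data snapshot and schema.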
The size of this cache can also be set via the Permify configuration. Here's an example:
service:
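The `service:` snippets above are truncated in this excerpt. A plausible completion covering both the schema cache and the permission cache might look like the following; the key names are assumed from Permify's configuration reference, and the values are purely illustrative:

```yaml
service:
  schema:
    cache:
      number_of_counters: 1_000
      max_cost: 10MiB
  permission:
    cache:
      number_of_counters: 10_000
      max_cost: 10MiB
```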
Cache Sizing & Eviction (Snap Tokens)
There is no separate, dedicated cache for snap tokens. The snap token is simply part of the permission cache key, so its memory footprint is governed by the `service.permission.cache` settings shown above:
| Config key | Purpose |
|---|---|
| `service.permission.cache.max_cost` | Maximum memory budget for the permission cache (e.g. 10MiB, 256MiB). This is the effective size limit that gates how many snap-token-keyed entries can reside in memory at once. |
| `service.permission.cache.number_of_counters` | Number of TinyLFU admission counters. A good rule of thumb is ~10× the expected number of unique cached items. |
Eviction is driven by the `max_cost` budget, using Ristretto's TinyLFU admission policy combined with a SampledLFU eviction policy. Entries are evicted when new items need space and the budget is exhausted, not after a fixed time window.
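A greatly simplified sketch of budget-based eviction may help make this concrete. This is illustrative Python only, not Ristretto's actual TinyLFU/SampledLFU machinery; the class and its eviction rule are hypothetical:

```python
class CostBudgetCache:
    """Toy cache that evicts on insert once a max_cost budget is exhausted.

    Real Ristretto admits new items by TinyLFU frequency estimates and
    evicts via SampledLFU; here we simply evict the least-frequently-hit
    entries to make room. There is no time-based expiry at all.
    """

    def __init__(self, max_cost: int):
        self.max_cost = max_cost
        self.entries = {}   # key -> (value, cost)
        self.hits = {}      # key -> access count (stand-in for TinyLFU)
        self.used = 0

    def get(self, key):
        if key in self.entries:
            self.hits[key] += 1
            return self.entries[key][0]
        return None

    def set(self, key, value, cost):
        if cost > self.max_cost:
            return False  # this item can never fit within the budget
        # Evict only when the new item actually needs the space.
        while self.used + cost > self.max_cost:
            victim = min(self.entries, key=lambda k: self.hits[k])
            self.used -= self.entries[victim][1]
            del self.entries[victim]
            del self.hits[victim]
        self.entries[key] = (value, cost)
        self.hits[key] = 1
        self.used += cost
        return True
```

The point to notice is that a frequently read entry survives pressure from new inserts, while cold entries are sacrificed first, which is the behaviour the TinyLFU/SampledLFU combination approximates at scale.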
If you observe high cache miss rates after a schema version change, that is expected: the schema_version component of the key changes, making all prior entries stale. Size your max_cost to hold a comfortable working set for the most recent schema version in use.

Distributed Cache
Permify provides a distributed cache across availability zones (within an AWS region) via consistent hashing. Permify uses consistent hashing across its distributed instances to make more efficient use of their individual caches. This allows for high availability and resilience in the face of individual node or even entire availability-zone failures, as well as improved performance from data locality. Consistent hashing is a distributed hashing scheme that operates independently of the number of objects in a distributed hash table: both keys and nodes are mapped onto the same hash space, so any peer can estimate which node a key lives on and route the request to the most suitable node, effectively creating a natural load balancer.

How Consistent Hashing Operates in Permify
With a single instance, when an API request is made, the request and its corresponding response are stored in that instance's local cache. With more than one Permify instance, consistent hashing activates on API calls: the request is hashed, producing a key that identifies the node/instance responsible for storing that request's data. Suppose it is stored on instance 2; subsequent API calls with the same hash will retrieve the response from instance 2, regardless of which instance received the call. Using this consistent hashing approach, we can effectively utilize individual cache capacities, and adding more instances automatically increases the total cache capacity in Permify. You can learn more about consistent hashing from the following blog post: Introducing Consistent Hashing.

Note that while the consistent hashing approach will distribute keys evenly across the cache nodes, it's up to the application logic to ensure the cache is used effectively (i.e., that it reads from and writes to the cache appropriately).
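A minimal consistent-hash ring shows how every instance can independently route the same key to the same node. This is an illustrative Python sketch using virtual nodes, not Permify's actual implementation; the `Ring` class and node names are hypothetical:

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Stable hash shared by every instance building the ring.
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed at many points on the ring ("virtual nodes")
        # so keys spread evenly across instances.
        self.points = sorted(
            (h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )

    def node_for(self, key: str):
        # Walk clockwise from the key's hash to the next node point.
        idx = bisect.bisect(self.points, (h(key),)) % len(self.points)
        return self.points[idx][1]
```

Because every instance builds the same ring from the same peer list, any two instances agree on which peer owns a given check key, so whichever instance receives the API call can forward the lookup to the owner's cache.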
Scaling Events — Adding or Removing Pods
When you scale out (add pods) or scale in (remove pods) in Kubernetes, here is what happens at the cache level:

Key rebalancing is partial, not global. The consistent hash ring updates, and only the key ranges that mapped to the affected pod(s) need to move. The rest of the ring, and its cached entries, is undisturbed.

Each pod's cache is local and in-memory. Permify uses Ristretto as a process-local cache; there is no shared cache layer. This has two practical consequences:

- Scale-out (new pod joins): The new pod starts with a cold cache. For the key range now routed to it, requests will miss the cache and fall through to the database until the cache warms up. Expect a temporary increase in database load and response latency immediately after a pod is added.
- Scale-in (pod removed): All entries cached in that pod’s memory are lost. The key range is reassigned to a remaining pod, which will experience cold-cache behaviour for those keys until they warm up.
How long warm-up takes depends on your max_cost budget and request rate; under typical read-heavy workloads this resolves within minutes.
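The "partial, not global" claim can be checked numerically. The following illustrative Python sketch (hypothetical pod names; a deliberately minimal ring lookup, not Permify's code) measures how many keys change owner when a fourth pod joins a three-pod ring:

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

def owner(key, nodes, vnodes=200):
    # Minimal consistent-hash lookup: place each node at many virtual
    # points, then walk clockwise from the key's hash to the next point.
    points = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    idx = bisect.bisect(points, (h(key),)) % len(points)
    return points[idx][1]

keys = [f"check-request-{i}" for i in range(1000)]
before = {k: owner(k, ["pod-0", "pod-1", "pod-2"]) for k in keys}
after = {k: owner(k, ["pod-0", "pod-1", "pod-2", "pod-3"]) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
# Only roughly a quarter of the keys move, and every moved key moves
# to the new pod; the rest keep their existing (still-warm) cache owner.
```

With a naive `hash(key) % n` scheme, by contrast, nearly all keys would be remapped on every scaling event, and every pod would go cold at once.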
Here is an example configuration:
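The configuration example itself is not included in this excerpt. A hedged sketch of what it might contain, assuming the `distributed` settings from Permify's configuration reference (the address and port values are purely illustrative):

```yaml
distributed:
  enabled: true
  # Peer discovery endpoint for the instances that form the hash ring,
  # e.g. a headless Kubernetes service.
  address: "kubernetes:///permify.default:5000"
  port: "5053"
```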