Cache

[ ]

There are many caches some of which you may not expect:

  • Client caching, e.g., browser often has its own cache
  • CDN
  • Web server caching, e.g., Reverse proxies and caches such as Varnish can serve static and dynamic content directly.
  • Database caching
  • Application caching. In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage.

How to cache

Caching at the database query level

Whenever you query the database, hash the query as a key and store the result to the cache. This approach suffers from expiration issues:

  • Hard to delete a cached result with complex queries
  • If one piece of data changes such as a table cell, you need to delete all cached queries that might include the changed cell

Caching at the object level

See your data as an object, similar to what you do with your application code. Have your application assemble the dataset from the database into a class instance or a data structure(s):

  • Remove the object from cache if its underlying data has changed
  • Allows for asynchronous processing: workers assemble objects by consuming the latest cached object

Suggestions of what to cache:

  • User sessions
  • Fully rendered web pages
  • Activity streams
  • User graph data

It can be done in client side or application server.

When to update the cache

Cache-aside

Cache-aside is also referred to as lazy loading. Only requested data is cached. The application does the following:

  • Look for entry in cache, resulting in a cache miss
  • Load entry from the database
  • Add entry to cache
  • Return entry

Disadvantage:

  • Each cache miss results in three trips, which can cause a noticeable delay.
  • Data can become stale if it is updated in the database. This issue is mitigated by setting a time-to-live (TTL) which forces an update of the cache entry, or by using write-through.

Write-through

The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:

  • Application adds/updates entry in cache
  • Cache synchronously writes entry to data store Return

Disadvantage

  • When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write through can mitigate this issue.
  • Most data written might never be read, which can be minimized with a TTL.

Write-behind (write-back)

Source: Scalability, availability, stability, patterns

In write-behind, the application does the following:

  • Add/update entry in cache
  • Asynchronously write entry to the data store, improving write performance

Disadvantage:

  • There could be data loss if the cache goes down prior to its contents hitting the data store.
  • It is more complex to implement write-behind than it is to implement cache-aside or write-through.

Refresh-ahead

You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.

Refresh-ahead can result in reduced latency vs read-through if the cache can accurately predict which items are likely to be needed in the future.

Written on March 30, 2019