Multiple cache backends in Django


Out of the box, Django’s cache framework includes different cache back-ends (including the venerable memcached) and granularities (site-wide, view-specific, etc.). How could you improve on this awesomeness?

One way is to use multiple back-ends. This might be desirable if your application needs a vanilla-flavored memcache for the site, and a second cache for a data import. Or maybe you want function results cached with different validation criteria and/or lifespans.

Using multiple cache back-ends in a Django project is easy.

Resources

I chose Beaker for my project’s multiple back-end framework. Its documentation is very good, though sometimes terse. Ben Bangert wrote a great post about Beaker and Django, and you should read it when you’re done here.

Step 0: Measure twice, cut once

Before you do anything, ask yourself what problem you’re trying to solve with multiple caches, and whether there’s another way to solve it. Adding any technology to your stack means more work. The application/technology you add is another thing…

  • …to install and configure
  • …that will need periodic updates
  • …that will have bugs
  • …that could have unexpected interactions with your other technology

Less is less, and more is more. Make sure you really need to add another gizmo to your technology stack.

Your next step is to characterize your different cache regions. This usually comes down to differences in size, granularity, and expiration policy.

My system needs a traditional function-result cache, and a cache for gigabytes of data from another server. The function cache should have a typical web site object lifetime, but the data-import cache wants a much longer object lifetime. The granularity will also differ: Functions invoked & keyed by request.user, vs. text and image files. Yeah, I could jam them into one cache, but there would be compromises. I don’t want to compromise.

Philosophy

The purist approach is: Include no caching directives whatsoever in application source code. Cache directives appear only in settings.py, so applications will have no awareness of whether or how the project uses caching.

I’ve never been fond of Django’s per-site caching. You’ve got to use CACHE_MIDDLEWARE_ANONYMOUS_ONLY unless your site is trivial, and the cache middleware only caches pages without GET or POST parameters. (It is cool that you can do some simple site caching out of the box, but it is limited.)

I embrace the dirty less-pure approach, which is that cache directives in application code are OK. My system’s applications won’t be shared with other projects, so I’m not concerned with portability. And I can now control which functions are cached, and use multiple cache back-ends in different parts of the code, which is after all what I want to do in the first place. 🙂

Although the application code is modified, Django doesn’t know you’re using a cache! None of Django’s cache directives get used. (Although you could do it that way, it’s not how I did it. I’ll explain why in a bit.)

Step 1: Install Beaker

$ pip install beaker

Step 2: Install your cache backend(s)

Your cache(s) will work better if you install them. Duh.

Step 3: Delete Django’s caching hooks

Delete all references containing the word “cache” from settings.py.

If SESSION_ENGINE points at Django’s cache-backed sessions (django.contrib.sessions.backends.cache), you need to delete that too. To keep your world more consistent, you could use Beaker’s session system.

Step 4: Add Beaker’s cache definitions

In my project, I wanted memcached for site caching, and a file-based cache for a large data import. Beaker has a simple syntax for defining cache regions, and inheriting global options for anything not defined in a region.

Here’s my DJANGOPROJECT/beakercache.py:

""" Cache definitions.

This defines a memcache-based region for code functions,
and a file-based region for import data.

"""
import memcache  # Used by modules that import this.
from beaker.cache import CacheManager
from beaker.util import parse_cache_config_options

cache_opts = {
    # Globals
    'cache.regions': 'functions, files',
    'cache.lock_dir': '/tmp/beaker/lock',
    'cache.data_dir': '/tmp/beaker/data',
    # function region
    'cache.functions.type': 'ext:memcached',
    'cache.functions.url': '127.0.0.1:11211',
    'cache.functions.expire': 60 * 60,  # 60-minute timeout
    # file region
    'cache.files.type': 'file',
    'cache.files.expire': 5 * 24 * 60 * 60,  # 5 day timeout
    }

cache = CacheManager(**parse_cache_config_options(cache_opts))
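To make the inheritance rule concrete, here’s a rough, stdlib-only sketch of how region settings could be resolved from a flat options dict like the one above. It is not Beaker’s actual parser; resolve_regions is a hypothetical helper written only to illustrate that each region starts from the global options and then overrides them.

```python
def resolve_regions(options, prefix="cache."):
    """Return {region: settings}; region-specific keys override global ones.

    Hypothetical illustration only, not Beaker's own parsing code.
    """
    # Strip the common "cache." prefix from every key.
    opts = {k[len(prefix):]: v for k, v in options.items() if k.startswith(prefix)}
    regions = [n.strip() for n in opts.pop("regions", "").split(",") if n.strip()]
    region_prefixes = tuple(name + "." for name in regions)

    # Anything that is not "<region>.<option>" counts as a global option.
    global_opts = {k: v for k, v in opts.items() if not k.startswith(region_prefixes)}

    resolved = {}
    for name in regions:
        settings = dict(global_opts)  # inherit the globals first
        marker = name + "."
        for key, value in opts.items():
            if key.startswith(marker):
                settings[key[len(marker):]] = value  # then override
        resolved[name] = settings
    return resolved


cache_opts = {
    'cache.regions': 'functions, files',
    'cache.lock_dir': '/tmp/beaker/lock',
    'cache.data_dir': '/tmp/beaker/data',
    'cache.functions.type': 'ext:memcached',
    'cache.functions.url': '127.0.0.1:11211',
    'cache.functions.expire': 60 * 60,
    'cache.files.type': 'file',
    'cache.files.expire': 5 * 24 * 60 * 60,
}

regions = resolve_regions(cache_opts)
```

Under these assumptions, both regions end up with the shared lock_dir and data_dir, while only the functions region carries a url.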

Step 5: Hook up your function cache

Just use a @cache.region decorator on a function. The cache entry key includes positional function parameters, which means you can cache logged-in users’ function results if the function takes request.user as a parameter.

If the function has one or more other decorators that decide whether to allow execution, put the @cache.region before them.

You can’t cache a Django view this way, because Beaker throws an exception if a cache key is more than 250 bytes long, and a view function’s key will be, since the “request” parameter is used to make it. (You can use Beaker to cache your views with Django’s @cache_page decorator, but doing so means you’ve defined Beaker as a custom cache back-end in Django via the CACHE_BACKEND setting, which in turn means you’ve written a custom cache backend module for Beaker. That is a bigger fur ball than I want to cough up now. If you do that, write about it on your own blog. 🙂 )

from beakercache import cache

@cache.region("functions")
def expiration(lease_date, policy_date):
    # "you" is a placeholder for whatever your own code does here.
    ithink = you.get_the_idea(lease_date, policy_date)
    return ithink
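To see why positional arguments matter for the key, and why a whole request object makes a poor one, here’s a toy stand-in for a region decorator. It is not Beaker’s implementation: toy_region, MAX_KEY_BYTES, and FakeRequest are hypothetical names, and the 250-byte limit is simply the figure quoted in the text above.

```python
import functools
import time

MAX_KEY_BYTES = 250  # figure quoted in the text; assumed, not read from Beaker


def toy_region(expire_seconds):
    """Cache results keyed by function name plus positional arguments."""
    def decorator(func):
        store = {}  # key -> (expiry timestamp, cached value)

        @functools.wraps(func)
        def wrapper(*args):
            key = func.__name__ + " " + " ".join(str(a) for a in args)
            key_bytes = len(key.encode("utf-8"))
            if key_bytes > MAX_KEY_BYTES:
                raise ValueError("cache key too long: %d bytes" % key_bytes)
            now = time.time()
            hit = store.get(key)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh, skip the real call
            value = func(*args)
            store[key] = (now + expire_seconds, value)
            return value
        return wrapper
    return decorator


calls = []  # records which arguments actually reached the function


@toy_region(expire_seconds=3600)
def greeting(username):
    calls.append(username)
    return "Hello, %s" % username


class FakeRequest:
    """Stand-in for a request object whose string form is large."""
    def __str__(self):
        return "<FakeRequest " + "x" * 300 + ">"


@toy_region(expire_seconds=3600)
def fake_view(request):
    return "page"
```

Calling greeting("alice") twice runs the function body once, while greeting("bob") gets its own entry. Calling fake_view(FakeRequest()) raises ValueError because the stringified argument pushes the key past the limit. The practical upshot is to pass small, stable values (a username, an ID) to cached functions rather than large objects.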

Step 6: Hook up your other cache region(s)

If your other cache region(s) also cover functions, then you know what to do. Just use the @cache.region decorator with a different argument.

If your other cache region(s) cover objects or files, you have at least two options.

One would be to use Beaker’s programmatic API to fetch the object from the cache. You’ll have to provide a creation-function to store objects in the cache, and call namespace.get() to get the data.

If your code is only fetching the objects from the cache, then another way, which is cleaner, is to encapsulate the object-fetching in a function, and use @cache.region. Because Beaker uses positional arguments in the cache key, the cache will associate the value for file/object “XYZ” with the key for “XYZ” — which is exactly what you want.

from beakercache import cache

@cache.region("files")
def readfile(filepath):
    """Return the contents of an import-file."""

    with open(filepath) as f:
        rawdata = f.read()
    return rawdata
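With a multi-day expiry, a cached file can go stale if the source file changes. One possible refinement, which is not part of my setup, is to pass the file’s modification time as an extra positional argument, so a changed file gets a new key. The sketch below uses a toy dict-backed decorator (toy_cached, a hypothetical stand-in for @cache.region("files")) so that it runs with only the standard library.

```python
import os

_store = {}  # toy stand-in for the "files" region


def toy_cached(func):
    """Cache results keyed by the function name and positional arguments."""
    def wrapper(*args):
        key = (func.__name__,) + args
        if key not in _store:
            _store[key] = func(*args)
        return _store[key]
    return wrapper


@toy_cached
def _read_version(filepath, mtime_ns):
    """Read the file; mtime_ns exists only to become part of the cache key."""
    with open(filepath) as f:
        return f.read()


def readfile_fresh(filepath):
    """Return cached contents unless the file's modification time changed."""
    return _read_version(filepath, os.stat(filepath).st_mtime_ns)
```

The trade-off is one os.stat() call per lookup, and entries for old modification times linger until they expire.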
