Hey there Rubyist, careful what you memoize

*An examination of one of the subtler caching bugs which rears its head in Ruby web apps and test suites, and how to fix it reliably.*

A few weeks ago I was brought onto a Rails project containing a considerable amount of legacy code written by developers whose experience with Ruby and Rails was fairly limited. As usual, my first action after cloning the codebase and bootstrapping the app was to run the test suite. When the entire collection of RSpec examples ran quickly and without failures, I breathed a sigh of relief: onward to new features! Confident that major regressions would be exposed by test failures, I got down to work making modifications to the code. And indeed, very soon a spec failed – surely caused by something I had broken unwittingly while feeling out an unfamiliar codebase…until repeated execution under different conditions showed that the spec was randomly passing and failing, even without my modifications to the app.

“Flickering tests”: never a fun type of bug to seek out and eliminate, but it has to be done if a test suite is to provide any value as a guard against regressions. In this case, I found an ordering of tests that caused repeatable failures (by pinning RSpec’s test-ordering seed) and then tracked the problem down to a bit of code in one of the app’s models, using an anti-pattern I’ve encountered before in Rails projects: class-level memoization of ActiveRecord objects for caching query results. Until you’ve been burned by it once or twice, it might not be at all obvious why it’s dangerous to retain global/class-level references to objects pulled up by database queries in order to save the expense of executing the same queries repeatedly. The following examples are intended to expose the problem and offer some safe alternatives.

Naïve optimization

Let’s start with a simple bit of code which looks ripe for easy optimization. We have an ActiveRecord model called Wombat, and a class method to access a special one-off record, looking it up in the database by one of its attributes:

“What a waste of precious SQL server cycles!” you cry. “Every time the app calls Wombat.my_unique_wombat, we run the exact same query. We can avoid re-executing this code by memoizing the result of the method!” Enter version 2:

And what if we get even more clever, and create a general-purpose memoized name-based accessor method, which we can call like Wombat['specialguy'] and save queries for any arbitrary name with the help of a hash? (Note that in earlier versions of Rails you could achieve basically the same result with the memoize method, but it’s since been deprecated and subsequently removed.)

Beautiful, now there will be only a single SQL query for any given name. Awesome gains! What could possibly go wrong?

What could possibly go wrong

What we’re doing in the above examples is caching. Really, really basic, no-frills, homespun caching. That’s the purpose of memoization, and in many cases it’s a good way to reduce the number of expensive operations or calls your Ruby code performs; but this is not one of those cases, and memoization is no replacement for a robust caching solution. Let’s see why.

Tests now share state. Remember what I said above about sporadic test failures? This is a great way to get those when running more than one test case/spec example at a time. The problem plays out like this: the memoized Wombat.my_unique_wombat method is called during one example, and its result is now stored in memory as an instance variable on the Wombat class object. Before a subsequent example begins, the test database is cleaned out (as should always happen between examples) – but not the variables stored on class objects, which are still in memory. When the same method ends up getting called in a later example, it returns the memoized result, which points to stale data – probably an ActiveRecord object with an ID that doesn’t even exist in the current state of the database. Maybe the later example fails right away, maybe only at a later point, depending on how your app interacts with that object. Maybe it doesn’t fail at all – this time. One thing is certain: it’s a royal PITA to track down and fix.

Part of the “cache” is now automatically wiped every time the app restarts. Data in memory is lost whenever you shut down and restart app servers or background worker processes, so that part of the cache is wiped whether you like it or not (unless you persist it in a real cache or datastore). For small amounts of data this might not be an issue, but when your app is serving a level of traffic where caching is critical to performance and uptime, you want to be able to control exactly when and how parts of the cache are cleared, independently of your server restarts.

Different processes can now get out of sync with each other. In a multi-server or multi-process setup – anything where multiple processes are running your app code simultaneously – you’re introducing the possibility, however slim, that different processes will store different results when each one runs the memoized method for the first time. The problem is especially acute if you’re memoizing anything involving a timestamp. Silent but deadly.

You still have to invalidate! An easily overlooked issue: if there’s ever any way for the memoized result to change (for instance, any attribute of the ‘specialguy’ Wombat record changes), it has to be un-memoized. In a normal caching setup, this is standard practice (achieved in Rails through Observers or similar patterns); but it seems to be easily overlooked with the lightweight solution of simple memoization…

Making it right

So if memoization isn’t a good option for caching an ActiveRecord lookup, what’s a better solution? Before we get ahead of ourselves, take a moment and ask yourself: does this value really need to be cached? As the above points demonstrate, any form of caching introduces a certain amount of complexity into your code (especially surrounding the question of when to invalidate, one of the Two Hard Things in CS) – is this development complexity worth it for your particular wombat case? For instance, do you have metrics from New Relic suggesting that this query makes a measurable difference in performance? Is it run so infrequently that it’s not worth caching, or perhaps so frequently that your database’s built-in query-caching mechanism picks it up automatically?

Let’s say you’ve decided that you have a case where it is worth it to cache your query result. The obvious thing to do is to use whatever real caching solution your app already has registered with Rails, for instance memcached in production and filesystem/memory in development mode. You might start out revising your find-by-name method like so:

Watch out though! The default cache stores serialize Ruby objects with Marshal, and the marshaled form of an ActiveRecord object depends on your Ruby, Rails, and app code versions – cached blobs can fail to load (or load subtly wrong) after an upgrade. Generally it’s safer to avoid caching entire ActiveRecord objects, and instead to deal with only the subset of data you actually need, with methods you have control over. For instance, a lightweight extraction of important data to a simple hash:

Or better yet, just cache the properties you specifically want:

And don’t forget to use Rails.cache.delete whenever the cached data becomes invalid (or overwrite the existing data with Rails.cache.write). Catching every case here is critical if you don’t want your app relying on stale data, so make sure you’re prepared to deal with this added development overhead.

Finally, now that you’re using the Rails cache API instead of memoization, you can deal with the problem of tests polluting each other through shared state. Either turn off caching in your test environment (by adding config.cache_store = :null_store to config/environments/test.rb), or (my preference) use :memory_store in tests and make sure that the cache is cleared before every example, for instance for RSpec in spec/spec_helper.rb:
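
Something like this, for instance:

```ruby
# spec/spec_helper.rb
RSpec.configure do |config|
  # Wipe the in-memory cache before every example so no cached
  # data can leak from one example into the next.
  config.before(:each) do
    Rails.cache.clear
  end
end
```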

Now you’re ready to take on the world. Understanding when and how data travels between your app’s persistence layer and Ruby objects in memory is crucial to making smart decisions about caching.

Oh, and if you’re wondering what I did to fix the bug I mentioned at the beginning of this post? You guessed it: I removed the memoization and didn’t replace it with any other form of caching. It didn’t affect performance. Fin.

PS: It’s not just a Rails thing

Note that the problem of shared test state described above is not particular to ActiveRecord; it’s a potential risk any time you use memoization at the class level. Since class objects are not re-initialized between spec examples, whenever you memoize the result of a class method, that code will execute only once per test run – which can lead to unpredictable consequences if there’s any reason the result might differ from one example to the next. Memoize wisely!

3 thoughts on “Hey there Rubyist, careful what you memoize”

  1. Thanks for posting this, but one of the problems is where you’re storing the data, not how you’re storing the data.

    Tests now share state.

    This has nothing to do with memoization and everything to do with global state, which is what using instance variables on classes is. While “caching” and “memoization” are sometimes used interchangeably, these days I use one word over the other based on the scope I’m talking about.

    In Rails the requests all share boundaries since you get an instance of the controllers and those instances have instances of models and so on and so forth. In my opinion, memoization that stays within the request’s boundaries is memoization. Memoization that spans request boundaries is a cache.

    Memoization is a great way to save on repetitive IO operations in a request. If you need something that multiple requests can benefit from I don’t think it should be stored on globals/constants (for the reasons you state with multiprocess servers getting out of sync with each other). It should be moved to an entirely separate service such as Memcached.

    Also, if you’re caching IO operations on constants/globals using a multiprocess server you’re going to have tons of redundancies in your cache which will then be made exponential based on how many processes you run. It’s a huge waste of memory.

    1. I think we’re 100% in agreement about when and how to use memoization vs a caching service. That’s why I specified that the problem is “class-level memoization,” not just memoization in general – essentially anything that involves global app state being used as a cache.

      Also good point about the extra memory usage this causes in a multiprocess setup, which can become a real concern if you’re using this anti-pattern for large collections of objects.

      1. Just like to say thanks for posting this. I was encountering a flickering test situation, and the root cause was class-level memoization. I used the heuristic of looking for ||= operators in class methods. Thanks.
