Writing Thread-safe Code

In this chapter, I’ll give you a set of guidelines to keep in mind so that the code you write remains thread-safe.

It’s presented as a list of things to watch out for. Next time you find yourself doing one of these things, think back to this chapter. Any guideline has exceptions, but it’s good to know when you’re breaking one, and why.

Let’s start with the overarching principle on which this chapter is based:

Idiomatic Ruby code is most often thread-safe Ruby code.

This might be obvious, but I point it out to let you know that there aren’t any special tricks to learn. Writing good, idiomatic Ruby code will lead to thread safety most of the time. Obviously, the definition of ‘good, idiomatic Ruby code’ is up for debate, but bear with me!

Avoid mutating globals

If you recall from the chapter that introduced thread safety, ‘concurrent modification’ is the main thing that’s going to lead to safety issues in a multi-threaded context.

The most obvious case where this will happen is when sharing objects between threads. Given this, global objects should stick out like a sore thumb! Global objects are implicitly shared between all threads.

So this is inherently not thread-safe:

$counter = 0
puts "Hey threads, go ahead and increment me at will!"

There are two things to keep in mind here:

  1. Even globals can provide thread-safe guarantees.
  2. Any time there is only one shared instance (a.k.a. a singleton), it’s a global.

Let’s tackle the first one first.

Even globals can provide thread-safe guarantees

That is to say, global variables don’t necessarily have to be avoided. If, for some reason, you really need that global counter, you could do it like this:

require 'thread'

class Counter
  def initialize
    @counter = 0
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize do
      @counter += 1
    end
  end
end

$counter = Counter.new

It’s a bit more code, but that’s the price you pay to ensure that data consistency is preserved. It’s worth it.
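
To see that guarantee in action, here is a minimal sketch that hammers the counter from several threads. The value reader is an assumption added for illustration (the class above has no way to read the count), and the thread and iteration counts are arbitrary.

```ruby
class Counter
  def initialize
    @counter = 0
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize do
      @counter += 1
    end
  end

  # Hypothetical reader, not part of the original class.
  # It takes the same mutex so a read never overlaps a write.
  def value
    @mutex.synchronize { @counter }
  end
end

$counter = Counter.new

# Ten threads, each incrementing the shared counter 1,000 times.
threads = 10.times.map do
  Thread.new { 1_000.times { $counter.increment } }
end
threads.each(&:join)

total = $counter.value
puts total
```

Because every increment happens inside the mutex, no update can be lost, whichever Ruby implementation runs the code.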

Anything where there is only one shared instance is a global

I bring this up because it’s important to do more than just search for Ruby variables beginning with a dollar sign before you can cross this item off of your list.

There are other things that fit this definition in Ruby:

  • Constants
  • The AST
  • Class variables/methods

These things don’t look the same as global variables, but they’re accessible from anywhere in your program, by any part of the code. Therefore, they’re global too.

Storing a counter in a global variable that has no thread-safety guarantee is not safe, and the same is true if you store that counter in a class variable or constant. So, look for those instances too.

This is OK, because it doesn’t modify global state.

class RainyCloudFactory
  def self.generate
    cloud = Cloud.new
    cloud.rain!

    cloud
  end
end

This is not OK, because it does modify global state, in this case, a class variable.

class RainyCloudFactory
  # Shared by every caller of generate, on every thread.
  @@clouds = []

  def self.generate
    cloud = Cloud.new
    @@clouds << cloud

    cloud
  end
end
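
If you really do need that class-level collection, one option is to guard it the same way as the global counter. This is only a sketch: the Cloud class here is a stand-in, and the @@mutex variable and the count method are assumptions added for illustration.

```ruby
# Stand-in for the Cloud class used in the examples above.
class Cloud
  def rain!
    @raining = true
  end
end

class RainyCloudFactory
  @@clouds = []
  @@mutex = Mutex.new # hypothetical: protects @@clouds

  def self.generate
    cloud = Cloud.new
    cloud.rain!

    # Only one thread at a time may touch the shared array.
    @@mutex.synchronize { @@clouds << cloud }

    cloud
  end

  # Hypothetical helper so the shared state can be inspected safely.
  def self.count
    @@mutex.synchronize { @@clouds.size }
  end
end

threads = 5.times.map do
  Thread.new { 100.times { RainyCloudFactory.generate } }
end
threads.each(&:join)

generated = RainyCloudFactory.count
puts generated
```

The shared state is still global, but every access now goes through one well-defined, synchronized path.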

A slightly more nefarious example is the AST. By this I’m referring to the current set of program instructions that comprise your program. Ruby, being such a dynamic language, allows you to change this at runtime. I don’t imagine this would be a common problem, but I saw it come up as an issue with the kaminari rubygem. Some part of the code was defining a method dynamically, then calling alias_method with that method, then removing it.

There’s only one AST, shared between all active threads, so you can imagine how this played out in a multi-threaded environment. One thread defined the method, then aliased it. Another thread then took over and defined its version of the method, overwriting the one that was already in place. The original thread then went and removed that method. When the second thread went to alias the method, it was no longer there. Boom. NameError.
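
The interleaving is easier to follow when replayed step by step. The sketch below runs the same sequence on a single thread, with comments marking which thread would have performed each step. The Paginator class and the method names are hypothetical; this is not the gem’s actual code.

```ruby
# Hypothetical class standing in for the one being modified at runtime.
class Paginator
end

# Thread A: define a temporary method, then alias it.
Paginator.send(:define_method, :tmp) { :a }
Paginator.send(:alias_method, :page_a, :tmp)

# Thread B takes over: define its own version, overwriting A's.
Paginator.send(:define_method, :tmp) { :b }

# Thread A resumes: clean up the temporary method.
Paginator.send(:remove_method, :tmp)

# Thread B resumes: the method it just defined is gone.
error = nil
begin
  Paginator.send(:alias_method, :page_b, :tmp)
rescue NameError => e
  # alias_method raises NameError for a missing method
  # (NameError is also the superclass of NoMethodError).
  error = e
end

puts error.class
```

Each step is harmless on its own; the failure only appears when two threads run the define, alias, remove sequence at overlapping times.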

Again, this has to be a rare example, but it’s good to keep in mind that modifying the AST at runtime is almost always a bad idea, especially when multiple threads are involved. When I say ‘runtime’, I mean during the course of the lifecycle of the application. In other words, it’s expected that the AST will be modified at startup time; most Ruby libraries depend on this behaviour in some way. However, in the case of a Rails application, once it’s been initialized, changes to the AST shouldn’t happen at runtime, just as it’s rare to require new files in the midst of a controller action.

So, if you’re just reading from a global, that’s fine. If there’s a well-defined understanding of how to use a global, and it’s protected from concurrent modification, that’s fine. Rails.logger comes to mind as a good example of this. But if a global seems like a convenient place to stash a shared object, make sure you think twice about that. It might not be the best place for it.

Create more objects, rather than sharing one

But sometimes you just need that global object. You really can’t avoid it. This is especially problematic when the thread safety of that object is questionable.

The most common example of this is a network connection. I’m not thinking of a one-off HTTP request, but a long-lived connection to a database or external service.

This is problematic because a long-lived connection is a stateful connection. Typically, when talking to a database, your code will make a request, then wait for a response. The underlying socket has no notion of the state of the program, and the thread scheduler provides no guarantees about which thread will receive the data first.

So, this leaves us in a situation where the database client library needs to jump through a lot of hoops to make sure that the right thread receives the right result, or…

The simpler solution is to create more connections. There are two useful concepts that you could use for this:

  1. Thread-locals
  2. Connection pools

Thread-locals

This name is wild with contradiction, depending on your perspective. A thread-local lets you define a variable that is global to the scope of the current thread. In other words, it’s a global variable that is locally scoped on a per-thread basis.

Here’s how you might use it to provide each thread with its own connection to a Redis database:

# Instead of
$redis = Redis.new
# use
Thread.current[:redis] = Redis.new

Then you can use Thread.current[:redis] wherever you would otherwise have used $redis. This is a bit hard to grok the first time. Even though you only call Redis.new in one place, each thread will execute it independently. So, if your program is running N threads, it will open N connections to Redis.

This example showed a Redis connection, but this same concept can be applied to other objects, too. It’s perfectly acceptable to tell users of your API that they should create one object for each thread, rather than trying to write difficult, thread-safe code that will increase your maintenance costs.
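
Here is a small sketch of the one-object-per-thread idea that runs without a Redis server. FakeConnection and the connection helper are hypothetical stand-ins; in real code the helper would build whatever client you need.

```ruby
# Hypothetical stand-in for a real client such as Redis.new.
class FakeConnection
end

# Lazily create one connection per thread and reuse it afterwards.
def connection
  Thread.current[:connection] ||= FakeConnection.new
end

connections = 3.times.map do
  Thread.new do
    first = connection
    second = connection
    # Within one thread, the same object comes back every time.
    raise "expected reuse within a thread" unless first.equal?(second)
    first
  end
end.map(&:value)

distinct = connections.map(&:object_id).uniq.size
puts distinct
```

Each thread gets its own object, and repeated calls within a thread reuse it. One caveat: values stored with Thread.current[] are local to the current fiber, so code that uses fibers may prefer Thread#thread_variable_get and Thread#thread_variable_set.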

This N:N connection mapping is fine for small numbers of threads, but gets out of hand when the number of threads starts to increase. For connections, a pool is often a better abstraction.

Resource pools

Going back to the Redis example, if you have N threads, you could use a pool of M connections to share among those threads, where M is less than N. This still ensures that your threads aren’t sharing a single connection, but doesn’t require each thread to have its own.

A pool object will open a number of connections, or in the more general sense, may allocate a number of any kind of resource that needs to be shared among threads. When a thread wants to make use of a connection, it asks the pool to check out a connection. The pool is responsible for keeping track of which connections are checked out and which are available, preserving thread safety. When the thread is done, it checks the connection back in to the pool.

Implementing a connection pool is a good exercise in thread-safe programming; you’ll probably need to make use of both thread-locals and mutexes to do it safely. The connection_pool rubygem provides a nice, bare-bones implementation that’s a good study in this area (http://github.com/mperham/connection_pool).
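
To make the check-out and check-in cycle concrete, here is a deliberately tiny sketch. It takes a shortcut: instead of mutexes and thread-locals, it leans on Ruby’s thread-safe Queue to hold the idle connections. SimplePool and FakeConnection are hypothetical names, and this is not how the connection_pool gem is implemented.

```ruby
# Hypothetical stand-in for a real connection object.
class FakeConnection
end

class SimplePool
  # Builds `size` resources up front using the given block.
  def initialize(size)
    @available = Queue.new
    size.times { @available << yield }
  end

  # Checks a resource out, yields it, and always checks it back in.
  def with
    conn = @available.pop # blocks while every resource is checked out
    yield conn
  ensure
    @available << conn if conn
  end
end

pool = SimplePool.new(2) { FakeConnection.new }
used = Queue.new

# Six threads share two connections.
threads = 6.times.map do
  Thread.new do
    pool.with do |conn|
      used << conn
      sleep 0.01 # pretend to talk to the server
    end
  end
end
threads.each(&:join)

seen = []
seen << used.pop until used.empty?
distinct = seen.uniq.size
puts "#{seen.size} checkouts over #{distinct} connection(s)"
```

A thread that finds the pool empty simply waits in pop until another thread checks a connection back in, so no connection is ever used by two threads at once.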

Avoid lazy loading

A common idiom in Ruby on Rails applications is to lazily load constants at runtime, using something similar to Ruby’s autoload.

For instance, if the Geocoder constant has not yet been loaded, the first time your application tries to execute Geocoder.geocode(request.remote_ip), Rails will look for a file named geocoder.rb, require it, then return to your code.

This is a bad idea in the presence of multiple threads. The simple reason is that autoload in MRI has historically not been thread-safe (Ruby 2.0 addressed this for Ruby’s own autoload). It is thread-safe in recent versions of JRuby, but the best practice is simply to eager load files before spawning worker threads.

This is done implicitly in Rails 4+, and can be enabled in Rails 3.x using the config.threadsafe! configuration setting.
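
Outside of Rails, the same idea can be sketched by hand: require everything first, then start the threads. The eager_load! helper is hypothetical, and the temporary directory with its geocoder.rb file only exists to keep the example self-contained.

```ruby
require 'tmpdir'

# Hypothetical helper: require every Ruby file under dir, in a stable order.
def eager_load!(dir)
  Dir.glob(File.join(dir, '**', '*.rb')).sort.each { |file| require file }
end

Dir.mktmpdir do |dir|
  # Stand-in for an application file that would otherwise be lazily loaded.
  File.write(File.join(dir, 'geocoder.rb'), <<~'RUBY')
    module Geocoder
      def self.geocode(ip)
        "located #{ip}"
      end
    end
  RUBY

  # Load everything while the program is still single-threaded.
  eager_load!(dir)
end

# Only now spawn worker threads; the constant is already defined.
results = 3.times.map do |i|
  Thread.new { Geocoder.geocode("10.0.0.#{i}") }
end.map(&:value)

puts results
```

By the time any worker thread runs, there is nothing left to load, so no two threads can race to define the same constant.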

Prefer data structures over mutexes

Mutexes are notoriously hard to use correctly. For better or worse, you have a lot of things to decide when using a mutex.

  • How coarse or fine should this mutex be?
  • Which lines of code need to be in the critical section?
  • Is a deadlock possible here?
  • Do I need a per-instance mutex? Or a global one?

This is just a sampling of questions you need to answer when you decide to use a mutex. For a programmer familiar with mutexes and with deep knowledge of the problem domain, these questions may be easy to answer.

However, in many cases, they’re not. Sometimes the rules of a system are not well-defined, sometimes the programmers working on the project are not fluent with mutexes, and sometimes there are a plethora of other reasons.

Using a data structure removes a lot of these concerns. Rather than worrying about where to put the mutex, or whether or not it can deadlock, you simply don’t need to create any mutexes in your code.

By leaning on a data structure, you remove the burden of correct synchronization from your code and depend on the semantics of the data structure to keep things consistent.

This only works if you choose not to share objects between threads directly. Rather than letting threads access shared objects and implementing the necessary synchronization, you pass shared objects through data structures. This ensures that only one thread can mutate an object at any given time.
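
As a minimal sketch of this approach, the workers below never touch a shared object directly. Work goes in through one Queue and results come out through another, and the code contains no mutex. The :done marker and the squaring job are arbitrary choices for illustration.

```ruby
jobs = Queue.new
results = Queue.new

# Each worker pops a job, processes it, and pushes the result.
workers = 3.times.map do
  Thread.new do
    while (n = jobs.pop) != :done
      results << n * n
    end
  end
end

# Hand the work over through the queue.
1.upto(10) { |n| jobs << n }

# One :done marker per worker tells it to stop.
workers.size.times { jobs << :done }
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
squares.sort!

puts squares.inspect
```

Each job is handed to exactly one worker, so the question of which lines belong in a critical section never comes up in your own code.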

Finding bugs

There may be times where your program is exhibiting strange behaviour. Despite following all the best practices, a thread-safety bug may have slipped in, or this bug may be traced to someone else’s code. Unfortunately, this kind of debugging can often feel like looking for a needle in a haystack.

Like most bugs, if you can reproduce the issue, you can almost certainly track it down and fix it. However, some thread-safety issues may appear in production under heavy load, but can’t be reproduced locally. In this case, there’s no better solution than grokking the code.

Now you’ve seen common problems, notably global references. That’s the best place to start. Look at the code and assume that two threads will be accessing it simultaneously. Step through the possible scenarios. It can be helpful to jot these things down somewhere. With some practice, this process becomes more natural and these patterns will jump out at you more readily.
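
Here is what jotting down one such scenario might look like for an unprotected counter += 1. The sketch replays a single unlucky interleaving on one thread, with each step labelled by the thread that would have performed it; the variable names are made up for illustration.

```ruby
counter = 0

# Step 1: thread A reads the current value.
a_read = counter

# Step 2: thread B is scheduled and reads the same value.
b_read = counter

# Step 3: thread A writes its result.
counter = a_read + 1

# Step 4: thread B writes its result, overwriting A's update.
counter = b_read + 1

puts counter
```

Two increments ran, but the counter only moved by one. Writing the steps out like this makes the lost update visible without needing to reproduce it under load.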