Protecting Data with Mutexes

Mutexes provide a mechanism for multiple threads to synchronize access to a critical portion of code. In other words, they help bring some order, and some guarantees, to the world of multi-threaded chaos.

Mutual exclusion

The name ‘mutex’ is shorthand for ‘mutual exclusion.’ If you wrap some section of your code with a mutex, you guarantee that no two threads can enter that section at the same time.

I’ll demonstrate first using the silly Array-pushing example from the previous chapter.

shared_array = Array.new
mutex = Mutex.new

10.times.map do
  Thread.new do
    1000.times do
      mutex.lock
      shared_array << nil
      mutex.unlock
    end
  end
end.each(&:join)

puts shared_array.size

This slight modification is guaranteed to produce the correct result, across Ruby implementations, every time. The only difference is the addition of the mutex.

$ ruby code/snippets/concurrent_array_pushing_with_mutex.rb
10000
$ jruby code/snippets/concurrent_array_pushing_with_mutex.rb
10000
$ rbx code/snippets/concurrent_array_pushing_with_mutex.rb
10000

Let’s step through the changes.

Besides creating the mutex with Mutex.new, the important bits are the lines above and below the <<. Here they are again:

      mutex.lock
      shared_array << nil
      mutex.unlock

The mutex sets up some boundaries for the thread. The first thread that hits this bit of code will lock the mutex. It then becomes the owner of that mutex.

Until the owning thread unlocks the mutex, no other thread can lock it, thus no other threads can enter that portion of code. This is where the guarantee comes from. The guarantee comes from the operating system itself, which backs the Mutex implementation (See pthread_mutex_init(3)).

In this program, since any thread has to lock the mutex before it can push to the Array, there’s a guarantee that no two threads will be performing this operation at the same time. In other words, this operation can no longer be interrupted before it’s completed. Once one thread begins pushing to the Array, no other threads will be able to enter that portion of code until the first thread is finished. This operation is now thread-safe.

Before I say anything else, I’ll show a more commonly used convenience method that Ruby’s Mutex offers:

mutex.synchronize do
  shared_array << nil
end

You pass a block to Mutex#synchronize. It will lock the mutex, call the block, and then unlock the mutex; the unlock happens even if the block raises an exception.
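Conceptually, Mutex#synchronize is roughly equivalent to this lock/ensure/unlock pattern (a sketch of the behavior, not the actual implementation):

mutex.lock
begin
  shared_array << nil
ensure
  # The ensure clause guarantees the unlock runs, even if
  # pushing to the Array raised an exception.
  mutex.unlock
end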

The contract

Notice that the mutex is shared among all the threads. The guarantee only works if the threads are sharing the same Mutex instance. In this way, when one thread locks a mutex, others have to wait for it to be unlocked.

I also want to point out that we didn’t modify the << method or the shared_array variable in any way in order to make this thread-safe.

So, if some other part of your code decides to do shared_array << item, but without wrapping it in the mutex, that may introduce a thread-safety issue. In other words, this contract is opt-in. There’s nothing that stops some other part of the code from updating a value without the Mutex.
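To make that concrete, here’s a hypothetical rogue write that would undermine the guarantee without Ruby complaining:

# Elsewhere in the codebase: this write ignores the mutex, so it can
# interleave with the synchronized writes and re-introduce the race.
Thread.new do
  shared_array << nil
end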

The block of code inside of a Mutex#synchronize call is often called a critical section, pointing to the fact that this code accesses a shared resource and must be handled correctly.

Making key operations atomic

In the last chapter we looked at multi-step operations, specifically a check-then-set race condition. You can use a mutex to make this kind of operation thread-safe, and we’ll look at the effect of both updating values and reading values with a mutex.

We’ll be reusing the ecommerce order example from the last chapter. Let’s make sure that our customers are only charged once.

# This class represents an ecommerce order
class Order
  attr_accessor :amount, :status

  def initialize(amount, status)
    @amount, @status = amount, status
    @mutex = Mutex.new
  end

  def pending?
    status == 'pending'
  end

  # Ensure that only one thread can collect payment at a time
  def collect_payment
    @mutex.synchronize do
      puts "Collecting payment..."
      self.status = 'paid'
    end
  end
end

# Create a pending order for $100
order = Order.new(100.00, 'pending')

# Ask 5 threads to check the status, and collect
# payment if it's 'pending'
5.times.map do
  Thread.new do
    if order.pending?
      order.collect_payment
    end
  end
end.each(&:join)

The change here is in the Order#collect_payment method. Each Order now has its own Mutex, initialized when the object is created. The Mutex creates a critical section around the collection of the payment. This guarantees that only one thread can collect payment at a time. Let’s run this against our supported Rubies:

$ ruby code/snippets/concurrent_payment_with_mutex.rb
Collecting payment...
Collecting payment...
Collecting payment...

$ jruby code/snippets/concurrent_payment_with_mutex.rb
Collecting payment...
Collecting payment...
Collecting payment...
Collecting payment...
Collecting payment...

$ rbx code/snippets/concurrent_payment_with_mutex.rb
Collecting payment...
Collecting payment...
Collecting payment...

Huh? This didn’t help at all. We’re still seeing the incorrect result.

The reason is that we put the critical section in the wrong place. By the time the payment is being collected, multiple threads could already be past the ‘check’ phase of the ‘check-then-set’ race condition. In this way, you could have multiple threads queued up to collect payment, but they would do it one at a time.

We need to move the mutex up to a higher level so that both the ‘check’ AND the ‘set,’ in this case checking if the order is pending and collecting the payment, are inside the mutex.

Here’s another go with the mutex in a different spot.

# This class represents an ecommerce order
class Order
  attr_accessor :amount, :status

  def initialize(amount, status)
    @amount, @status = amount, status
  end

  def pending?
    status == 'pending'
  end
  
  def collect_payment
    puts "Collecting payment..."
    self.status = 'paid'
  end
end

# Create a pending order for $100
order = Order.new(100.00, 'pending')
mutex = Mutex.new

# Ask 5 threads to check the status, and collect
# payment if it's 'pending'
5.times.map do
  Thread.new do
    mutex.synchronize do
      if order.pending?
        order.collect_payment
      end
    end
  end
end.each(&:join)

The change here is that the mutex is now used inside the block passed to Thread.new, rather than inside the Order object. The mutex is now a contract between the threads, rather than a contract enforced by the object.

But notice that the mutex now wraps the ‘check’ AND the collect_payment method call. This means that once one thread has entered that critical section, the others must wait for it to finish. So now you have a guarantee that only one thread will check that condition, find it to be true, and collect the payment. You have made this multi-step operation atomic.

Here are the results to prove it:

$ ruby code/snippets/concurrent_payment_with_looser_mutex.rb
Collecting payment...

$ jruby code/snippets/concurrent_payment_with_looser_mutex.rb
Collecting payment...

$ rbx code/snippets/concurrent_payment_with_looser_mutex.rb
Collecting payment...

Now the result is correct every time.

Mutexes and memory visibility

Now I want to throw another question at you. Should the same shared mutex be used when a thread tries to read the order status? In that case, another thread might end up using something like this:

status = mutex.synchronize { order.status }

if status == 'paid'
  # send shipping notification
end

If you’re setting a variable while holding a mutex, and other threads want to see the most current value of that variable, they should also perform the read while holding that mutex.

The reason comes down to low-level details. A write can sit in, for instance, a CPU core’s L2 cache before it’s visible in main memory. So it’s possible that after the status has been set to ‘paid’ by one thread, another thread could still see Order#status as ‘pending’ because the change hasn’t yet propagated to the memory that it reads from.

This generally isn’t a problem in a single-threaded program, because a thread always sees the effects of its own writes. In a multi-threaded program there’s no such guarantee. When one thread writes to memory, that write may live in a cache before it’s written to main memory. If another thread reads that memory in the meantime, it may get a stale value.

The solution to this is something called a memory barrier. Mutexes are implemented with memory barriers, such that when a mutex is locked, a memory barrier provides the proper memory visibility semantics.

Going back to our ecommerce example for a moment:

# With this line, it's possible that another thread
# updated the status already and this value is stale
status = order.status

# With this line, it's guaranteed that this value is
# consistent with any changes in other threads
status = mutex.synchronize { order.status }

Scenarios around memory visibility are difficult to understand and reason about. That’s one reason other programming languages have defined something called a memory model (Here is Java’s memory model: http://www.cs.umd.edu/~pugh/java/memoryModel/, and here is the one for Golang: http://golang.org/ref/mem), a well-defined specification describing how and when changes to memory are visible in other threads.

Ruby has no such specification yet, so situations like this are tricky to reason about and may even yield different results with different runtimes. That being said, mutexes carry an implicit memory barrier. So, if one thread holds a mutex to write a value, other threads can lock the same mutex to read it and they will see the correct, most recent value.

Mutex performance

As with all things, mutexes provide a tradeoff. It may not be obvious from the previous section, but mutexes inhibit parallelism.

Critical sections of code that are protected by a mutex can only be executed by one thread at any given time. This is precisely why the GIL inhibits parallelism! The GIL is just a mutex, like the one you just saw, that wraps execution of Ruby code.

Mutexes provide safety where it’s needed, but at the cost of performance. Only one thread can occupy a critical section at any given time. When you need that guarantee, the tradeoff makes sense: nothing is more important than making sure your data isn’t corrupted through race conditions. But you want to keep the critical section as small as possible, while still preserving the safety of your data. This allows as much code as possible to execute in parallel.

Given this rule, which of the following two examples is more correct?

require 'thread'
require 'net/http'

mutex = Mutex.new
@results = []

10.times.map do
  Thread.new do
    mutex.synchronize do
      response = Net::HTTP.get_response('dynamic.xkcd.com', '/random/comic/')
      random_comic_url = response['Location']

      @results << random_comic_url
    end
  end
end.each(&:join)

puts @results

And the second:

require 'thread'
require 'net/http'

mutex = Mutex.new
threads = []
results = []

10.times do
  threads << Thread.new do
    response = Net::HTTP.get_response('dynamic.xkcd.com', '/random/comic/')
    random_comic_url = response['Location']

    mutex.synchronize do
      results << random_comic_url
    end
  end
end

threads.each(&:join)
puts results

Did you get it? The second one is more correct because it uses a finer-grained mutex. It protects the shared results array, while still allowing the rest of the code to execute in parallel.

The first example defines the whole block as the critical section. In that example, since the HTTP request is inside the critical section, it can no longer be performed in parallel. Once one thread locks that section, other threads have to wait for their turn.

A coarse mutex is unnecessary in this case because the HTTP request doesn’t access anything shared between threads. Similarly, working with the response object doesn’t need to be in a critical section. The response object is created right in the current thread’s context, so we know it’s not shared.

However, the line that pushes to the results array does need to be in a critical section. That array is shared and could be modified by multiple concurrent threads. If you had used Ruby’s Queue instead of an Array, it would have provided the thread-safety guarantee itself, and you wouldn’t need the mutex here. But two threads modifying the same Array need to be synchronized.
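For illustration, here’s a sketch of that Queue variant (same hypothetical xkcd request as above):

require 'thread'
require 'net/http'

# Queue synchronizes access internally, so no Mutex is needed.
results = Queue.new

10.times.map do
  Thread.new do
    response = Net::HTTP.get_response('dynamic.xkcd.com', '/random/comic/')
    results << response['Location']  # a thread-safe push
  end
end.each(&:join)

puts results.pop until results.empty?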

Getting back to the two mutex examples: the performance of the coarse-grained lock version will be similar to single-threaded performance. The performance of the fine-grained lock version is roughly 10x better, given that the main bottleneck is the HTTP request.

The takeaway: put as little code in your critical sections as possible, just enough to ensure that your data won’t be modified by concurrent threads.

The dreaded deadlock

Deadlock is probably a term that you’ve encountered before. Whether or not it was in the context of multi-threading, the term probably carried some trepidation. It has a deadly kind of sound to it, a bit like ‘checkmate.’ Deadlock isn’t unlike checkmate; it essentially means ‘game over’ for the system in question.

A deadlock occurs when one thread is blocked waiting for a resource held by another thread (like blocking on a mutex), while that other thread is itself blocked waiting for a resource. So far, this is a normal situation: multiple threads blocked waiting for mutexes, waiting for the GIL, etc. It becomes a deadlock when a situation arises where neither thread can move forward. In other words, the system comes to a place where no progress can happen.

The classic example of this involves two threads and two mutexes. Each thread acquires one mutex, then attempts to acquire the other. The first acquisition will go fine, then both threads will end up waiting forever for the other to release its mutex.

Let’s walk through an example.

Here’s our starting state: there are two threads and two mutexes.

Now each thread acquires one of the mutexes: Thread A acquires Mutex A, and Thread B acquires Mutex B. Both mutexes are now in a locked state.

Now the deadlock occurs. Both threads attempt to acquire the other mutex. In both cases, since the mutex they’re attempting to acquire is already locked, they’re blocked until it becomes available. But since both threads are now blocked AND still holding their original mutexes, neither will ever wake up. This is a classic deadlock.
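Here’s a minimal sketch of that sequence in code. The sleep calls are only there to force the unlucky interleaving; in real code the timing would simply be non-deterministic.

mutex_a = Mutex.new
mutex_b = Mutex.new

thread_a = Thread.new do
  mutex_a.lock
  sleep 0.1     # give Thread B time to lock mutex_b
  mutex_b.lock  # blocks forever: Thread B holds mutex_b
end

thread_b = Thread.new do
  mutex_b.lock
  sleep 0.1     # give Thread A time to lock mutex_a
  mutex_a.lock  # blocks forever: Thread A holds mutex_a
end

[thread_a, thread_b].each(&:join)

On MRI, the join will typically abort with a fatal deadlock error once every thread is asleep waiting on a mutex; other runtimes may simply hang.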

So, what can be done about this? I’ll introduce two possible solutions. The first is based on a method of Mutex you haven’t seen yet: Mutex#try_lock.

The try_lock method attempts to acquire the mutex, just like the lock method. But unlike lock, try_lock will not wait if the mutex isn’t available. If another thread already owns the mutex, try_lock will return false. If it successfully acquires the mutex, try_lock will return true.
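A quick illustration of those return values:

mutex = Mutex.new

mutex.try_lock                       # => true; this thread now owns the mutex
Thread.new { mutex.try_lock }.value  # => false; the mutex is already owned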

This can help in the deadlock situation seen above. One way to avoid deadlock at the last step would have been to use the non-blocking Mutex#try_lock. That way the threads wouldn’t be put to sleep. But that’s not enough on its own: if both threads had just looped on try_lock, they still would never be able to acquire the other mutex. Rather, when try_lock returns false, both threads should release their mutexes and return to the starting state. From there, they can start again and hope for a better outcome.

With this approach, it’s fine to use a blocking lock to get the first mutex, but try_lock should be used when acquiring the second mutex. If the second mutex is unavailable, that means another thread holds it and is probably trying to acquire the mutex you’re holding! In that case, both mutexes should be unlocked for the dance to begin again. Thanks to the non-deterministic nature of things, one thread should be able to acquire both mutexes at some point.
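Here’s a sketch of that strategy, continuing with mutex_a and mutex_b from above (do_work is a hypothetical placeholder for the work that needs both resources):

loop do
  mutex_a.lock            # a blocking lock on the first mutex is fine
  if mutex_b.try_lock     # a non-blocking attempt on the second
    begin
      do_work             # hypothetical: work with both shared resources
    ensure
      mutex_b.unlock
      mutex_a.unlock
    end
    break
  else
    mutex_a.unlock        # back off completely and return to the starting state
  end
end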

The downside to this approach is that another kind of issue can arise: livelock. A livelock is similar to a deadlock in that the system makes no progress, but rather than the threads being blocked asleep, they’re stuck repeating the same actions with each other without ever moving forward.

You can see that this is possible using the try_lock solution I outlined here. It’s possible that Thread A acquires Mutex A, Thread B acquires Mutex B, then both try_lock the other mutex, both fail, both unlock their first mutex, then both re-acquire their first mutex, etc., etc. In other words, it’s possible that the threads are stuck in constant action, rather than stuck waiting. So there’s a better solution.

A better solution is to define a mutex hierarchy. In other words, any time that two threads both need to acquire multiple mutexes, make sure they do it in the same order. This will avoid deadlock every time.

Harking back to our example with the diagrams, we already have a hierarchical labeling of the mutexes. Mutex A is the obvious choice for the first mutex, and Mutex B for the second. If both threads need both mutexes, make sure they each start with Mutex A. That way, the thread that acquires Mutex A is guaranteed not to deadlock waiting on Mutex B. It can do the work it needs to do and then release both mutexes, at which point the other thread can lock Mutex A and continue down the chain.
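In code, the hierarchy is just a convention that every thread follows. A sketch, again with the hypothetical mutex_a and mutex_b:

# Any thread that needs both mutexes always locks mutex_a first,
# then mutex_b. With a single locking order, a cycle can't form.
thread_a = Thread.new do
  mutex_a.synchronize do
    mutex_b.synchronize do
      # ... Thread A's work with both shared resources ...
    end
  end
end

thread_b = Thread.new do
  mutex_a.synchronize do  # same order as Thread A, so no deadlock
    mutex_b.synchronize do
      # ... Thread B's work ...
    end
  end
end

[thread_a, thread_b].each(&:join)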

Deadlocks can wreak havoc on your system. If a thread is stuck in deadlock, it could be hogging a valuable resource, which blocks still other threads. Be aware of situations where a thread may block on more than one resource, and watch out for deadlocks any time that a thread needs to acquire more than one mutex.