Thread safety
Right away, I want to acknowledge that the term ‘thread safety’ gets thrown around a lot. Unfortunately, it’s rarely clear what exactly thread safety is about. You’ll sometimes hear about a given library or piece of code being thread-safe or not thread-safe. But if it’s not thread-safe, what will happen?
Will your program crash? Will the server start on fire? Will subtle bugs be magically introduced at a slow but consistent rate, without any possibility of reproducing them?
What’s really at stake?
If your code is ‘thread-safe,’ that means that it can run in a multi-threaded context and your underlying data will be safe. By data, I’m not talking about what’s in your database; I’m talking about what values your program has stored in memory. Your data is what’s at stake.
When your code isn’t thread-safe, the worst that can happen is that your underlying data becomes incorrect, yet your program continues as if it were correct.
There are a few more ways to say this that might make it clearer.
- If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and your underlying data will be safe.
- If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and your underlying data remains consistent.
- If your code is ‘thread-safe,’ that means that you can run your code in a multi-threaded context and the semantics of your program are always correct.
What’s really at stake when your code isn’t thread-safe is your data.
Once again, let’s make this concrete with an example.
# This class represents an ecommerce order
Order = Struct.new(:amount, :status) do
  def pending?
    status == 'pending'
  end

  def collect_payment
    puts "Collecting payment..."
    self.status = 'paid'
  end
end

# Create a pending order for $100
order = Order.new(100.00, 'pending')

# Ask 5 threads to check the status, and collect
# payment if it's 'pending'
5.times.map do
  Thread.new do
    if order.pending?
      order.collect_payment
    end
  end
end.each(&:join)
This is a variant of what’s called the ‘check-then-set’ race condition. The name says it all. This race condition is manifested by code that first checks a condition, then does something to change its value.
At first glance, this code may look innocent. But here’s some sample output from running this code.
$ ruby code/snippets/concurrent_payment.rb
Collecting payment...
Collecting payment...
$ jruby code/snippets/concurrent_payment.rb
Collecting payment...Collecting payment...
Collecting payment...Collecting payment...
Collecting payment...
$ rbx code/snippets/concurrent_payment.rb
Collecting payment...Collecting payment...
Collecting payment...Collecting payment...
Yikes! Your customers won’t be happy if you’re charging multiple times for each order.
I encourage you to try out this sample code. You will continually get different, yet incorrect, results. This is a result of the check-then-set race condition. This is not Ruby’s fault; this is your fault for not properly synchronizing access to the order.
Let’s quickly review what happened here. The problem here is similar to the issue we saw with the += operator in a previous chapter. We have a multi-step operation (checking the order status, then collecting payment and setting the status) that can be interrupted before it’s finished. As the results showed, it’s quite possible, even likely, that this multi-step operation will be interrupted, such that one thread progresses partway through the operation, then another thread does the same. The end result is that the really critical piece, the part that collects payment, is performed twice.
Here, Thread A checks the order.pending? condition and finds it to be true. A context switch immediately takes place, pausing Thread A before it can collect payment. Then, Thread B finds the condition to be true and proceeds to collect payment. Once Thread A becomes active again, it will pick up right where it left off and collect payment again, charging the customer for a second time.
Key operations like this need to be made atomic. Your code needs to tell the thread scheduler that this multi-step operation should not be interrupted. You’ll see how that can be done in the next chapter.
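To give a feel for what 'atomic' means here, this is a minimal sketch of one possible fix, assuming Ruby's core Mutex class is used to guard the check and the update together. The lock variable and the charges field are hypothetical additions of mine (the counter replaces the puts so the result is easy to inspect); this is not necessarily how you'd structure it in a real system.

```ruby
# Same Order as before, plus a hypothetical `charges` counter that
# records how many times payment was collected.
Order = Struct.new(:amount, :status, :charges) do
  def pending?
    status == 'pending'
  end

  def collect_payment
    self.charges += 1
    self.status = 'paid'
  end
end

order = Order.new(100.00, 'pending', 0)

# One lock shared by every thread that touches the order.
lock = Mutex.new

5.times.map do
  Thread.new do
    # Only one thread at a time can be inside this block, so the
    # check and the update can no longer be interleaved.
    lock.synchronize do
      order.collect_payment if order.pending?
    end
  end
end.each(&:join)

puts order.charges
```

With the check and the update inside the same synchronize block, only the first thread to acquire the lock sees a pending order; the other four see 'paid' and do nothing.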
The computer is oblivious
I want you to imagine this code as part of a larger ecommerce system. If that were the case, then this incorrect behaviour would seem to go unchecked. In other words, your code would have no idea that it’s going forward with the system in an incorrect state. From a human perspective, we can see that collecting payment twice is bad, but this particular class has no notion that things have gone wrong.
All this to say that when Ruby produced the incorrect result from the example above, it didn’t come with an exception, or a process aborting. The computer is unaware of thread-safety issues. The onus is on you to notice these problems and deal with them.
This is one of the hardest problems when it comes to thread safety. There are no exceptions raised or alarm bells rung when the underlying data is no longer correct. Even worse, sometimes it takes a heavy load to expose a race condition like this. Something might not be noticed during development, but then crop up during a critical time in production.
Is anything thread-safe by default?
At this point, you may be wondering: is anything thread-safe by default?
In Ruby, very few things are guaranteed to be thread-safe by default. Even compound operators, like += or ||=, although they are a single operation in Ruby, are not a single atomic, thread-safe operation from the perspective of the underlying VM.
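It can help to spell out the steps hiding behind these operators. The following is only a rough sketch of the logical steps, written out by hand; the actual instructions differ between Ruby implementations, and the variable names are mine.

```ruby
counter = 0

# `counter += 1` is shorthand for a read, an addition, and a write.
current = counter      # 1. read the current value
current = current + 1  # 2. compute the new value
counter = current      # 3. write it back

# `cache ||= 'computed'` is likewise a check followed by an assignment.
cache = nil
cache = 'computed' unless cache  # 1. check the value, 2. assign if unset

puts counter
puts cache
```

A context switch can land between any two of these steps, which is exactly the gap that another thread can sneak into.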
The same is true for core collection classes. Things like Array and Hash are not thread-safe by default. Let’s illustrate this with a silly example.
shared_array = Array.new

10.times.map do
  Thread.new do
    1000.times do
      shared_array << nil
    end
  end
end.each(&:join)

puts shared_array.size
In this silly example you have 10 threads each appending 1000 elements to a shared Array. In the end, the array should have 10 * 1000 = 10,000 elements.
$ ruby code/snippets/concurrent_array_pushing.rb
10000
$ jruby code/snippets/concurrent_array_pushing.rb
7521
$ rbx code/snippets/concurrent_array_pushing.rb
8541
Again we see an incorrect result with no exceptions raised by Ruby. This does not mean that you need to stop using << or Arrays, just that you need to be aware of what guarantees they provide.
The reason that the above code example was susceptible to issues is that multiple threads shared an object and attempted to update that object at the same time.
Remember that any concurrent modifications to the same object are not thread-safe. This includes things like adding an element to an Array or regular ol' assignment: any concurrent modification.
In any of these situations, your underlying data is not safe if that operation will be performed on the same region of memory by multiple threads.
This is not nearly as scary as it sounds. I’m only showing you one side of the equation.
If you read between the lines, you’ll see that these operations are fine to use in a threaded program, so long as you can guarantee that multiple threads won’t be performing the same modification to the same object at the same time.
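Here is one way that guarantee might look in practice. It's a minimal sketch, assuming it's acceptable for each thread to build its own private Array and for the results to be combined only after every thread has finished. The names threads, local, and combined are mine.

```ruby
# Each thread appends only to its own array; nothing is shared while
# the threads are running.
threads = 10.times.map do
  Thread.new do
    local = []
    1000.times { local << nil }
    local # the block's return value becomes Thread#value
  end
end

# Thread#value waits for the thread to finish and returns the block's
# result, so the merge happens on the main thread only.
combined = threads.flat_map(&:value)

puts combined.size
```

No object is ever modified by two threads at once here, so the size comes out to 10,000 without any extra synchronization.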
These ‘guarantees’ are really the crux of making your programs thread-safe. The world of multi-threading is a world of chaos. Thankfully, you do have some mechanisms available to bring a bit of order to this chaos. The next few chapters will cover a few different ways you can do that.
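As a small taste of one such mechanism, here's the array-pushing example again with the shared Array swapped for Ruby's core Queue class, which is documented as safe to push to from multiple threads. This is only a sketch, assuming a Queue is an acceptable stand-in for the Array in your program; the shared_queue name is mine.

```ruby
# Queue synchronizes access internally, so concurrent pushes
# don't get lost.
shared_queue = Queue.new

10.times.map do
  Thread.new do
    1000.times do
      shared_queue << nil
    end
  end
end.each(&:join)

puts shared_queue.size
```

The threaded code is unchanged apart from the container; the guarantee now comes from the data structure rather than from the code that uses it.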
The good news in all of this? Most of the time, just writing good, idiomatic Ruby will lead to thread-safe code.