Threads of Execution

Creating new threads is easy:

Thread.new { "Did it!" }

But multi-threaded programming is not as simple as just spawning more threads. There are lots of questions to be answered, such as:

  • Where/when should threads be spawned?
  • How many threads?
  • What about thread safety?

Threads are a powerful concept, and like anything powerful, they also allow you to shoot yourself in the foot if you don’t know what you’re doing. Here I’ll give you an overview of exactly what threads offer you.

Shared address space

The most important concept to grasp is that threads have a shared address space. This is a fancy way of saying that multiple threads will share all of the same references in memory (i.e. variables), as well as the AST (i.e. the compiled source code).
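Here's a minimal sketch of what that sharing means in practice. Both threads close over the same local variable, so they mutate the very same object; no copying happens:

```ruby
shared = []   # a single Array, created by the main thread

t = Thread.new do
  # The child thread sees the same variable via its closure,
  # so this push mutates the same Array the main thread holds.
  shared << :from_child
end

t.join
shared << :from_main

shared # now holds entries pushed by both threads
```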

This simple fact is what makes threads so powerful, and also what makes them difficult to work with. I’ve already given you an idea of why threads are good; here’s a simple program to illustrate their difficulty.

require 'thread'

class FileUploader
  def initialize(files)
    @files = files
  end

  def upload
    threads = []

    @files.each do |(filename, file_data)|
      threads << Thread.new {
        status = upload_to_s3(filename, file_data)
        results << status
      }
    end

    threads.each(&:join)
  end

  def results
    @results ||= Queue.new
  end

  def upload_to_s3(filename, file)
    # omitted
  end
end

uploader = FileUploader.new('boots.png' => '*pretend png data*', 'shirts.png' => '*pretend png data*')
uploader.upload

puts uploader.results.size

If you run your eye down this example program, it looks pretty innocent. The obvious thing that sticks out is that the #upload method spawns one thread for each file that it uploads to S3. Since a call to Thread.new returns immediately, both of those threads will be working concurrently.

The #upload method ends by calling #join on each of the threads, which will block until the thread finishes execution.
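A quick sketch of that joining behavior, with illustrative values rather than real uploads:

```ruby
# Spawn three workers; each block returns a value.
threads = 3.times.map { |i| Thread.new { i * 2 } }

# join blocks the calling thread until each worker finishes.
threads.each(&:join)

# value also joins, then returns the block's return value.
values = threads.map(&:value)
values # => [0, 2, 4]
```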

Unfortunately, this code is not thread-safe. It’s quite possible that one of the statuses destined for the @results queue will be lost. I intentionally omitted the S3 upload implementation details to let you know that the thread-safety issue does not come from there. The problem is actually with the #results method.

This might not seem possible; we’re just using regular Ruby code here with a conditional assignment! But this operator can absolutely cause problems in a multi-threaded context. This chapter will help you understand why that’s so.

I’ll let this example sit with you for a minute, but I’ll come back and explain why this happens before you finish this chapter.

Many threads of execution

We’re used to looking at source code and seeing a sequential set of instructions. First, this method calls that one, then if this value is true, it executes this block of code, then… you get the idea. It’s natural for you to think about your code this way. This is typically the way that we write it.

When you trace a path through your code like this, you are tracing a thread of execution, a possible path that can be traversed through your code. It’s easy to grasp that there is more than one possible thread of execution. For example, if you pass in a different input, you may get a different output.

It’s harder to grasp that there can be multiple threads of execution operating on the same code at the same time. This is precisely the case in a multi-threaded environment. Multiple threads of execution can be traversing their own paths, all at the same time.

At that point, it’s no longer possible to step through these simultaneous threads of execution in any kind of predictable manner. It’s as if someone just changed the rules of physics on you: things that were previously absolute truths no longer hold true. This is because there’s a certain amount of randomness introduced in the way that threads are scheduled.

Thankfully, you are able to introduce some thread-aware guarantees into your code. This is the heart of thread safety. But let’s talk a bit more about this randomness first.

Native threads

All of the Ruby implementations studied in this book ultimately map one Ruby thread to one native, operating system thread.

Take this silly example, run against MRI:

100.times do
  Thread.new { sleep }
end

puts Process.pid
sleep

This creates 100 sleeping threads, then sleeps the main thread to prevent it from exiting. Now we can ask top(1) how many threads this process has:

$ top -l1 -pid 8409 -stats pid,th

PID   #TH
8409  102

Note that you should use your own pid (process id) to test this out.

So we created 100 new threads, the main thread counts as 1, and MRI maintains an internal thread for housekeeping, which adds up to 102. The important point is that when we create 100 threads, those 100 threads are handled directly by the operating system.
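You can also confirm the count from inside the process using Thread.list, which returns every live Ruby thread, including the main thread. Note that it won’t show MRI’s internal housekeeping thread, since that one isn’t a Ruby Thread object:

```ruby
before = Thread.list.size      # typically 1: just the main thread

workers = 100.times.map { Thread.new { sleep } }
sleep 0.1                      # give the new threads a moment to start

after = Thread.list.size
after - before # => 100
```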

Non-deterministic context switching

This section has a fancy title. It refers to the work that’s done by your operating system thread scheduler. This is a part of your operating system itself, and you have no control over how it functions. This little ditty is responsible for scheduling all of the threads running on the system.

It has to ensure that if there are 453 threads running on the system, they all get fair access to the system resources. This is a very complicated piece of software, with many optimizations, but it all comes down to this: context switching.

In order to provide fair access, the thread scheduler can ‘pause’ a thread at any time, suspending its current state. Then it can unpause some other thread so it can have a turn using system resources. Then, at some point in the near future, the thread scheduler can unpause the original thread, restoring it to its previous state.

This is known as context switching, and there’s a certain degree of randomness to it. It’s possible for your thread to be interrupted at any time. Thus, there are primitives you can use to say, “Hey, thread scheduler, I’m doing something important here, don’t let anybody else cut in until I’m done.”
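One such primitive is Mutex, which ships with Ruby. A sketch of how it guards a critical section: a context switch can still happen at any time, but no other thread can enter the synchronized block until the current holder is done.

```ruby
require 'thread'

mutex   = Mutex.new
counter = 0

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread may hold the mutex at a time, so the
      # read-increment-write sequence cannot interleave.
      mutex.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
counter # => 10000
```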

Context switching in practice

Now that you have a basic understanding of the inherent randomness, let’s take an inside look at what really happened during the example from the beginning of this chapter.

Here it is again for the sake of posterity:

require 'thread'

class FileUploader
  def initialize(files)
    @files = files
  end

  def upload
    threads = []

    @files.each do |(filename, file_data)|
      threads << Thread.new {
        status = upload_to_s3(filename, file_data)
        results << status
      }
    end

    threads.each(&:join)
  end

  def results
    @results ||= Queue.new
  end

  def upload_to_s3(filename, file)
    # omitted
  end
end

uploader = FileUploader.new('boots.png' => '*pretend png data*', 'shirts.png' => '*pretend png data*')
uploader.upload

puts uploader.results.size

The Queue class is a thread-safe data structure that ships with Ruby. More about it in the chapter on thread-safe data structures.

Remember, sometimes we may lose one of the statuses in @results. Here’s how.

First, we need to break apart the ||= statement. Remember that a thread can be interrupted at any time. Although the ||= operator can’t be broken down any further in your Ruby code, there are a lot of underlying operations that support it.

# This statement
@results ||= Queue.new
# when broken down, becomes something like
if @results.nil?
  temp = Queue.new
  @results = temp
end

This doesn’t break it down to the full extent of the underlying implementation (for that, you’d need to look at the source of your Ruby implementation), but it covers the essentials.

  1. Check whether @results currently holds the value nil.
  2. If it does, get the return value of the Queue.new method.
  3. Assign that value to @results.

In the example above, the FileUploader is instantiated with two files to upload. Let’s walk through how that might play out from the perspective of two individual threads.

For the sake of simplicity, I’m going to assume that your code can only run on one CPU core, and that both threads have already been spawned. They’re now ready to start executing their block of code.

Now the main thread is blocking on the call to #join while the other two threads do their work. Let’s call them Thread A and Thread B.

First, Thread A performs its upload to S3 while Thread B is paused. It receives the status and is ready to push that into results.

Thread A checks the value of @results and finds that it’s currently nil. It gets as far as calling Queue.new, then the thread scheduler decides this is a good time for a context switch.

Thread A is now paused, while Thread B gets a turn. It performs its upload to S3 and receives its status. It then checks the value of @results and finds that it’s nil. Remember that Thread A never got a chance to assign its Queue back to @results.

So Thread B creates a Queue and assigns it to @results, then pushes its status into it. Thread B has done its work, so it terminates. Now Thread A is given priority again, so it continues exactly where it left off.

At this moment in time, @results is no longer nil. It holds the Queue that Thread B assigned to it. But Thread A is already past the nil check; it checked for nil and created its own Queue before being paused. So it carries on with step #3 and assigns its Queue to @results, overwriting Thread B’s.

Now Thread A pushes its status into results. Thread A is finished, so it terminates. Now the thread scheduler can start the main thread, which has been waiting for these threads to terminate.

The program ends with @results having been assigned twice, and ultimately holding just one value: the status of Thread A, in this case.

It’s entirely possible that the thread scheduler will make a different decision on a different run of this program. Sometimes @results may hold only the status of Thread B, or it may hold the statuses from both threads.

The easy fix, in this case, is to not use ||=. Instead, instantiate @results in initialize, before any threads are spawned. It’s good practice to avoid lazy instantiation in the presence of multiple threads.
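Here’s a sketch of the class with that fix applied. The upload_to_s3 body is stubbed with a placeholder status, since the real implementation was omitted above:

```ruby
require 'thread'

class FileUploader
  def initialize(files)
    @files = files
    # Eager instantiation: the Queue is created on the main thread,
    # before any workers exist, so there is nothing left to race on.
    @results = Queue.new
  end

  attr_reader :results

  def upload
    threads = @files.map do |(filename, file_data)|
      Thread.new do
        status = upload_to_s3(filename, file_data)
        results << status
      end
    end
    threads.each(&:join)
  end

  def upload_to_s3(filename, file)
    :success # stub; the real upload details are omitted
  end
end
```

With this version, every run reliably reports one status per file.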

This course of events is what’s known as a ‘race condition.’ When you’re on the receiving end, this can be very hard to track down.

A race condition involves two threads racing to perform an operation on some shared state. In some cases, the race may result in the underlying data becoming incorrect, like in the example I laid out here. In some cases, the race may work out fine and produce the correct result. It’s inherently non-deterministic. Still other race conditions may only be exposed under heavy load, when concurrency is highest.

Race conditions occur because a given operation, @results ||= Queue.new in this case, is not atomic. If the ||= operation were atomic, there would be no race condition. An atomic operation is one which cannot be interrupted before it’s complete. Even in the case of a multi-step operation, with proper protection in place, it will be treated as a single operation, either succeeding or failing without exposing its state mid-operation. This is a key concept you’ll see in coming chapters.
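If you genuinely need lazy instantiation across threads, one sketch of that “proper protection” is to wrap the whole check-then-assign sequence in a Mutex. The LazyResults class name and @lock variable here are mine, not from the example:

```ruby
require 'thread'

class LazyResults
  def initialize
    # The lock itself must be created before any threads share
    # this object, or we'd just move the race onto the lock.
    @lock = Mutex.new
  end

  def results
    # synchronize makes the multi-step ||= behave as a single
    # operation from the other threads' point of view.
    @lock.synchronize { @results ||= Queue.new }
  end
end

holder = LazyResults.new
queues = 50.times.map { Thread.new { holder.results } }.map(&:value)
queues.uniq.size # => 1: every thread saw the same Queue
```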

Why is this so hard?

Gosh, this sure does look hard. Time for some good news.

Am I saying that any instance of ||= in Ruby isn’t thread-safe? Do you have to stop using ||=? Absolutely not.

I really wanted to drive home an example of how things can go wrong, and give you a basic understanding of the thread scheduler and its mechanisms. Ultimately, this stuff is hard because it’s unpredictable. As I said, you can no longer trace the execution of the program in a predictable way.

But, before we lose hope, I want to share a principle. Understanding this principle, and taking the necessary actions, would have provided the guarantee we were looking for in the example program. Indeed, it really informs thread-safe code at any level.

Any time that you have two or more threads trying to modify the same thing at the same time, you’re going to have issues. This is because the thread scheduler can interrupt a thread at any time.

This points to two obvious strategies for avoiding issues: 1) don’t allow concurrent modification, or 2) protect concurrent modification. We’ll talk much more about both of these strategies in the coming chapters.

Understanding this idea of non-determinism and multiple threads of execution is at the core of why multi-threaded programming is considered hard.

But multi-threaded programming isn’t hard. Rather, it’s no harder than functional programming or cryptography. These are difficult subjects, but not insurmountable. These are things that any programmer can and should come to understand with some education and experimentation.