Appendix: Thread-safety and Immutability

You learned earlier that the main thread-safety issue that’s going to come up is concurrent modification. When two threads are trying to modify an object at the same time, things get unpredictable.

But immutability provides a way around this. An immutable object is one that can’t be modified after it’s created. If an object can’t be modified, then, by definition, two threads can’t modify it at the same time. That makes immutable objects thread-safe!

On the surface, this sounds like an easy path to concurrency, but writing programs with no mutability is very difficult. Pure functional programming languages, like Haskell, implicitly support this programming model, and are notoriously hard to master. In Ruby, everything is mutable by default, so attempting to write fully immutable programs involves going against the grain and the language idioms.

But there’s no need to throw out the baby with the bath water. If writing fully immutable programs is difficult, there are certainly places for immutable objects or immutable data structures that can provide an easier path to thread safety.

Immutable Ruby objects

Immutability is actually supported in core Ruby using the Object#freeze method.

# This is a mutable Array
comics = []

# Appending to the array mutates it
comics << 'random'

# Freezing the Array makes it immutable
comics.freeze

comics << 'random'
#=> RuntimeError: can't modify frozen Array
# (on Ruby 2.5+ this is a FrozenError, a subclass of RuntimeError)

The freeze method makes the Array immutable, but then it effectively becomes unusable. How do you append to an Array that can’t be updated?

The typical convention for immutable objects is this: methods that would normally mutate the object instead return a new version of the object with the change applied.
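You can see the same convention in plain Ruby, without any gems. This is only a minimal sketch using a frozen Array and the non-mutating Array#+ method; the variable names are made up for illustration.

```ruby
# A frozen Array can still be "updated" by building a new Array from it
comics = ['random'].freeze

# Array#+ does not mutate the receiver; it returns a new Array
updated = (comics + ['another']).freeze

comics   #=> ["random"]
updated  #=> ["random", "another"]

# Mutating methods still raise on the frozen original
outcome = begin
  comics << 'more'
  :mutated
rescue RuntimeError   # FrozenError on Ruby 2.5+, a RuntimeError subclass
  :rejected
end

outcome  #=> :rejected
```

Copying the whole Array on every change is the naive way to do this, and it gets expensive as the collection grows.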

Here’s an example using the ‘hamster’ rubygem.

require 'hamster/vector'

mutable = Array.new
immutable = Hamster::Vector.new

mutable #=> []
mutable.push(nil)
mutable #=> [nil]

immutable #=> []
immutable.add(nil) #=> [nil]
immutable #=> []

# This is typical of immutable data structures,
# re-assign the reference to the result of an
# operation.
immutable = immutable.add(nil)
immutable #=> [nil]

Notice how the immutable object retained its original state even after an element was added to it? Instead of mutating itself, its add method returned an updated version of itself containing the new element. It’s a sneaky way to avoid mutating objects!

If you want to use immutable data structures in your Ruby program, you’ll want to check out Hamster. For one, it’s got mature immutable implementations for Hash, List, Vector, and others. It doesn’t match Ruby’s API 100%, but it provides what makes sense for an immutable data structure. In terms of efficiency, you might, at first glance, think there’s a lot of duping of Ruby objects happening under the hood to create new versions of these data structures, but Hamster has a much more efficient implementation that is friendlier to the garbage collector. The README outlines the API and links to the paper describing the implementation: https://github.com/harukizaemon/hamster#readme.
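To get a feel for how a new version can be created without copying everything, here is a sketch of structural sharing using a tiny linked list. The ConsList class is hypothetical and written just for this illustration. It is not Hamster's implementation, which is described in the paper linked from the README.

```ruby
# Hypothetical immutable linked list, for illustration only.
class ConsList
  attr_reader :head, :tail

  def initialize(head, tail)
    @head = head
    @tail = tail
    freeze   # no instance variable can be re-assigned after this
  end

  # Returns a new list with item in front; the receiver becomes the tail
  def cons(item)
    ConsList.new(item, self)
  end

  def to_a
    items = []
    node = self
    while node
      items << node.head
      node = node.tail
    end
    items
  end
end

base   = ConsList.new(2, nil).cons(1)   # 1 -> 2
longer = base.cons(0)                   # 0 -> 1 -> 2

base.to_a                 #=> [1, 2]
longer.to_a               #=> [0, 1, 2]
longer.tail.equal?(base)  #=> true
```

The longer list did not copy the original. It points at it. That reuse is safe precisely because the original can never change.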

Integrating immutability

You’ve had a quick look at how an immutable object works. Now how is it typically used? The simplest use case is this: when you need to share objects between threads, share immutable objects. If you share a mutable object with another thread, then you need to be concerned about thread safety, which probably means introducing synchronization.
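Here is a minimal sketch of that simplest case, using only core Ruby. The settings Hash and its keys are invented for the example. Keep in mind that freeze is shallow, so the String inside the Hash is frozen separately.

```ruby
# Freeze the Hash and the String it holds before handing it to other threads
settings = { host: 'example.test'.freeze, retries: 3 }.freeze

readers = 3.times.map do |i|
  Thread.new do
    # Reads need no lock: nothing can change the Hash underneath this thread
    "reader #{i} -> #{settings[:host]}:#{settings[:retries]}"
  end
end

results = readers.map(&:value)

# A thread that tries to write fails loudly instead of changing shared state
writer = Thread.new do
  begin
    settings[:retries] = 5
    :modified
  rescue RuntimeError   # FrozenError on Ruby 2.5+
    :rejected
  end
end

writer_outcome = writer.value  #=> :rejected
```

No Mutex appears anywhere in this sketch, because no thread is able to change what the others are reading.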

This kind of complexity disappears if you share an immutable object. When passing an object to another thread, making it immutable is a simple win. But what about other situations?

Let’s look at a classic multi-threaded arrangement: a producer and consumer. This problem involves one thread distributing work to another group of threads, so you definitely need a thread-safe data structure.

Here’s how you might do it with a Hamster data structure.

require 'hamster/queue'

@queue = Hamster::Queue.new

10.times do
  @queue = @queue.enqueue(rand(100))
end

I’ve started with just the producer side of things because I see an issue already. In the examples from the beginning of the chapter, you updated the immutable object by re-assigning it to a new version of itself. That’s what’s happening here, but that’s not safe in the presence of multiple threads.

There’s no synchronization being used here, so it’s possible that between the time the producer thread evaluates the right-hand side of its assignment to get a new version of the queue and the time it actually assigns that value, another thread could have removed an element from the queue and updated the reference. When the producer thread completes its assignment, it would overwrite the change made by the consumer thread.
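That interleaving is hard to trigger on demand, so this sketch replays it by hand on a single thread. It uses frozen Arrays as a stand-in for the immutable queue, and the comments mark which thread would be performing each step.

```ruby
# Shared reference to an immutable queue (a frozen Array stands in here)
queue = [1, 2, 3].freeze

# Producer: evaluates the right-hand side, but has not assigned yet
producer_result = (queue + [4]).freeze   # [1, 2, 3, 4]

# Consumer: runs in the gap, removes the first element and re-assigns
queue = queue.drop(1).freeze             # [2, 3]

# Producer: completes its assignment using the stale result
queue = producer_result

queue  #=> [1, 2, 3, 4]
```

The consumer already took 1 off the queue, but it is back. Every individual object stayed immutable; the damage was done to the shared reference.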

So we can’t get away from using some form of synchronization here. That’s generally the case with immutable objects. It’s very easy to pass out immutable objects to share, but if you need to have multiple threads modifying an immutable object you still need some form of synchronization.
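The most familiar form of synchronization is a Mutex around the read-then-re-assign step. This is a sketch using a frozen Array in place of a Hamster structure; the thread and item counts are arbitrary.

```ruby
queue = [].freeze
lock  = Mutex.new

producers = 4.times.map do |i|
  Thread.new do
    25.times do |n|
      # Read the current version, build the next one, and re-assign
      # the reference as one protected step
      lock.synchronize do
        queue = (queue + [i * 100 + n]).freeze
      end
    end
  end
end

producers.each(&:join)

queue.size  #=> 100
```

With the lock in place no update can be overwritten, at the price of every thread waiting its turn to touch the reference.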

In this case, immutable data structures work great with CAS operations. Here’s an updated version making use of both:

require 'hamster/queue'
require 'atomic'

@queue_wrapper = Atomic.new(Hamster::Queue.new)

30.times do
  @queue_wrapper.update { |queue|
    queue.enqueue(rand(100))
  }
end

consumers = []

3.times do
  consumers << Thread.new do
    10.times do
      number = nil

      @queue_wrapper.update { |queue|
        number = queue.head
        queue.dequeue
      }

      puts "The cube root of #{number} is #{Math.cbrt(number)}"
    end
  end
end

consumers.each(&:join)

Notice how well these two approaches work together?

The immutable approach looked out of place before, but now, inside of the update block, it looks like regular mutable code. Remember the update block only cares about the return value, and the immutable operations are returning new versions of the queue to replace the existing version. Since the underlying queue is immutable, the operations in the block are idempotent and can be run any number of times.
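To see why the block may run more than once, here is a sketch of a compare-and-set retry loop. SketchAtomic is a hypothetical stand-in written for this illustration. It is not how the atomic gem is implemented, and it uses a Mutex to make the compare-and-set step atomic where a real implementation would rely on a hardware-level instruction.

```ruby
# Hypothetical atomic reference. The Mutex guards only the compare-and-set
# step; the update block itself runs outside of any lock.
class SketchAtomic
  def initialize(value)
    @value = value
    @lock  = Mutex.new
  end

  def value
    @lock.synchronize { @value }
  end

  # Swap in new_value only if the reference still points at expected
  def compare_and_set(expected, new_value)
    @lock.synchronize do
      return false unless @value.equal?(expected)
      @value = new_value
      true
    end
  end

  # Re-run the block until no other thread changed the reference meanwhile
  def update
    loop do
      old = value
      new_value = yield(old)
      return new_value if compare_and_set(old, new_value)
    end
  end
end

wrapper      = SketchAtomic.new([].freeze)
attempts     = 0
counter_lock = Mutex.new

threads = 4.times.map do |i|
  Thread.new do
    50.times do |n|
      wrapper.update do |queue|
        counter_lock.synchronize { attempts += 1 }  # counts every run of the block
        (queue + [i * 100 + n]).freeze
      end
    end
  end
end
threads.each(&:join)

wrapper.value.size  #=> 200
attempts >= 200     #=> true
```

If attempts ends up higher than 200, some blocks were retried after losing a race. Because each attempt only builds a new queue from the old one, a discarded attempt leaves nothing behind.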

I’ll take a minute to address the consumer side of the code, particularly how the dequeuing happens. Here it is again:

      @queue_wrapper.update { |queue|
        number = queue.head
        queue.dequeue
      }

With a Hamster::Queue, popping a value from the queue is a multi-step operation.

The dequeue method returns a new version of the queue with the first element removed. This is what you’ll want to assign back to the underlying queue reference. But you also need a way to retrieve that first element before updating the queue. Hamster::Queue#head will give you that first element without modifying the queue. Once you have a reference to head, you can try to update the queue with the new dequeued version.
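The same two-step pop can be sketched in plain Ruby, with a frozen Array standing in for the queue and a Mutex standing in for the atomic wrapper. The take lambda is a hypothetical helper. In this Array-based sketch an empty queue yields nil, which is worth guarding against before handing the value to something like Math.cbrt.

```ruby
queue = [8, 27, 64].freeze
lock  = Mutex.new

# Hypothetical helper: read the head, then swap in the shorter queue
take = lambda do
  lock.synchronize do
    head  = queue.first            # step 1: peek without modifying anything
    queue = queue.drop(1).freeze   # step 2: new version without the head
    head
  end
end

taken = 3.times.map { Thread.new { take.call } }.map(&:value)

taken.sort  #=> [8, 27, 64]

# The queue is now empty, so there is no head to hand out
leftover = take.call  #=> nil
roots = taken.sort.map { |number| Math.cbrt(number).round }  #=> [2, 3, 4]
```

Each element was handed to exactly one thread, and the nil from the empty queue is a signal to stop rather than a number to compute with.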

Wrap up

Like almost everything else you’ve seen, this approach has its pros and cons. On the one hand, immutability is a nice guarantee to have; it’s the simplest path to thread safety when sharing objects. On the other hand, when you need to collect data from immutable objects, it sometimes requires some rethinking and some new techniques.

In terms of performance, the comparisons are a little unfair. Ruby has no native immutable collections, so the native mutable collections are generally faster than what Hamster can offer. However, if you’re looking for the flexibility and guarantees that immutable collections offer, the tradeoff is a good one to make.

I’ll leave you with the four rules of safe concurrency again. Rule #3 is of particular relevance to this chapter.

The safest path to concurrency is:

  1. Don’t do it.
  2. If you must do it, don’t share data across threads.
  3. If you must share data across threads, don’t share mutable data.
  4. If you must share mutable data across threads, synchronize access to that data.