Appendix: Preforking Servers

I’m glad you made it this far because this chapter may be the most action-packed in the whole book. Preforking servers bring together a lot of the concepts that are explained in this book into a powerful, highly-efficient approach to solving certain problems.

There’s a good chance that you’ve used either Phusion Passenger or Unicorn. Both of those servers, and Spyglass (the web server included with this book), are examples of preforking servers.

At the core of all these projects is the preforking model. There are a few things that make preforking special; here are three:

  1. Efficient use of memory.
  2. Efficient load balancing.
  3. Efficient sysadminning.

We’ll look at each in turn.

Efficient use of memory

In the chapter on forking we discussed how fork(2) creates a new process that’s an exact copy of the calling (parent) process. This includes anything that the parent process had in memory at the time.
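Here's a quick illustration of that copying (my own sketch, not part of the server code): the child starts with a copy of the parent's memory, and a change on either side stays private to that process.

```ruby
message = 'original'

child_pid = fork do
  # The child starts with a copy of the parent's memory,
  # so this change is invisible to the parent.
  message = 'changed in child'
  exit!(0)
end

Process.wait(child_pid)

puts message  # => original
```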

Loading a Rails App

On my MacBook Pro, loading only Rails 3.1 (no libraries or application code) takes in the neighbourhood of 3 seconds. After loading Rails the process is consuming about 70MB of memory.

Whether or not these numbers are exactly the same on your machine isn't significant for our purposes. I'll be referring to these as a baseline in the following examples.

Preforking uses memory more efficiently than does spawning multiple unrelated processes. For comparison, this is like running Unicorn with 10 worker processes compared to running 10 instances of Mongrel (a non-preforking server).

Let’s review what will happen from the standpoint of processes, first looking at Mongrel, then at Unicorn, when we boot up 10 instances of each server.

Many Mongrels

Booting up 10 Mongrel processes in parallel will look about the same as booting up 10 Mongrel processes serially.

When booting them in parallel all 10 processes will be competing for resources from the kernel. Each will be consuming resources to load Rails, and each can be expected to take the customary 3 seconds to boot. In total, that’s 30 seconds. On top of that, each process will be consuming 70MB of memory once Rails has been loaded. In total, that’s 700MB of memory for 10 processes.

A preforking server can do better.

Many Unicorns

Booting up 10 Unicorn workers will make use of 11 processes. One process will be the master, babysitting the other worker processes, of which there are 10.

When booting Unicorn only one process, the master process, will load Rails. There won’t be competition for kernel resources.

The master process will take the customary 3 seconds to load, and forking 10 processes will be more-or-less instantaneous. The master process will be consuming 70MB of memory to load Rails and, thanks to copy-on-write, the child processes should not be using any memory on top of what the master was using.

The truth is that it does take some time to fork a process (it’s not instantaneous) and that there is some memory overhead for each child process. These values are negligible compared to the overhead of booting many Mongrels. Preforking wins.

Keep in mind that the benefits of copy-on-write are forfeited if you're running MRI. To reap these benefits you need to be using REE.

Efficient load balancing

I already highlighted the fact that fork(2) creates an exact copy of the calling process. This includes any file descriptors that the parent process has open.
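A minimal sketch of descriptor inheritance (using a pipe rather than a socket, purely for brevity): the child writes through the descriptor it inherited, and the parent reads the result.

```ruby
reader, writer = IO.pipe  # two file descriptors opened before forking

fork do
  # The child inherited copies of both descriptors.
  reader.close
  writer.puts 'sent through an inherited descriptor'
  writer.close
end

writer.close  # the parent's copy; the child's copy is independent
line = reader.gets
reader.close
Process.wait

puts line  # => sent through an inherited descriptor
```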

The Very Basics of Sockets

Efficient load balancing has a lot to do with how sockets work. Since we're talking about web servers: sockets are important. They're at the very core of networking. As I hinted earlier: sockets and networking are a complex topic, too big to fit into this book. But you need to understand the very basic workflow in order to understand this next part.

Using a socket involves multiple steps:

  1. A socket is opened and bound to a unique port.
  2. A connection is accepted on that socket using accept(2).
  3. Data is read from the connection, data is written to the connection, and ultimately the connection is closed. The socket stays open, but the connection is closed.

Typically this would happen in the same process. A socket is opened, then the process waits for connections on that socket. The connection is handled, closed, and the loop starts over again.
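Here's that single-process workflow in minimal Ruby. Port 0 (which asks the kernel for a free port) and the in-process client thread are my own conveniences to keep the sketch self-contained:

```ruby
require 'socket'

# 1) Open a socket bound to a port.
server = TCPServer.open('127.0.0.1', 0)
port = server.addr[1]

# A throwaway client, just so the sketch is self-contained.
client = Thread.new { TCPSocket.open('127.0.0.1', port) { |s| s.gets } }

# 2) Accept a connection on the socket.
connection = server.accept

# 3) Write to the connection, then close it.
#    The listening socket stays open for the next client.
connection.puts 'handled'
connection.close

puts client.value  # => handled
server.close
```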

Preforking servers use a different workflow to let the kernel balance heavy load across the socket. Let’s look at how that’s done.

In servers like Unicorn and Spyglass the first thing that the master process does is open the socket, before even loading the Rails app. This is the socket that is available for external connections from web clients. But the master process does not accept connections. Thanks to the way fork(2) works, when the master process forks worker processes each one gets a copy of the open socket.

This is where the magic happens.

Each worker process has an exact copy of the open socket, and each worker process attempts to accept connections on that socket using accept(2). This is where the kernel takes over and balances load across the 10 copies of the socket. It ensures that one, and only one, process can accept each individual connection. Even under heavy load the kernel ensures that the load is balanced and that only one process handles each connection.

Compare this to how Mongrel achieves load balancing.

Given 10 unrelated processes that aren't sharing a socket, each one must bind to a unique port. Now a piece of infrastructure must sit in front of all of the Mongrel processes. It must know which port each Mongrel process is bound to, and it must do the job of making sure that each Mongrel is handling only one connection at a time and that connections are load balanced properly.

Again, preforking wins both for simplicity and resource efficiency.

Efficient sysadminning

This point is less technical, more human-centric.

As someone administering a preforking server you typically only need to issue commands (usually signals) to the master process. It will handle keeping track of and relaying messages to its worker processes.

When administering many instances of a non-preforking server, the sysadmin must keep track of each instance, administer them separately, and ensure that their commands are followed.
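The single-point-of-contact idea can be sketched like this (my own toy example; the readiness pipe is there only to avoid a startup race): the master tracks its workers and relays one signal to all of them.

```ruby
reader, writer = IO.pipe

worker_pids = 2.times.map do
  fork do
    reader.close
    Signal.trap(:USR1) { exit 42 }  # the only instruction a worker needs
    writer.puts 'ready'             # tell the master our trap is installed
    sleep                           # wait to be told what to do
  end
end
writer.close

2.times { reader.gets }  # wait until every worker is ready

# The "sysadmin" only ever addresses one process -- the master (us) --
# and the master relays to every worker it is tracking.
worker_pids.each { |pid| Process.kill(:USR1, pid) }
statuses = Process.waitall

puts statuses.all? { |_pid, status| status.exitstatus == 42 }  # => true
```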

Basic Example of a Preforking Server

What follows is some really basic code for a preforking server. It can respond to requests in parallel using multiple processes and will leverage the kernel for load balancing. For a more involved example of a preforking server I suggest you check out the Spyglass source code (next chapter) or the Unicorn source code.

```ruby
require 'socket'

# Open a socket.
socket = TCPServer.open('0.0.0.0', 8080)

# Preload app code.
# require 'config/environment'

# For keeping track of child process pids.
wpids = []

# Forward any relevant signals to the child processes.
[:INT, :QUIT].each do |signal|
  Signal.trap(signal) {
    wpids.each { |wpid| Process.kill(signal, wpid) }
  }
end

5.times {
  wpids << fork do
    # Children shouldn't forward signals in turn; for them
    # a signal simply means exit.
    [:INT, :QUIT].each { |signal| Signal.trap(signal) { exit } }

    loop {
      connection = socket.accept
      connection.puts 'Hello Readers!'
      connection.close
    }
  end
}

Process.waitall
```

You can connect to it with something like nc(1) or telnet(1) to see it in action.

``` console
$ nc localhost 8080
$ telnet localhost 8080
```

Notice that I snuck something new into that one? We haven't seen Process.waitall before; it appeared on the last line of the example code above.

Process.waitall is simply a convenience method around Process.wait. It runs a loop waiting for all child processes to exit and returns an array of process statuses. It's useful when you don't actually need the process status info and just want to wait for all the children to exit.
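For illustration, here's a hand-rolled equivalent using Process.wait directly (a sketch, not Ruby's actual implementation):

```ruby
3.times { |i| fork { exit!(i) } }

statuses = []
begin
  # Process.waitall is roughly this: keep waiting until there
  # are no children left, which raises Errno::ECHILD.
  loop { statuses << [Process.wait, $?] }
rescue Errno::ECHILD
  # All children have been reaped.
end

puts statuses.map { |_pid, status| status.exitstatus }.sort.inspect  # => [0, 1, 2]
```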