Pattern: Preforking

Explanation

This pattern harks back to the Process Per Connection architecture we saw a few chapters back.

This one also leans on processes as its means of parallelism, but rather than forking a child process for each incoming connection, it forks a bunch of processes when the server boots up, before any connections arrive.

Let’s review the workflow:

  1. Main server process creates a listening socket.
  2. Main server process forks a horde of child processes.
  3. Each child process accepts connections on the shared socket and handles them independently.
  4. Main server process keeps an eye on the child processes.
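The four steps above can be sketched in a few lines of Ruby. This is a minimal sketch, not the FTP server from this chapter. Each child echoes one line back, prefixed with its pid. The `prefork_echo` helper, the `WORKERS` count, and the ephemeral port on 127.0.0.1 are assumptions made for illustration. Step 4 is reduced to stopping and reaping the children.

```ruby
require 'socket'

WORKERS = 3 # hypothetical worker count for this sketch

# Steps 1 and 2: create the listening socket, then fork the children.
# Each child inherits a copy of the socket and runs step 3 on its own.
def prefork_echo(workers)
  server = TCPServer.new('127.0.0.1', 0) # port 0 lets the kernel pick a free port

  child_pids = Array.new(workers) do
    fork do
      loop do
        client = server.accept # step 3: every child accepts on the shared socket
        line = client.gets
        client.write("#{Process.pid}: #{line}") if line
        client.close
      end
    end
  end

  [server, child_pids]
end

server, child_pids = prefork_echo(WORKERS)
port = server.local_address.ip_port

# The parent never calls accept; here it merely acts as a client.
replies = Array.new(5) do |i|
  sock = TCPSocket.new('127.0.0.1', port)
  sock.write("hello #{i}\n")
  reply = sock.gets
  sock.close
  reply
end

# Step 4, reduced to cleanup: stop and reap every child.
child_pids.each { |pid| Process.kill(:TERM, pid) }
child_pids.each { |pid| Process.wait(pid) }
server.close
```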

The important concept is that the main server process opens the listening socket, but doesn’t accept connections on it. It then forks a predefined number of child processes, each of which will have a copy of the listening socket. The child processes then each call accept on the listening socket, taking the parent process out of the equation.

The best part about this is that we don’t have to worry about load balancing or synchronizing connections across our child processes, because the kernel handles that for us. When more than one process is trying to accept a connection on its own copy of the same socket, the kernel distributes the connections among them and ensures that one, and only one, process will be able to accept any particular connection.
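To see the kernel’s part in this, the following sketch assumes a Unix-like system where `fork` is available. The `accept_tally` helper is hypothetical. The sketch forks three children that all block in `accept` on one socket and makes nine connections. Each child reports through a pipe whenever it wins a connection. Every connection should be reported exactly once. How the connections are split among the children is up to the kernel, so the sketch makes no claim that the split is even.

```ruby
require 'socket'

# Hypothetical helper: fork `workers` children that all block in accept on
# one listening socket, make `connections` client connections, and return
# the child pids plus a tally of which child accepted each connection.
def accept_tally(workers: 3, connections: 9)
  server = TCPServer.new('127.0.0.1', 0)
  port = server.local_address.ip_port
  reader, writer = IO.pipe
  writer.sync = true # each child reports "<pid>\n" per accepted connection

  pids = Array.new(workers) do
    fork do
      reader.close
      loop do
        client = server.accept   # all children wait here on the same socket
        writer.puts(Process.pid) # report first, so the report already exists
        client.write("ok\n")     # by the time the client sees "ok"
        client.close
      end
    end
  end
  writer.close # the parent only reads from the pipe

  connections.times do
    sock = TCPSocket.new('127.0.0.1', port)
    sock.gets # waiting for "ok" guarantees the report is in the pipe
    sock.close
  end

  tally = Array.new(connections) { reader.gets.to_i }.tally
  [pids, tally]
ensure
  pids&.each do |pid|
    Process.kill(:TERM, pid)
  rescue Errno::ESRCH
    # this child is already gone
  end
  pids&.each { |pid| Process.wait(pid) }
  server&.close
  reader&.close
end

pids, tally = accept_tally
p tally # how the nine connections were split is up to the kernel
```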

Implementation

require 'socket'
require_relative '../command_handler'

module FTP
  class Preforking
    CRLF = "\r\n"
    CONCURRENCY = 4

    def initialize(port = 21)
      @control_socket = TCPServer.new(port)
      trap(:INT) { exit }
    end

    def gets
      @client.gets(CRLF)
    end

    def respond(message)
      @client.write(message)
      @client.write(CRLF)
    end

    def run
      child_pids = []

      CONCURRENCY.times do
        child_pids << spawn_child
      end

      trap(:INT) { 
        child_pids.each do |cpid|
          begin
            Process.kill(:INT, cpid)
          rescue Errno::ESRCH
          end
        end

        exit
      }

      loop do
        pid = Process.wait
        $stderr.puts "Process #{pid} quit unexpectedly"

        child_pids.delete(pid)
        child_pids << spawn_child
      end
    end

    def spawn_child
      fork do
        loop do
          @client = @control_socket.accept
          respond "220 OHAI"

          handler = CommandHandler.new(self)

          loop do
            request = gets

            if request
              respond handler.handle(request)
            else
              @client.close
              break
            end
          end
        end
      end
    end
  end
end

server = FTP::Preforking.new(4481)
server.run

This implementation is notably different from the three we’ve looked at thus far. Let’s talk about it in two chunks, starting at the top.

    def run
      child_pids = []

      CONCURRENCY.times do
        child_pids << spawn_child
      end

      trap(:INT) {
        child_pids.each do |cpid|
          begin
            Process.kill(:INT, cpid)
          rescue Errno::ESRCH
          end
        end

        exit
      }

      loop do
        pid = Process.wait
        $stderr.puts "Process #{pid} quit unexpectedly"

        child_pids.delete(pid)
        child_pids << spawn_child
      end
    end

This method begins by invoking the spawn_child method CONCURRENCY times. The spawn_child method (more on it below) actually forks a new process and returns its unique process id (pid).

After spawning the children, the parent process defines a signal handler for the INT signal. This is the signal that your process receives when you type Ctrl-C, for instance. This bit of code simply forwards an INT signal received by the parent to each of its child processes, ignoring any child that has already exited (Errno::ESRCH), and then exits. Remember that the child processes exist independently of the parent and are happy to live on even if the parent process dies. As such, it’s important for a parent process to clean up its child processes before exiting.
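Here is a stripped-down sketch of the forwarding logic, outside the server. `forward_int` and the sleeping children are hypothetical. Each child installs an INT trap, like the server’s `trap(:INT) { exit }`. It then signals over a pipe once it is ready, so the parent never sends INT before the trap exists.

```ruby
# Sketch of the INT-forwarding idea: the parent relays INT to each child pid,
# skipping children that have already exited (Errno::ESRCH).
def forward_int(child_pids)
  child_pids.each do |cpid|
    Process.kill(:INT, cpid)
  rescue Errno::ESRCH
    # that child is already gone; nothing to forward
  end
end

ready_r, ready_w = IO.pipe
ready_w.sync = true

# Hypothetical long-running children that, like the server's children,
# exit when they receive INT.
child_pids = Array.new(3) do
  fork do
    ready_r.close
    trap(:INT) { exit }
    ready_w.write('.') # tell the parent the trap is installed
    sleep              # wait until a signal arrives
  end
end
ready_w.close
ready_r.read(3) # block until all three children are ready

forward_int(child_pids)
statuses = child_pids.map { |pid| Process.wait2(pid).last }
```

One design note applies here. Pressing Ctrl-C in a terminal sends INT to the whole foreground process group, so the children usually receive it directly as well. The forwarding matters most when the signal targets the parent alone, for example with `kill -INT <parent pid>`.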

After signal handling, the parent process enters a loop around Process.wait. This method will block until a child process exits. It returns the pid of the exited child. Since there’s no reason for the child processes to exit, we assume it’s an anomaly. We print a message on STDERR and spawn a new child to take its place.
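The wait-and-replace loop can be exercised on its own. In this sketch, `spawn_flaky_child` is a hypothetical stand-in for `spawn_child` whose children exit immediately. The loop is bounded to two rounds so it terminates, whereas the server’s version runs forever.

```ruby
# Hypothetical stand-in for spawn_child: this "worker" exits right away
# with the given status, simulating a child that quit unexpectedly.
def spawn_flaky_child(status)
  fork { exit!(status) }
end

child_pids = [spawn_flaky_child(1), spawn_flaky_child(2)]
respawned = []

# Bounded version of the parent's loop: wait for any child to exit, report
# it, and replace it. The real server loops forever; this stops after 2 rounds.
2.times do
  pid = Process.wait # blocks until some child exits, then returns its pid
  $stderr.puts "Process #{pid} quit unexpectedly (exit status #{$?.exitstatus})"
  child_pids.delete(pid)
  replacement = spawn_flaky_child(0)
  child_pids << replacement
  respawned << replacement
end

# Reap whatever is left so no zombie processes remain.
child_pids.each { |pid| Process.wait(pid) }
```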

Some preforking servers, notably Unicorn, have the parent process take a more active role in monitoring its children. For example, the parent may look to see if any of the children are taking a long time to process a request. In that case the parent process will forcefully kill the child process and spawn a new one in its place.
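As a rough illustration of that kind of monitoring, here is a simplified heartbeat scheme. It is not how Unicorn implements its timeout. Each hypothetical worker writes a byte to its own pipe as it makes progress. The parent kills (with KILL) and replaces any worker that stays silent for longer than a hypothetical `TIMEOUT`. One worker is deliberately made to hang.

```ruby
TIMEOUT = 0.5 # hypothetical: max seconds a worker may go without a heartbeat

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Fork a worker that writes a heartbeat byte to its own pipe while it makes
# progress. A "stuck" worker sends one heartbeat and then hangs forever,
# standing in for a request that never finishes.
def spawn_worker(stuck:)
  reader, writer = IO.pipe
  writer.sync = true
  pid = fork do
    reader.close
    loop do
      writer.write('.')
      if stuck
        sleep # hang forever
      else
        sleep(0.1)
      end
    end
  end
  writer.close
  [reader, { pid: pid, last_seen: monotonic_now }]
end

healthy = spawn_worker(stuck: false)
stuck = spawn_worker(stuck: true)
stuck_pid = stuck.last[:pid]
workers = [healthy, stuck].to_h # heartbeat pipe => worker info
killed = []

deadline = monotonic_now + 2
while monotonic_now < deadline
  ready, = IO.select(workers.keys, nil, nil, 0.1)
  Array(ready).each do |io|
    # A String means fresh heartbeats; nil means EOF (the worker is gone).
    data = io.read_nonblock(1024, exception: false)
    workers[io][:last_seen] = monotonic_now if data.is_a?(String)
  end

  # Forcefully kill any worker that has been silent too long, then replace it.
  workers.select { |_, w| monotonic_now - w[:last_seen] > TIMEOUT }.each do |io, w|
    Process.kill(:KILL, w[:pid])
    Process.wait(w[:pid])
    killed << w[:pid]
    io.close
    workers.delete(io)
    reader, info = spawn_worker(stuck: false)
    workers[reader] = info
  end
end

# Shut down the workers that are still running.
workers.each do |io, w|
  Process.kill(:TERM, w[:pid])
  Process.wait(w[:pid])
  io.close
end
```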

    def spawn_child
      fork do
        loop do
          @client = @control_socket.accept
          respond "220 OHAI"

          handler = CommandHandler.new(self)

          loop do
            request = gets

            if request
              respond handler.handle(request)
            else
              @client.close
              break
            end
          end
        end
      end
    end

The core of this method should be familiar; this time it’s wrapped in a fork and a loop. The fork comes first, so each new child process is created before it ever calls accept. The outer loop inside the fork ensures that once a connection has been handled and closed, the child goes back to accept the next one. In this way each child process runs its own accept loop.
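The outer accept loop is what lets one child serve many connections over its lifetime. This sketch assumes an ephemeral port on 127.0.0.1, with a line echo standing in for `CommandHandler`. It forks a single child and connects to it three times in a row. Every reply should carry the same pid.

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)
port = server.local_address.ip_port

# One child with its own accept loop, shaped like spawn_child: the outer loop
# returns to accept after each client is served and closed.
child = fork do
  loop do
    client = server.accept
    while (line = client.gets) # inner loop: serve this client until EOF
      client.write("#{Process.pid} #{line}")
    end
    client.close               # client hung up; go back to accept
  end
end

# Three connections, one after another, all served by the same child.
replies = Array.new(3) do |i|
  sock = TCPSocket.new('127.0.0.1', port)
  sock.write("request #{i}\n")
  reply = sock.gets
  sock.close
  reply
end

Process.kill(:TERM, child)
Process.wait(child)
server.close
```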

Considerations

There are several things at play that make this a great pattern.

Compared to the similar Process Per Connection architecture, Preforking doesn’t have to pay the cost of doing a fork during each connection. Forking a process isn’t a cheap operation, and in Process Per Connection, every single connection must begin with paying that cost.
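A quick way to get a feel for that cost is to time it. This sketch compares forking once per simulated task against forking a single child that handles every task. `TASKS` and the `elapsed` helper are hypothetical, and the per-task work is trivial. The numbers will vary by machine and Ruby version, so only the relative difference is of interest.

```ruby
TASKS = 200 # hypothetical number of connections to simulate

# Hypothetical timing helper using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Process Per Connection style: one fork (and one wait) for every task.
per_connection = elapsed do
  TASKS.times do
    pid = fork { exit!(0) } # the child would handle one connection, then exit
    Process.wait(pid)
  end
end

# Preforking style: one fork up front; the same child handles every task.
preforked = elapsed do
  pid = fork do
    TASKS.times { |i| i * i } # stand-in for handling TASKS connections
    exit!(0)
  end
  Process.wait(pid)
end

puts format('per connection: %.3fs, preforked: %.3fs', per_connection, preforked)
```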

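As hinted earlier, this pattern also puts a fixed cap on the number of processes. They’re all spawned up front (CONCURRENCY of them in our implementation), so a burst of incoming connections can’t trigger an unbounded number of forks.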

One advantage that this pattern has over a similar threaded pattern is complete separation. Since each process has its own copy of everything, including the Ruby interpreter, a failure in one process will not affect any other processes. Since threads share the same process and memory space, a failure in one thread may affect other threads unpredictably.
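The isolation is easy to observe. In this sketch, three hypothetical workers are forked and the middle one raises an unhandled exception. Ruby prints that child’s error to STDERR and the child exits with status 1. The other two children, and the parent, carry on and exit normally.

```ruby
# Three hypothetical workers; the middle one raises an unhandled exception.
pids = [false, true, false].map do |crash|
  fork do
    raise 'worker crashed' if crash # only this child process dies
    exit!(0)
  end
end

# The parent and the other two workers are unaffected by the crash.
exit_codes = pids.map { |pid| Process.wait2(pid).last.exitstatus }
```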

A disadvantage of using Preforking is that forking more processes means that your server will consume more memory. Processes don’t come cheap. Given that each forked process gets a copy of everything, you can expect your memory usage to increase by up to 100% of the parent process size on each fork.

By this measure, a 100MB process could occupy up to 500MB after forking four children, and that would allow only 4 concurrent connections.
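Here is the arithmetic behind that figure, under the worst-case assumption from the previous paragraph that every child is a full copy of the parent. `worst_case_memory_mb` is a hypothetical helper.

```ruby
# Worst case from the text: each forked child may copy the entire parent,
# so total memory is the parent plus one full copy per child.
def worst_case_memory_mb(parent_mb, children)
  parent_mb * (1 + children)
end

total = worst_case_memory_mb(100, 4) # => 500
```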

I won’t harp on this point too much here, but this code is really simple. There are a few concepts that need to be understood, but overall it’s straightforward, with little that can go awry at runtime.

Examples