Pattern: Preforking
Explanation
This pattern harks back to the Process Per Connection architecture we saw a few chapters back.
This one also leans on processes as its means of parallelism, but rather than forking a child process for each incoming connection, it forks a bunch of processes when the server boots up, before any connections arrive.
Let’s review the workflow:
- Main server process creates a listening socket.
- Main server process forks a horde of child processes.
- Each child process accepts connections on the shared socket and handles them independently.
- Main server process keeps an eye on the child processes.
The important concept is that the main server process opens the listening socket, but doesn’t accept connections on it. It then forks a predefined number of child processes, each of which will have a copy of the listening socket. The child processes then each call accept on the listening socket, taking the parent process out of the equation.
The best part about this is that we don’t have to worry about load balancing or synchronizing connections across our child processes because the kernel handles that for us. Given more than one process trying to accept a connection on different copies of the same socket, the kernel balances the load and ensures that one, and only one, copy of the socket will be able to accept any particular connection.
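To see the shared socket at work outside of the FTP server, here is a minimal sketch. The helper name shared_accept_demo, the loopback address, and the use of port 0 are all assumptions made for illustration. Each child replies with its own pid, so the client can tell which process the kernel handed each connection to.

```ruby
require 'socket'

# Hypothetical demo helper (not part of the FTP server): prefork `workers`
# children on one listening socket, make `connections` client connections,
# and report which child pid served each one.
def shared_accept_demo(workers, connections)
  server = TCPServer.new('127.0.0.1', 0)  # port 0 asks the OS for any free port
  port = server.addr[1]

  # The parent never calls accept; each child inherits the socket and does.
  child_pids = workers.times.map do
    fork do
      loop do
        client = server.accept            # blocks until the kernel hands over a connection
        client.puts Process.pid           # tell the client who served it
        client.close
      end
    end
  end

  served_by = connections.times.map do
    TCPSocket.open('127.0.0.1', port) { |socket| socket.gets.to_i }
  end

  [child_pids, served_by]
ensure
  # Stop and reap the children so the demo leaves nothing behind.
  (child_pids || []).each do |pid|
    Process.kill(:TERM, pid)
    Process.wait(pid)
  end
  server.close if server
end

child_pids, served_by = shared_accept_demo(3, 6)
puts "children:  #{child_pids.inspect}"
puts "served by: #{served_by.inspect}"
```

Every connection comes back with exactly one pid, and it always belongs to one of the forked children. How the connections are spread across the children is up to the kernel, so the distribution will vary from run to run.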
Implementation
require 'socket'
require_relative '../command_handler'

module FTP
  class Preforking
    CRLF = "\r\n"
    CONCURRENCY = 4

    def initialize(port = 21)
      @control_socket = TCPServer.new(port)
      trap(:INT) { exit }
    end

    def gets
      @client.gets(CRLF)
    end

    def respond(message)
      @client.write(message)
      @client.write(CRLF)
    end

    def run
      child_pids = []

      CONCURRENCY.times do
        child_pids << spawn_child
      end

      trap(:INT) {
        child_pids.each do |cpid|
          begin
            Process.kill(:INT, cpid)
          rescue Errno::ESRCH
          end
        end

        exit
      }

      loop do
        pid = Process.wait
        $stderr.puts "Process #{pid} quit unexpectedly"

        child_pids.delete(pid)
        child_pids << spawn_child
      end
    end

    def spawn_child
      fork do
        loop do
          @client = @control_socket.accept
          respond "220 OHAI"

          handler = CommandHandler.new(self)

          loop do
            request = gets

            if request
              respond handler.handle(request)
            else
              @client.close
              break
            end
          end
        end
      end
    end
  end
end

server = FTP::Preforking.new(4481)
server.run
This implementation is notably different from the three we’ve looked at thus far. Let’s talk about it in two chunks, starting at the top.
def run
  child_pids = []

  CONCURRENCY.times do
    child_pids << spawn_child
  end

  trap(:INT) {
    child_pids.each do |cpid|
      begin
        Process.kill(:INT, cpid)
      rescue Errno::ESRCH
      end
    end

    exit
  }

  loop do
    pid = Process.wait
    $stderr.puts "Process #{pid} quit unexpectedly"

    child_pids.delete(pid)
    child_pids << spawn_child
  end
end
This method begins by invoking the spawn_child method a number of times, based on the number stored in CONCURRENCY. The spawn_child method (more on it below) will actually fork a new process and return its unique process id (pid).
After spawning the children, the parent process defines a signal handler for the INT signal. This is the signal that your process receives when you type Ctrl-C, for instance. This bit of code simply forwards an INT signal received by the parent to its child processes. Remember that the child processes exist independently of the parent and are happy to live on even if the parent process dies. As such, it’s important for a parent process to clean up its child processes before exiting.
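The rescue of Errno::ESRCH in that handler is easy to gloss over, so here is a small sketch of why it’s there. The names forward_signal and forwarding_demo are made up for this example. The demo deliberately stops and reaps one child first, so that forwarding the signal to its pid raises Errno::ESRCH ("no such process") while the other children still get the signal.

```ruby
# Hypothetical helper mirroring the body of the INT handler: signal every pid,
# skipping any that have already exited. Returns how many signals were delivered.
def forward_signal(signal, pids)
  pids.count do |pid|
    begin
      Process.kill(signal, pid)
      true
    rescue Errno::ESRCH
      false                       # no such process: this child is already gone
    end
  end
end

# Hypothetical demo: fork idle children, remove one early, then forward INT.
def forwarding_demo(count)
  trap(:INT) { exit }                           # set before forking, so children inherit it
  children = count.times.map { fork { sleep } } # each child just idles
  trap(:INT, 'DEFAULT')                         # the parent returns to the default handler

  # Stop one child and reap it, so its pid no longer refers to a process.
  Process.kill(:INT, children.first)
  Process.wait(children.first)

  delivered = forward_signal(:INT, children)
  children.drop(1).each { |pid| Process.wait(pid) }
  delivered
end

puts "delivered INT to #{forwarding_demo(3)} children"
```

Without the rescue, the first missing pid would raise inside the handler and the remaining children would never hear about the signal.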
After signal handling, the parent process enters a loop around Process.wait. This method will block until a child process exits. It returns the pid of the exited child. Since there’s no reason for the child processes to exit, we assume it’s an anomaly. We print a message on STDERR and spawn a new child to take its place.
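The wait-and-respawn loop can be tried in isolation with a sketch like the following. The supervise method and its max_respawns parameter are assumptions for this example; the bound exists only so the demo terminates, whereas the server’s loop runs forever. The workers here simply sleep briefly and exit, standing in for children that quit unexpectedly.

```ruby
# Hypothetical supervisor: keep `count` workers running `work`, replacing each
# one that exits. `max_respawns` is demo-only so that the method returns.
def supervise(count, max_respawns, &work)
  pids = count.times.map { fork(&work) }
  respawns = 0

  max_respawns.times do
    dead = Process.wait            # blocks until any child exits, returns its pid
    $stderr.puts "Process #{dead} quit unexpectedly"

    pids.delete(dead)
    pids << fork(&work)            # spawn a replacement to keep the pool full
    respawns += 1
  end

  Process.waitall                  # demo only: let the remaining workers finish
  { pool_size: pids.size, respawns: respawns }
end

result = supervise(2, 3) { sleep 0.05 }
puts "pool size stayed at #{result[:pool_size]} after #{result[:respawns]} respawns"
```

The pool size never changes: each pid removed from the list is immediately replaced by the pid of a fresh child.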
Some preforking servers, notably Unicorn, have the parent process take a more active role in monitoring its children. For example, the parent may look to see if any of the children are taking a long time to process a request. In that case the parent process will forcefully kill the child process and spawn a new one in its place.
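As a rough illustration of that idea, and not a description of how Unicorn actually implements it, here is a sketch of a parent that gives a child a fixed amount of time before killing it. The reap_with_timeout name, the polling interval, and the timeout values are all assumptions. It also simplifies things by timing the whole child rather than a single request.

```ruby
# Hypothetical watchdog: wait up to `timeout` seconds for `pid` to exit on its
# own, then kill it forcefully. Returns :finished or :killed.
def reap_with_timeout(pid, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  loop do
    # WNOHANG makes wait return nil right away if the child is still running.
    return :finished if Process.wait(pid, Process::WNOHANG)

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.kill(:KILL, pid)     # KILL cannot be trapped or ignored
      Process.wait(pid)            # reap it so no zombie is left behind
      return :killed
    end

    sleep 0.01                     # polling interval, chosen arbitrarily
  end
end

quick = fork { exit }              # stands in for a child that finishes promptly
stuck = fork { sleep }             # stands in for a child stuck on a request
puts reap_with_timeout(quick, 1)
puts reap_with_timeout(stuck, 0.2)
```

A parent that kills a child this way would then spawn a replacement, just as it does when a child exits on its own.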
def spawn_child
  fork do
    loop do
      @client = @control_socket.accept
      respond "220 OHAI"

      handler = CommandHandler.new(self)

      loop do
        request = gets

        if request
          respond handler.handle(request)
        else
          @client.close
          break
        end
      end
    end
  end
end
The core of this method should be familiar. This time it’s wrapped in a fork and a loop. So a new child process is forked before calling accept. The outermost loop ensures that as each connection is handled and closed, a new connection is handled. In this way each child process will be in its own accept loop.
Considerations
There are several things at play that make this a great pattern.
Compared to the similar Process Per Connection architecture, Preforking doesn’t have to pay the cost of doing a fork during each connection. Forking a process isn’t a cheap operation, and in Process Per Connection, every single connection must begin with paying that cost.
As hinted earlier, this pattern prevents too many processes from being spawned, because they’re all spawned beforehand.
One advantage that this pattern has over a similar threaded pattern is complete separation. Since each process has its own copy of everything, including the Ruby interpreter, a failure in one process will not affect any other processes. Since threads share the same process and memory space, a failure in one thread may affect other threads unpredictably.
A disadvantage of using Preforking is that forking more processes means that your server will consume more memory. Processes don’t come cheap. Given that each forked process gets a copy of everything, you can expect your memory usage to increase by up to 100% of the parent process size on each fork.
In this worst case, a 100MB parent process plus four forked children will occupy 500MB in total, and that buys only four concurrent connections.
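The arithmetic behind that figure is simple enough to write down. The method name worst_case_memory_mb is made up for this sketch, and it assumes that no memory at all is shared between the parent and its children. That makes the result an upper bound rather than a measurement; operating systems that implement fork with copy-on-write pages can come in well under it.

```ruby
# Hypothetical helper: worst-case total memory for a preforking server,
# assuming every child holds a full private copy of the parent's memory.
def worst_case_memory_mb(parent_mb, children)
  parent_mb * (1 + children)       # the parent itself plus one copy per child
end

puts worst_case_memory_mb(100, 4)  # 100MB parent, four children
```

Plugging in CONCURRENCY for the number of children gives a quick ceiling to budget against when picking a pool size.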
I won’t harp on this point too much here, but this code is really simple. There are a few concepts that need to be understood, but overall it’s simple, with little to worry about in the way of things going awry at runtime.