Appendix: How Unicorn Reaps Worker Processes

Any investigation of Unix Programming in the Ruby language would be remiss without many mentions of the Unicorn web server. Indeed, the project has already been mentioned several times in this book.

What’s the big deal? Unicorn is a web server that attempts to push as much responsibility onto the kernel as it can, and its codebase is chock full of Unix Programming techniques.

Not only that, but it’s performant and reliable. It’s used by lots of big Ruby websites like GitHub and Shopify.

The point is, if this book has whetted your appetite and you want to learn more about Unix Programming in Ruby, you should plumb the depths of Unicorn. It may take you several trips into the belly of the mythical beast, but you will come out with a better understanding and new ideas.

Reaping What?

Before we dive into the code I’d like to provide a bit of context about how Unicorn works. At a very high level Unicorn is a pre-forking web server.

This means that you boot it up and tell it how many worker processes you would like it to have. It starts by initializing its network sockets and loading your application. Then it uses fork(2) to create the worker processes. It uses the master-worker pattern we mentioned in the chapter on forking.
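As a rough illustration of the pre-forking idea, here is a minimal sketch. It is not Unicorn’s code: the worker count and the empty worker body are assumptions made for the example, and a real worker would sit in a loop accepting connections on the sockets it inherited from the master.

```ruby
# Toy pre-forking master (illustrative only, not Unicorn's implementation).
worker_count = 3 # hypothetical setting; Unicorn reads this from its config

# The master forks each worker up front.
worker_pids = worker_count.times.map do
  fork do
    # A real worker would accept connections on the shared socket here.
    exit!(0)
  end
end

# The master waits on every worker so that none is left behind as a zombie.
exit_statuses = worker_pids.map { |pid| Process.waitpid2(pid).last }
puts "all #{exit_statuses.size} workers exited"
```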

The Unicorn master process keeps a heartbeat on each of its workers and ensures they’re not taking too long to process requests. The code below is used when you tell the Unicorn master process to exit. As we covered in the chapter on forking, if a parent process doesn’t kill its children before it exits they will continue on without stopping.

So it’s important that Unicorn clean up after itself before it exits. The code below is invoked as part of Unicorn’s exit procedure. Before invoking this code it will send a QUIT signal to each of its worker processes, instructing them to exit gracefully.
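The QUIT handshake can be sketched in a few lines. This is an assumption-laden toy rather than Unicorn’s actual shutdown code: the pipe is only there so the parent knows the child has installed its signal handler before the signal is sent, and the bare sleep stands in for a worker’s request loop.

```ruby
# Sketch: a parent asks a child to exit via QUIT, then reaps it.
reader, writer = IO.pipe

pid = fork do
  reader.close
  trap(:QUIT) { exit!(0) } # exit cleanly when asked to quit
  writer.puts "ready"       # tell the parent the handler is installed
  sleep                     # stand-in for a worker's request loop
end

writer.close
reader.gets                 # block until the child reports it is ready
Process.kill(:QUIT, pid)    # ask the child to exit
_, status = Process.waitpid2(pid)
puts status.inspect
```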

The code below is used by Unicorn (current as of v4.0.0) to clean up its internal representation of its workers and ensure that they all exited properly.

Let’s dive in.

# reaps all unreaped workers
def reap_all_workers
  begin
    wpid, status = Process.waitpid2(-1, Process::WNOHANG)
    wpid or return
    if reexec_pid == wpid
      logger.error "reaped #{status.inspect} exec()-ed"
      self.reexec_pid = 0 
      self.pid = pid.chomp('.oldbin') if pid 
      proc_name 'master'
    else
      worker = WORKERS.delete(wpid) and worker.close rescue nil 
      m = "reaped #{status.inspect} worker=#{worker.nr rescue 'unknown'}"
      status.success? ? logger.info(m) : logger.error(m)
    end 
  rescue Errno::ECHILD
    break
  end while true
end

We’ll take it one line at a time:

begin
  ...
end while true

The first thing that I want to draw your attention to is the fact that the begin block that’s started on the first line of this method actually starts an endless loop. There are other ways to write endless loops in Ruby, but the important part is to keep in mind that we are in an endless loop, so we’ll need a hard return or a break in order to finish this method.
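To see the construct in isolation, here is a small sketch with made-up counters. It shows two things: the body of a begin/end block with a trailing while runs before the condition is checked, and a break is enough to leave the endless variant.

```ruby
# begin ... end while runs the body first, then checks the condition.
runs = 0
begin
  runs += 1
end while false
puts runs # the body ran once even though the condition is false

# With `while true` the loop only ends via break (or return inside a method).
attempts = 0
begin
  attempts += 1
  break if attempts == 3
end while true
puts attempts
```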

wpid, status = Process.waitpid2(-1, Process::WNOHANG)

This line should look familiar. We looked at Process.waitpid2 in the chapter on Process.wait.

There we saw that passing a valid pid as the first argument would cause the Process.waitpid call to wait only for that pid. What happens when you pass -1 to Process.waitpid? We know that there are no processes with a pid less than 1, so…

Passing -1 waits for any child process to exit. It turns out that this is the default option to that method. If you don’t specify a pid then it uses -1 by default. In this case, since the author needed to pass something in for the second argument, the first argument couldn’t be left blank, so it was set to the default.

Hey, if you’re waiting on any child process why not use Process.wait2 then? I suspect that the author decided here, and I agree with him, that it was most readable to use a waitpid variation when specifying a value for the pid. As mentioned above the value specified is simply the default, but nonetheless waitpid reads most naturally whenever you pass any value for the pid.

Remember Process::WNOHANG from before? With this flag, if no child process has exited yet, the call does not block and simply returns nil.
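A quick way to convince yourself of this behaviour is the sketch below. The pipe is an assumption of the example, used only to keep the child alive until the parent has made its non-blocking call; it plays no part in how Unicorn works.

```ruby
# Sketch: WNOHANG returns nil while the only child is still running.
reader, writer = IO.pipe

pid = fork do
  writer.close
  reader.read # blocks until the parent closes its end of the pipe
  exit!(0)
end
reader.close

# The child has not exited yet, so the non-blocking call returns nil.
first_try = Process.waitpid2(-1, Process::WNOHANG)

writer.close # lets the child finish

# Without WNOHANG the call blocks until some child has exited.
reaped_pid, status = Process.waitpid2(-1)
puts "first try: #{first_try.inspect}, then reaped #{reaped_pid}"
```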

wpid or return

This line may look a little odd but it’s actually a conditional return statement. If wpid is nil then we know that the previous line returned nil. This would mean that no child process has exited and handed its status back to us.

If this is the case then this method will return and its job is done.
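The idiom is easy to try out on its own. In this sketch the method name and its message are invented for illustration; the only thing borrowed from Unicorn is the shape of the guard.

```ruby
# `value or return` bails out of the method when value is nil or false.
def describe_reaped(wpid) # hypothetical helper, not part of Unicorn
  wpid or return          # returns nil early when there is nothing to report
  "reaped #{wpid}"
end

p describe_reaped(nil)
p describe_reaped(1234)
```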

if reexec_pid == wpid
  logger.error "reaped #{status.inspect} exec()-ed"
  self.reexec_pid = 0 
  self.pid = pid.chomp('.oldbin') if pid 
  proc_name 'master'

I don’t want to spend much time talking about this bit. The ‘reexec’ stuff has to do with Unicorn internals, specifically how it handles zero-downtime restarts. Perhaps I can cover that process in a future report.

One thing that I will draw your attention to is the call to proc_name. This is similar to the procline method from the Resque chapter. Unicorn also has a method for changing the display name of the current process, which is a critical piece of communication with the user of your software.
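In Ruby the display name of the current process can be changed by assigning to $0. The helper below is a hypothetical stand-in, not Unicorn’s proc_name, and the "example_server" prefix is made up for the example.

```ruby
# Sketch: changing how the current process shows up in tools like ps.
def set_display_name(tag) # hypothetical helper name
  $0 = "example_server #{tag}"
end

set_display_name('master')
puts $0
```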

else
  worker = WORKERS.delete(wpid) and worker.close rescue nil 

Unicorn stores a list of currently active worker processes in its WORKERS constant. WORKERS is a hash where the key is the pid of the worker process and the value is an instance of Unicorn::Worker.

So this line removes the worker process from Unicorn’s internal tracking list (WORKERS) and calls #close on the worker instance, which closes its no longer needed heartbeat mechanism.
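The one-liner packs in a lot, so here is a sketch of the same pattern against a stand-in worker class. The Struct, its fields, and the pids are assumptions for the example; Unicorn::Worker itself is not used.

```ruby
# Stand-in for Unicorn::Worker with just enough behaviour for the sketch.
fake_worker = Struct.new(:nr, :closed) do
  def close
    self.closed = true
  end
end

workers = {
  101 => fake_worker.new(0, false),
  102 => fake_worker.new(1, false)
}

# Known pid: the entry is removed from the hash and the worker is closed.
worker = workers.delete(101) and worker.close rescue nil

# Unknown pid: delete returns nil, so close is never called.
missing = workers.delete(999) and missing.close rescue nil

# Calling nr on nil raises NoMethodError, which the rescue modifier absorbs.
label = (missing.nr rescue 'unknown')
puts "remaining=#{workers.keys.inspect} label=#{label}"
```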

m = "reaped #{status.inspect} worker=#{worker.nr rescue 'unknown'}"

This line crafts a log message based on the status returned from the Process.waitpid2 call.

The string is crafted by first inspecting the status variable. What does that look like? Something like this:

#<Process::Status: pid=32227,exited(0)>
# or
#<Process::Status: pid=32308,signaled(SIGINT=2)>

It includes the pid of the ended process, as well as the way it ended. In the first line the process exited on its own with an exit code of 0. In the second line the process was killed with a signal, SIGINT in this case. So a line like that will be added to the Unicorn log.

The second part of the log line, worker.nr, is Unicorn’s internal representation of the worker’s number.

status.success? ? logger.info(m) : logger.error(m)

This line takes the crafted log message and sends it to the logger. It uses the success? method on the status object to decide whether to log the message at the INFO level or the ERROR level.

The success? method will only return true in one case, when the process exited with an exit code of 0. If it exited with a different code it will return false. If it was killed by a signal, it will return nil.
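The three outcomes can be observed directly. The status_of helper in this sketch is hypothetical: it forks a child, runs the given block in it, and returns the Process::Status collected by the parent.

```ruby
# Hypothetical helper: run a block in a child and return its exit status.
def status_of
  pid = fork { yield }
  Process.waitpid2(pid).last
end

clean  = status_of { exit!(0) }
failed = status_of { exit!(1) }
killed = status_of { Process.kill(:KILL, Process.pid); sleep }

p clean.success?  # exit code 0
p failed.success? # non-zero exit code
p killed.success? # terminated by a signal, so it never exited
```

Since nil is falsy, a worker killed by a signal ends up on the logger.error branch, the same as one that exited with a non-zero code.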

rescue Errno::ECHILD
  break

This is part of the top-level begin statement in this method. If this exception is raised then the endless loop that is this method breaks and it will return.

The Errno::ECHILD exception will be raised by Process.waitpid2 (or any of its cousins) if there are no child processes for the current process. If that happens in this case then it means the job of this method is done! All of the child processes have been reaped. So it returns.
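Here is a stripped-down sketch of using Errno::ECHILD as the signal that reaping is finished. Unlike Unicorn’s method it blocks on each wait and skips logging and bookkeeping; the method name is made up for the example.

```ruby
# Sketch: keep reaping until the kernel reports there are no children left.
def reap_everything # hypothetical name, not Unicorn's reap_all_workers
  reaped = []
  loop do
    pid, _status = Process.waitpid2(-1) # blocks until some child exits
    reaped << pid
  end
rescue Errno::ECHILD
  # No child processes remain, so the job is done.
  reaped
end

pids = 2.times.map { fork { exit!(0) } }
reaped = reap_everything
puts "reaped #{reaped.size} children"
```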

Conclusion

If this bit of code interested you and you want to learn more about Unix Programming in Ruby, Unicorn is a great resource. See the official site at http://unicorn.bogomips.org and go learn!