Processes Can Get Signals

In the last chapter we looked at Process.wait. It provides a nice way for a parent process to keep tabs on its child processes. However, it is a blocking call: it will not return until a child process dies.

What’s a busy parent to do? Not every parent has the luxury of waiting around on their children all day. There is a solution for the busy parent! And it’s our introduction to Unix signals.

Trapping SIGCHLD

Let’s take a simple example from the last chapter and rewrite it for a busy parent process.

child_processes = 3
dead_processes = 0
# We fork 3 child processes.
child_processes.times do
  fork do
    # They sleep for 3 seconds.
    sleep 3
  end
end

# Our parent process will be busy doing some intense mathematics.
# But still wants to know when one of its children exits.

# By trapping the :CHLD signal our process will be notified by the kernel
# when one of its children exits.
trap(:CHLD) do
  # Since Process.wait queues up any data that it has for us we can ask for it
  # here, since we know that one of our child processes has exited.
  
  puts Process.wait
  dead_processes += 1
  # We exit explicitly once all the child processes are accounted for.
  exit if dead_processes == child_processes
end

# Work it.
loop do
  (Math.sqrt(rand(44)) ** 8).floor
  sleep 1
end

SIGCHLD and Concurrency

Before we go on I must mention a caveat. Signal delivery is unreliable. By this I mean that if your code is handling a CHLD signal while another child process dies you may or may not receive a second CHLD signal.

This can lead to inconsistent results with the code snippet above. Sometimes the timing will be such that things will work out perfectly, and sometimes you’ll actually ‘miss’ an instance of a child process dying.

This behaviour only happens when receiving the same signal several times in quick succession; you can always count on at least one instance of the signal arriving. This same caveat is true for other signals you handle in Ruby; read on to hear more about those.

To properly handle CHLD you must call Process.wait in a loop and look for as many dead child processes as are available, since more than one child may have died by the time your handler runs. But... isn’t Process.wait a blocking call? If there’s only one dead child process and I call Process.wait again, how will I avoid blocking the whole process?

Now we get to the second argument to Process.wait. In the last chapter we looked at passing a pid to Process.wait as the first argument, but it also takes a second argument, flags. One such flag that can be passed tells the kernel not to block if no child has exited. Just what we need!

There’s a constant that represents the value of this flag, Process::WNOHANG, and it can be used like so:

Process.wait(-1, Process::WNOHANG)

Easy enough.
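
To see the difference between the two calls side by side, here is a minimal sketch. It assumes a single short-lived child; the half-second sleep is arbitrary, and I pass the child's pid rather than -1 only so the example isn't confused by any other children the process might have.

```ruby
# Fork a child that stays alive for a short while.
child_pid = fork { sleep 0.5 }

# The child is still running, so the non-blocking wait returns nil
# right away instead of blocking.
early_result = Process.wait(child_pid, Process::WNOHANG)

# A plain blocking wait returns only once the child has exited.
late_result = Process.wait(child_pid)

puts "non-blocking wait returned: #{early_result.inspect}"
puts "blocking wait returned the child's pid: #{late_result == child_pid}"
```

The nil return value is what makes the `while pid = Process.wait(...)` idiom possible: the loop ends as soon as there is nothing left to collect.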

Here’s a rewrite of the code snippet from the beginning of this chapter that won’t ‘miss’ any child process deaths:

child_processes = 3
dead_processes = 0
# We fork 3 child processes.
child_processes.times do
  fork do
    # They sleep for 3 seconds.
    sleep 3
  end
end

# Sync $stdout so the call to #puts in the CHLD handler isn't
# buffered. Buffered output can cause a ThreadError if a signal
# handler is interrupted after calling #puts. Syncing is always a
# good idea if your handlers will be doing IO.
$stdout.sync = true

# Our parent process will be busy doing some intense mathematics.
# But still wants to know when one of its children exits.

# By trapping the :CHLD signal our process will be notified by the kernel
# when one of its children exits.
trap(:CHLD) do
  # Since Process.wait queues up any data that it has for us we can ask for it
  # here, since we know that one of our child processes has exited.

  # We loop over a non-blocking Process.wait to ensure that any dead child
  # processes are accounted for.
  begin
    while pid = Process.wait(-1, Process::WNOHANG)
      puts pid
      dead_processes += 1
    end
  rescue Errno::ECHILD
  end
end

loop do
  # We exit ourself once all the child processes are accounted for.
  exit if dead_processes == child_processes
  
  sleep 1
end

One more thing to remember is that Process.wait, even this variant, will raise Errno::ECHILD if no child processes exist. Since signals might arrive at any time it’s possible for the last CHLD signal to arrive after the previous CHLD handler has already called Process.wait twice and gotten the last available status. This asynchronous stuff can be mind-bending. Any line of code can be interrupted with a signal. You’ve been warned!

So you must handle the Errno::ECHILD exception in your CHLD signal handler. Also if you don’t know how many child processes you are waiting on you should rescue that exception and handle it properly.
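
One way to package this up is a small helper that reaps whatever is available and treats Errno::ECHILD as "nothing left". This is only a sketch: reap_exited_children is a name I made up for illustration, and the polling loop stands in for whatever your real program does between signals.

```ruby
# Hypothetical helper: reap every child that has already exited, without
# blocking, and return the pids that were collected.
def reap_exited_children
  reaped = []
  begin
    while (pid = Process.wait(-1, Process::WNOHANG))
      reaped << pid
    end
  rescue Errno::ECHILD
    # No child processes exist at all, so there is nothing left to reap.
  end
  reaped
end

# Fork two children that exit right away.
pids = 2.times.map { fork { exit } }

# Poll until both children are accounted for.
seen = []
until seen.size >= pids.size
  seen.concat(reap_exited_children)
  sleep 0.01
end

# With no children left, the helper swallows Errno::ECHILD and returns [].
leftover = reap_exited_children
puts "reaped #{seen.size} children, #{leftover.size} left over"
```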

Signals Primer

This was our first foray into Unix signals. Signals are a form of asynchronous communication. When a process receives a signal from the kernel it can do one of the following:

  1. ignore the signal
  2. perform a specified action
  3. perform the default action
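
In Ruby all three choices go through trap. The sketch below walks through them using USR1, with the process signalling itself; the short sleeps are arbitrary and are only there to give the interpreter a moment to run the handler.

```ruby
received = []

# 1. Ignore the signal: nothing happens when it is sent.
trap(:USR1, "IGNORE")
Process.kill(:USR1, Process.pid)
sleep 0.1

# 2. Perform a specified action: run our block when the signal arrives.
trap(:USR1) { received << :usr1 }
Process.kill(:USR1, Process.pid)
sleep 0.1 # give Ruby a chance to run the handler

# 3. Restore the default action. For USR1 that means terminating the
#    process, so we don't send the signal again here.
trap(:USR1, "DEFAULT")

p received
```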

Where do Signals Come From?

Technically signals are sent by the kernel, just like text messages are sent by a cell phone carrier. But text messages have an original sender, and so do signals. Signals are sent from one process to another process, using the kernel as a middleman.

The original purpose of signals was to specify different ways that a process should be killed. Let’s start there.

Let’s start up two ruby programs and we’ll use one to kill the other.

For these examples we won't use irb because it defines its own signal handlers that get in the way of our demonstrations. Instead we'll just use the ruby program itself.

Give this a try: launch the ruby program without any arguments. Enter some code. Hit Ctrl-D.

This executes the code that you entered and then exits.

Start up two ruby processes using the technique mentioned above and we’ll kill one of them using a signal.

  1. In the first ruby session execute the following code:

    puts Process.pid
    sleep # so that we have time to send it a signal
    
  2. In the second ruby session issue the following command to kill the first session with a signal:

    Process.kill(:INT, <pid of first session>)
    

So the second process sent an “INT” signal to the first process, causing it to exit. “INT” is short for “INTERRUPT”.

The system default when a process receives this signal is that it should interrupt whatever it’s doing and exit immediately.
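
If juggling two terminals is inconvenient, roughly the same experiment can be sketched in a single script, with a forked child playing the part of the first session. The short pause before sending the signal is an arbitrary choice of mine to let the child settle into its sleep.

```ruby
# Fork a child that sleeps until it is signalled.
child_pid = fork { sleep }

# Give the child a moment to start sleeping (arbitrary pause).
sleep 0.1

# Send INT to the child, just as the second session does above.
Process.kill(:INT, child_pid)

# Process.wait2 returns the pid and a Process::Status describing how
# the child ended.
_, status = Process.wait2(child_pid)
puts status.inspect
```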

The Big Picture

Below is a table showing signals commonly supported on Unix systems. Every Unix process will be able to respond to these signals, and any signal can be sent to any process that the sender has permission to signal.

When naming signals the SIG portion of the name is optional. The Action column in the table describes the default action for each signal:

  Term   the process will terminate immediately
  Core   the process will terminate immediately and dump core (an image of its memory)
  Ign    the process will ignore the signal
  Stop   the process will stop (i.e. pause)
  Cont   the process will resume (i.e. unpause)

  Signal      Value       Action   Comment
  -------------------------------------------------------------------------
  SIGHUP         1        Term     Hangup detected on controlling terminal
                                   or death of controlling process
  SIGINT         2        Term     Interrupt from keyboard
  SIGQUIT        3        Core     Quit from keyboard
  SIGILL         4        Core     Illegal Instruction
  SIGABRT        6        Core     Abort signal from abort(3)
  SIGFPE         8        Core     Floating point exception
  SIGKILL        9        Term     Kill signal
  SIGSEGV       11        Core     Invalid memory reference
  SIGPIPE       13        Term     Broken pipe: write to pipe with no readers
  SIGALRM       14        Term     Timer signal from alarm(2)
  SIGTERM       15        Term     Termination signal
  SIGUSR1     30,10,16    Term     User-defined signal 1
  SIGUSR2     31,12,17    Term     User-defined signal 2
  SIGCHLD     20,17,18    Ign      Child stopped or terminated
  SIGCONT     19,18,25    Cont     Continue if stopped
  SIGSTOP     17,19,23    Stop     Stop process
  SIGTSTP     18,20,24    Stop     Stop typed at tty
  SIGTTIN     21,21,26    Stop     tty input for background process
  SIGTTOU     22,22,27    Stop     tty output for background process

  Where three values are listed, the signal number depends on the architecture.
  The signals SIGKILL and SIGSTOP cannot be trapped, blocked, or ignored.
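
Since some of the numbers vary between systems, it can be handy to ask Ruby which values apply on the machine you're on. A small sketch using Signal.list and Signal.signame; the particular names I print are just a sample.

```ruby
# Signal.list maps signal names (without the SIG prefix) to their
# numbers on the current platform.
signals = Signal.list

%w[HUP INT QUIT KILL TERM USR1 USR2 CHLD].each do |name|
  puts format("SIG%-5s %d", name, signals[name])
end

# Signal.signame goes the other way, from number to name.
kill_name = Signal.signame(signals["KILL"])
puts kill_name
```

Using names like :INT or :USR1 in your code, as the examples in this chapter do, sidesteps the numbering differences altogether.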

This table might seem a bit out of left field, but it gives you a rough idea of what to expect when you send a certain signal to a process. You can see that, by default, most of the signals terminate a process.

It’s interesting to note the SIGUSR1 and SIGUSR2 signals. These are signals whose action is meant specifically to be defined by your process. We’ll see shortly that we’re free to redefine any of the signal actions that we please, but those two signals are meant for your use.

Redefining Signals

Let’s go back to our two ruby sessions and have some fun.

  1. In the first ruby session use the following code to redefine the behaviour of the INT signal:

    puts Process.pid
    trap(:INT) { print "Na na na, you can't get me" }
    sleep # so that we have time to send it a signal
    

    Now our process won’t exit when it receives the INT signal.

  2. In the second ruby session issue the following command and notice that the first process is taunting us!

    Process.kill(:INT, <pid of first session>)
    
  3. You can try using Ctrl-C to kill that first session, and notice that it responds the same!

  4. But as the table said there are some signals that cannot be redefined. SIGKILL will show that guy who’s boss.

    Process.kill(:KILL, <pid of first session>)
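
The whole exchange can also be sketched as one script, with a forked child standing in for the first session. The pipe is my own addition: it lets the child tell the parent that its handler is installed and carries the taunt back so the parent can print it.

```ruby
reader, writer = IO.pipe

child_pid = fork do
  reader.close
  # Redefine INT: taunt instead of exiting.
  trap(:INT) { writer.syswrite("Na na na, you can't get me\n") }
  writer.syswrite("ready\n") # tell the parent the handler is installed
  loop { sleep }
end
writer.close

reader.gets                    # wait for "ready"
Process.kill(:INT, child_pid)
taunt = reader.gets            # the handler ran and the child is still alive

Process.kill(:KILL, child_pid) # KILL cannot be trapped
_, status = Process.wait2(child_pid)

puts taunt
puts "child was killed by signal #{status.termsig}"
```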
    

Ignoring Signals

  1. In the first ruby session use the following code:

    puts Process.pid
    trap(:INT, "IGNORE")
    sleep # so that we have time to send it a signal
    
  2. In the second ruby session issue the following command and notice that the first process isn’t affected.

    Process.kill(:INT, <pid of first session>)
    

    The first ruby session is unaffected.
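
Here is a single-script sketch of the same idea, again with a forked child in place of the first session. The pipe and the 0.2 second pause are assumptions of mine to keep the timing predictable; a non-blocking Process.wait is used to check that the child is still running.

```ruby
reader, writer = IO.pipe

child_pid = fork do
  reader.close
  trap(:INT, "IGNORE")
  writer.syswrite("ready\n") # tell the parent that INT is now ignored
  sleep
end
writer.close

reader.gets                    # wait for "ready"
Process.kill(:INT, child_pid)
sleep 0.2                      # arbitrary pause so the signal can be delivered

# A non-blocking wait returns nil while the child is still running.
still_running = Process.wait(child_pid, Process::WNOHANG).nil?
puts "child survived INT: #{still_running}"

# Clean up: KILL cannot be ignored.
Process.kill(:KILL, child_pid)
Process.wait(child_pid)
```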

Signal Handlers are Global

Signals are a great tool and are the perfect fit for certain situations. But it’s good to keep in mind that trapping a signal is a bit like using a global variable: you might be overwriting something that some other code depends on. And unlike global variables, signal handlers can’t be namespaced.

So make sure you read this next section before you go and add signal handlers to all of your open source libraries :)

Being Nice about Redefining Signals

There is a way to preserve handlers defined by other Ruby code, so that your signal handler won’t trample any other ones that are already defined. It looks something like this:

trap(:INT) { puts 'This is the first signal handler' }

old_handler = trap(:INT) {
  old_handler.call
  puts 'This is the second handler'
  exit
}
sleep 5 # so that we have time to send it a signal

Just send it a Ctrl-C to see the effect. Both signal handlers are called.

Now let’s see if we can preserve the system default behaviour. Hit the code below with a Ctrl-C.

system_handler = trap(:INT) {
  puts 'about to exit!'
  system_handler.call
}
sleep 5 # so that we have time to send it a signal

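:/ It blew up that time. When no Ruby handler has been defined, trap returns a string (such as "DEFAULT") rather than something callable, so system_handler.call raises a NoMethodError. So we can’t preserve the system default behaviour with this technique, but we can preserve other Ruby code handlers that have been defined.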
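
You can inspect what trap hands back to see why. A minimal sketch, using USR1 so that nothing happens to the process while we poke at it:

```ruby
# Start from Ruby's default disposition for USR1.
trap(:USR1, "DEFAULT")

# With no Ruby handler installed, trap returns a String, not a Proc.
previous = trap(:USR1) { puts "custom handler" }

# Now the previous handler is the block above, which does respond to call.
custom = trap(:USR1, "DEFAULT")

puts previous.inspect
puts custom.class
```

This is why checking `respond_to?(:call)` before invoking an old handler is the safe thing to do.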

In terms of best practices your code probably shouldn't define any signal handlers unless it's a server, as in a long-running process that's booted from the command line. It's very rare that library code should trap a signal.

# The 'friendly' method of trapping a signal.

old_handler = trap(:QUIT) {
  # do some cleanup
  puts 'All done!'
  
  old_handler.call if old_handler.respond_to?(:call)
}

This handler for the QUIT signal will preserve any previous QUIT handlers that have been defined. Though this looks ‘friendly’ it’s not generally a good idea. Imagine a scenario where a Ruby server tells its users they can send it a QUIT signal and it will do a graceful shutdown. You tell the users of your library that they can send a QUIT signal and it will draw an ASCII rainbow. Now if a user sends the QUIT signal both handlers will be invoked. This violates the expectations of both libraries.

Whether or not you decide to preserve previously defined signal handlers is up to you; just make sure you know why you’re doing it. If you simply want to wire up some behaviour to clean up resources before exiting you can use an at_exit hook, which we touched on in the chapter about exit codes.
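
As a rough sketch of the at_exit approach: the child below registers its cleanup with at_exit and never calls trap, yet the cleanup still runs when the child is asked to terminate with TERM. The pipe is only there so the parent can observe the cleanup, and the "cleaned up" message is a placeholder for real work.

```ruby
reader, writer = IO.pipe

child_pid = fork do
  reader.close
  # Runs when the child exits; no signal handler is replaced.
  at_exit { writer.syswrite("cleaned up\n") }
  writer.syswrite("ready\n") # tell the parent the hook is registered
  sleep
end
writer.close

reader.gets                    # wait for "ready"
Process.kill(:TERM, child_pid) # ask the child to terminate
cleanup_message = reader.gets  # written by the at_exit hook
Process.wait(child_pid)

puts cleanup_message
```

Keep in mind that this can't help with KILL, which ends the process without giving Ruby a chance to run anything.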

When Can’t You Receive Signals?

Your process can receive a signal anytime. That’s the beauty of them! They’re asynchronous.

Your process can be pulled out of a busy for-loop into a signal handler, or even out of a long sleep. Your process can even be pulled from one signal handler to another if it receives one signal while processing another. But, as expected, it will always go back and finish the code in all the handlers that are invoked.
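
A small sketch of a process being pulled out of a sleep and then carrying on. A forked child signals its parent part way through the parent's sleep; the timings are arbitrary, and the events array is just my way of recording the order in which things happen.

```ruby
events = []
trap(:USR1) { events << :handler }

# The child waits briefly, then signals its parent (this process).
child_pid = fork do
  sleep 0.1
  Process.kill(:USR1, Process.ppid)
end

events << :before_sleep
sleep 0.5 # the handler runs while we are in here
events << :after_sleep

Process.wait(child_pid)
trap(:USR1, "DEFAULT")

p events
```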

In the Real World

With signals, any process can communicate with any other process on the system, so long as it knows its pid and has permission to signal it. This makes signals a very powerful communication tool. It’s common to send signals from the shell using kill(1).

In the real world signals are mostly used by long running processes like servers and daemons. And for the most part it will be the human users who are sending signals rather than automated programs.

For instance, the Unicorn web server (http://unicorn.bogomips.org) responds to the INT signal by killing all of its processes and shutting down immediately. It responds to the USR2 signal by re-executing itself for a zero-downtime restart. It responds to the TTIN signal by incrementing the number of worker processes it has running.

See the SIGNALS file included with Unicorn for a full list of the signals it supports and how it responds to them.
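
To get a feel for the pattern, here is a toy sketch of a process that bumps a worker count when it receives TTIN. It is not Unicorn's implementation: worker_count is a hypothetical variable, no workers are actually started, and the process signals itself in place of an operator running kill(1) from the shell.

```ruby
# Hypothetical toy "server" state; not Unicorn's actual code.
worker_count = 2

# TTIN asks for one more worker.
trap(:TTIN) { worker_count += 1 }

# Simulate an operator sending TTIN to this process twice.
2.times do
  Process.kill(:TTIN, Process.pid)
  sleep 0.1 # let the handler run before sending the next signal
end

puts "workers: #{worker_count}"

# Restore the default disposition.
trap(:TTIN, "DEFAULT")
```

A real server would react to the new count in its main loop, forking or reaping workers there rather than inside the handler.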

The memprof project has an interesting example of being a friendly citizen when handling signals.

System Calls

Ruby’s Process.kill maps to kill(2), and Kernel#trap maps roughly to sigaction(2). The signal(7) manual page is also useful.