Processes Can Wait

In the examples of fork(2) up until now we have let the parent process continue on in parallel with the child process. In some cases this led to weird results, such as when the parent process exited before the child process.

That kind of scenario is really only suitable for one use case: fire and forget. It’s useful when you want a child process to handle something asynchronously, but the parent process still has its own work to do.

message = 'Good Morning'
recipient = 'tree@mybackyard.com'

fork do
  # In this contrived example the parent process forks a child to take
  # care of sending data to the stats collector. Meanwhile the parent
  # process has continued on with its work of sending the actual payload.

  # The parent process doesn't want to be slowed down with this task, and
  # it doesn't matter if this would fail for some reason.
  StatsCollector.record message, recipient
end

# send message to recipient

Babysitting

For most other use cases involving fork(2) you’ll want some way to keep tabs on your child processes. In Ruby, one technique for this is provided by Process.wait. Let’s rewrite our orphan-inducing example from the last chapter to perform with fewer surprises.

fork do
  5.times do
    sleep 1
    puts "I am an orphan!"
  end
end

Process.wait
abort "Parent process died..."

This time the output will look like:

I am an orphan!
I am an orphan!
I am an orphan!
I am an orphan!
I am an orphan!
Parent process died...

Not only that, but control will not be returned to the terminal until all of the output has been printed.

So what does Process.wait do? Process.wait is a blocking call instructing the parent process to wait for one of its child processes to exit before continuing.
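
To make the blocking behaviour visible, here is a minimal sketch that times the call. The one-second sleep and the variable names are just assumptions for illustration.

```ruby
# Record the time before forking so we can measure how long the wait took.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

fork do
  # The child does nothing but sleep for one second, then exits.
  sleep 1
end

# The parent blocks here until the child has exited.
Process.wait

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "Parent waited #{elapsed.round(1)} seconds"
```

The parent does no work of its own here, so nearly all of the elapsed time is spent inside Process.wait.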

Process.wait and Cousins

I mentioned something key in that last statement: Process.wait blocks until any one of its child processes exits. If you have a parent that’s babysitting more than one child process and you’re using Process.wait, you need to know which one exited. For this, you can use the return value.

Process.wait returns the pid of the child that exited. Check it out.

# We create 3 child processes.
3.times do
  fork do
    # Each one sleeps for a random whole number of seconds less than 5.
    sleep rand(5)
  end
end
    
3.times do
  # We wait for each child process to exit and print the pid that
  # gets returned.
  puts Process.wait
end  

Communicating with Process.wait2

But wait! Process.wait has a cousin called Process.wait2!

Why the name confusion? It makes sense once you know that Process.wait returns 1 value (pid), but Process.wait2 returns 2 values (pid, status).

This status can be used as communication between processes via exit codes. In our chapter on Exit Codes we mentioned that you can use exit codes to encode information for other processes. Process.wait2 gives you direct access to that information.

The status returned from Process.wait2 is an instance of Process::Status. It has a lot of useful information attached to it for figuring out exactly how a process exited.

# We create 5 child processes.
5.times do
  fork do
    # Each child generates a random number. If it is even the child exits
    # with a 111 exit code, otherwise it uses a 112 exit code.
    if rand(5).even?
      exit 111
    else
      exit 112
    end
  end
end
    
5.times do
  # We wait for each of the child processes to exit.
  pid, status = Process.wait2

  # If the child process exited with the 111 exit code
  # then we know they encountered an even number.
  if status.exitstatus == 111
    puts "#{pid} encountered an even number!"
  else
    puts "#{pid} encountered an odd number!"
  end
end

Communication between processes without the filesystem or network!
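
The exit code is only one of the things a Process::Status can tell you. As a rough sketch of the "how did it exit" information mentioned above, the following compares a child that exits by itself with one that is killed by a signal. The exit code 3 and the use of the KILL signal are arbitrary choices for this illustration.

```ruby
# A child that exits normally with a custom exit code.
fork { exit 3 }
_pid, normal = Process.wait2

puts normal.exited?     # true: the child called exit
puts normal.exitstatus  # 3
puts normal.success?    # false: only exit code 0 counts as success
puts normal.signaled?   # false: no signal was involved

# A child that is terminated by a signal instead of exiting by itself.
child = fork { sleep }
Process.kill(:KILL, child)
_pid, killed = Process.wait2

puts killed.signaled?           # true
puts killed.termsig             # the number of the KILL signal
puts killed.exitstatus.inspect  # nil: the child never got to call exit
```

Checking exited? or signaled? before reading exitstatus avoids mistaking a killed child for one that returned an exit code.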

Waiting for Specific Children

But wait! The Process.wait cousins have two more cousins: Process.waitpid and Process.waitpid2.

You can probably guess what these do. They function the same as Process.wait and Process.wait2 except that, rather than waiting for any child to exit, they wait only for a specific child, specified by pid.

favourite = fork do
  exit 77
end

middle_child = fork do
  abort "I want to be waited on!"
end

pid, status = Process.waitpid2 favourite
puts status.exitstatus

Although it appears that Process.wait and Process.waitpid provide different behaviour, don't be fooled! They are actually aliases for the same method. Both will accept the same arguments and behave the same way.

You can pass a pid to Process.wait in order to get it to wait for a specific child, and you can pass -1 as the pid to Process.waitpid to get it to wait for any child process.

The same is true for Process.wait2 and Process.waitpid2.
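
Here is a small sketch of that interchangeability. The variable names and exit codes are made up for the example.

```ruby
first  = fork { exit 1 }
second = fork { exit 2 }

# Process.wait accepts a pid, so it can wait for one specific child.
waited = Process.wait(second)
puts waited == second

# Process.waitpid2 accepts -1, which means "any child". Only the first
# child is left at this point, so that is the one we get back.
pid, status = Process.waitpid2(-1)
puts pid == first
puts status.exitstatus
```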

Just like with Process.pid vs. $$, I think it's important that, as programmers, we use the provided tools to reveal our intent where possible. Although these methods are identical, you should use Process.wait when you're waiting for any child process and use Process.waitpid when you're waiting for a specific process.

Race Conditions

As you look at these simple code examples you may start to wonder about race conditions.

What if the code that handles one exited process is still running when another child process exits? What if I haven’t gotten back around to Process.wait and another process exits? Let’s see:

# We create two child processes.
2.times do
  fork do
    # Both processes exit immediately.
    abort "Finished!"
  end
end

# The parent process waits for the first process, then sleeps for 5 seconds. 
# In the meantime the second child process has exited and is no 
# longer running.
puts Process.wait
sleep 5

# The parent process asks to wait once again, and amazingly enough, the second
# process' exit information has been queued up and is returned here.
puts Process.wait

As you can see, this technique is free from race conditions. When several children have already exited, the order in which the kernel reports them is not guaranteed, but none of them is lost: the kernel keeps the exit information for each exited child until the parent collects it.

So even if the parent is slow at processing each exited child it will always be able to get the information for each exited child when it’s ready for it.

Take note that calling any variant of Process.wait when there are no child processes will raise Errno::ECHILD. It's always a good idea to keep track of how many child processes you have created so you don't encounter this exception.
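
If counting children is inconvenient, one possible alternative is to keep waiting until the exception shows up and treat it as the signal that every child has been collected. This is only a sketch; the counter named reaped is an assumption for illustration.

```ruby
# We create 3 child processes that exit right away.
3.times { fork { exit } }

reaped = 0
begin
  loop do
    # Blocks until some child exits. Once no children remain,
    # this raises Errno::ECHILD and we leave the loop.
    Process.wait
    reaped += 1
  end
rescue Errno::ECHILD
  puts "All #{reaped} children have exited"
end
```

This approach waits for every child of the current process, including any that were forked elsewhere in the program, so it fits best when one piece of code owns all of the forking.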

In the Real World

The idea of looking in on your child processes is at the core of a common Unix programming pattern. The pattern is sometimes called babysitting processes, master/worker, or preforking.

At the core of this pattern is the concept that you have one process that forks several child processes, for concurrency, and then spends its time looking after them: making sure they are still responsive, reacting if any of them exit, etc.
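
A stripped-down sketch of the idea might look like the following. It is not how any particular server implements it: the spawn_worker helper, the worker count, and the restart limit are all hypothetical, and the limit exists only so that the example terminates. There is no responsiveness check here, only a reaction to workers exiting.

```ruby
# Hypothetical helper: forks one worker and returns its pid.
def spawn_worker
  fork do
    # Stand-in for real work, such as handling requests.
    sleep rand
  end
end

worker_count  = 3  # how many workers to keep around (assumed value)
restart_limit = 2  # stop replacing workers after this many restarts

workers  = Array.new(worker_count) { spawn_worker }
restarts = 0

until workers.empty?
  # Block until any one of the workers exits.
  pid = Process.wait
  workers.delete(pid)

  if restarts < restart_limit
    # React to the exit by forking a replacement.
    restarts += 1
    workers << spawn_worker
    puts "Worker #{pid} exited, started a replacement"
  else
    puts "Worker #{pid} exited"
  end
end
```

A real babysitter would usually loop forever and replace every worker that exits, rather than stopping after a fixed number of restarts.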

For example, the Unicorn web server (http://unicorn.bogomips.org) employs this pattern. You tell it how many worker processes you want it to start up for you, 5 for instance.

Then a unicorn process will boot up that will fork 5 child processes to handle web requests. The parent (or master) process maintains a heartbeat with each child and ensures that all of the child processes stay responsive.

This pattern allows for both concurrency and reliability. Read more about Unicorn in its Appendix at the end of the book.

For an alternative usage of this technique read through the Lookout class in the attached Spyglass project.

System Calls

Ruby’s Process.wait and cousins map to waitpid(2).