Appendix: How Resque Manages Processes
This section looks at how a popular Ruby job queue, Resque, manages processes effectively. Specifically, it uses fork(2) to manage memory, not for concurrency or speed.
The Architecture
To understand why Resque works the way it does we need a basic understanding of how the system works.
From the README:
Resque is a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later.
The component that we’re interested in is the Resque worker. Resque workers take care of the ‘processing them later’ part. The job of a Resque worker is to boot up, load your application environment, then connect to Redis and try to reserve any pending background jobs. When it’s able to reserve a job it works off the job, then goes back to Redis to reserve the next one. Simple enough.
For an application of non-trivial size one Resque worker is not enough. So it’s very common to spin up multiple Resque workers in parallel to work off jobs.
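The reserve-then-work loop described above can be sketched in a few lines. This is only an illustration, not Resque's code: TinyQueue is a hypothetical in-memory stand-in for Redis, work_off is a made-up name, and jobs are plain lambdas.

```ruby
# Hypothetical stand-in for Redis: an in-memory list of pending jobs.
class TinyQueue
  def initialize(jobs)
    @jobs = jobs.dup
  end

  # Hand out the next pending job, or nil when nothing is left.
  def reserve
    @jobs.shift
  end
end

# Reserve a job, work it off, then go back and reserve the next one.
def work_off(queue)
  results = []
  while (job = queue.reserve)
    results << job.call
  end
  results
end

queue   = TinyQueue.new([-> { 1 + 1 }, -> { "resized image" }])
results = work_off(queue)
```

A real worker would keep polling instead of stopping when the queue is empty; the sketch stops so that it terminates.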
Forking for Memory Management
Resque workers employ fork(2) for memory management purposes. Let’s have a look at the relevant bit of code (from Resque v1.18.0) and then dissect it line by line.
if @child = fork
  srand # Reseeding
  procline "Forked #{@child} at #{Time.now.to_i}"
  Process.wait(@child)
else
  procline "Processing #{job.queue} since #{Time.now.to_i}"
  perform(job, &block)
  exit! unless @cant_fork
end
This bit of code is executed every time Resque works off a job.
If you’ve read through the Forking chapter then you’ll already be familiar with the if/else style here. Otherwise go read it now!
We’ll start by looking at the code inside the parent process (i.e. inside the if block).
srand # Reseeding
This line is here simply because of a bug (http://redmine.ruby-lang.org/issues/4338) in a certain patchlevel of MRI Ruby 1.8.7.
procline "Forked #{@child} at #{Time.now.to_i}"
procline is Resque’s internal way of updating the name of the current process. Remember we noted that you can change the name of the current process by setting $0 but Ruby doesn’t include a method for it? This is Resque’s solution. procline sets the name of the current process.
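As a rough illustration of the idea, here is a minimal helper that renames the current process by assigning to $0. The name set_procline and the "sketch-worker" prefix are made up for this example and are not taken from Resque's source.

```ruby
# Hypothetical helper in the spirit of procline; the prefix is invented.
def set_procline(text)
  $0 = "sketch-worker: #{text}"
end

original_name = $0                  # remember the name we started with

set_procline("Waiting for jobs")
name_while_waiting = $0

set_procline("Processing default since #{Time.now.to_i}")
name_while_working = $0

$0 = original_name                  # restore so the rest of the script is unaffected
```

While the helper's name is in effect, tools such as ps will typically show it in place of the original command line, which makes it easy to see at a glance what each worker is doing.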
Process.wait(@child)
If you’ve read the chapter on Process.wait then this line of code should be familiar to you.
The @child variable was assigned the return value of the fork call, so in the parent process it holds the child’s pid. This line of code tells the parent process to block until the child is finished.
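To see the blocking behaviour in isolation, here is a small sketch. The sleep stands in for a job, and the exit code 3 is arbitrary, chosen only so that it is recognisable in the parent.

```ruby
# Fork a child that stands in for a job and exits with a chosen status.
child_pid = fork do
  sleep 0.1   # pretend to work on a job
  exit!(3)    # arbitrary exit code so we can spot it in the parent
end

# Parent: block here until that specific child has exited.
waited_pid  = Process.wait(child_pid)
exit_status = $?.exitstatus   # $? holds the status of the child just reaped
```

Process.wait returns the pid of the child it reaped, so waited_pid and child_pid refer to the same process.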
Now we’ll look at what happens in the child process.
procline "Processing #{job.queue} since #{Time.now.to_i}"
Notice that both the if and else blocks make a call to procline. Even though these two lines are part of the same logical construct they are being executed in two different processes. Since the process name is process-specific these two calls will set the name for the parent and child process respectively.
perform(job, &block)
Here in the child process is where the job is actually ‘performed’ by Resque.
exit! unless @cant_fork
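Then the child process exits. Note the use of exit! rather than exit: it terminates the process immediately, without running any at_exit handlers. The unless @cant_fork guard appears to cover platforms where fork isn’t available; in that case the job was performed in the worker process itself, which must not exit.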
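Putting the walk-through together, the pattern can be sketched as a standalone helper. This is a simplified illustration under assumptions, not Resque's implementation: perform_in_child is a made-up name, the job is a plain block, and success or failure is reported only through the child's exit status.

```ruby
# Hypothetical helper: run one job in a forked child and wait for it.
def perform_in_child
  if (child = fork)
    # Parent: block until the child has finished the job.
    Process.wait(child)
    $?.success?
  else
    # Child: do the work, then leave immediately via exit!.
    begin
      yield
      exit!(0)
    rescue StandardError
      exit!(1)
    end
  end
end

ok     = perform_in_child { 2 + 2 }
failed = perform_in_child { raise "job blew up" }
```

Because the child always leaves through exit!, nothing the job allocated or raised can leak back into the parent; the parent only ever sees an exit status.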
Why Bother?
As mentioned in the first paragraph of this chapter, Resque isn’t doing this to achieve concurrency or to make things faster. In fact, it adds an extra step to the processing of each job which makes the whole thing slower. So why go to the trouble? Why not just process job after job?
Resque uses fork(2) to ensure that the memory usage of its worker processes doesn’t bloat. Let’s review what happens when a Resque worker forks and how that affects the Ruby VM.
You’ll recall that fork(2) creates a new process that’s an exact copy of the original process. The original process, in this case, has preloaded the application environment and nothing else. So we know that after forking we’ll have a new process with just the application environment loaded.
Then the child process gets on with working off the job. This is where memory usage can go awry. The background job may require that image files are loaded into main memory for processing, or that many ActiveRecord objects are fetched from the database, or any other operation that uses large amounts of main memory.
Once the child process is finished with the job it exits, which releases all of its memory back to the OS to clean up. Then the original process can resume, once again with only the application environment loaded.
So each time after a job is performed by Resque you end up back at a clean slate in terms of memory usage. This means that memory usage may spike when jobs are being worked on, but it should always come back to that nice baseline.
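The baseline effect can be observed with a small experiment. The sketch below uses the number of String objects reported by ObjectSpace.count_objects as a rough proxy for heap usage (an assumption made for portability; it is not the same as resident memory), and the figure of 200,000 strings is arbitrary.

```ruby
# Count String objects as a rough, portable proxy for heap usage.
def string_count
  ObjectSpace.count_objects[:T_STRING]
end

before = string_count

reader, writer = IO.pipe
pid = fork do
  reader.close
  # A "memory-hungry job": hold 200,000 strings at once.
  hog = Array.new(200_000) { |i| "row #{i}" }
  writer.write(string_count.to_s)   # report the child's count to the parent
  writer.close
  exit!(0)                          # the child's entire heap goes away here
end
writer.close
Process.wait(pid)
in_child = reader.read.to_i
reader.close

after = string_count
```

The child's count climbs by at least the number of strings it holds, while the parent's count after the job stays close to where it started, since the allocation only ever happened in the child's copy of the process.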
Doesn’t the GC clean up for us?
Well, yes, but it doesn’t do a great job. It does an OK job. The truth is that MRI’s GC has a hard time releasing memory that it doesn’t need anymore.
When the Ruby VM boots up it is allocated a certain block of main memory by the kernel. When it uses up all that it has it needs to ask for another block of main memory from the kernel.
Due to numerous issues with Ruby’s GC (naive approach, memory fragmentation) it is rare that the VM is able to release a block of memory back to the kernel. So the memory usage of a Ruby process is likely to grow over time, but not to shrink. Now Resque’s approach begins to make sense!
If the Resque worker simply worked off each job as it became available then it wouldn’t be able to maintain that nice baseline level of memory usage. As soon as it worked on a job that required lots of main memory then that memory would be stuck with the worker process until it exited.
Even if subsequent jobs needed much less memory, Ruby would have a hard time giving that memory back to the kernel. Hence, the worker processes would inevitably get bigger over time, never shrinking.
Thanks to the power of fork(2) Resque workers are reliable and don’t need to be restarted after working a certain number of jobs.