Processes Can Communicate
Up until now we’ve looked at related processes that share memory and share open resources. But what about communicating information between multiple processes?
This is part of a whole field of study called Inter-process communication (IPC for short). There are many different ways to do IPC but I’m going to cover two commonly useful methods: pipes and socket pairs.
Our First Pipe
A pipe is a uni-directional stream of data. In other words you can open a pipe, one process can ‘claim’ one end of it and another process can ‘claim’ the other end. Then data can be passed along the pipe but only in one direction. So if one process ‘claims’ the position of reader, rather than writer, it will not be able to write to the pipe. And vice versa.
Before we involve multiple processes let’s just look at how to create a pipe and what we get from that:
reader, writer = IO.pipe #=> [#<IO:fd 5>, #<IO:fd 6>]
IO.pipe returns an array with two elements, both of which are IO objects. Ruby’s amazing IO class is the superclass to File, TCPSocket, UDPSocket, and others. As such, all of these resources have a common interface.
The IO objects returned from IO.pipe can be thought of as something like anonymous files. You can basically treat them the same way you would a File. You can call #read, #write, #close, etc. But these objects have no location on the filesystem, so #path has nothing useful to return (on Ruby versions before 3.2 they won’t respond to #path at all).
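To make that common interface a bit more concrete, here’s a small sketch you could paste into irb. The variable names (pipe_classes, descriptors and so on) are mine, purely for illustration, and the exact file descriptor numbers will differ on your machine.

```ruby
require 'socket'

reader, writer = IO.pipe

# Both ends of the pipe are plain IO instances.
pipe_classes = [reader.class, writer.class]

# File and TCPSocket both inherit from IO, which is why they share an interface.
file_is_io = File.ancestors.include?(IO)
tcp_is_io  = TCPSocket.ancestors.include?(IO)

# Each end of the pipe is backed by its own file descriptor.
descriptors = [reader.fileno, writer.fileno]

puts pipe_classes.inspect
puts file_is_io
puts tcp_is_io
puts descriptors.inspect
```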
Still holding back from bringing in multiple processes let’s demonstrate communication with a pipe:
reader, writer = IO.pipe
writer.write("Into the pipe I go...")
writer.close
puts reader.read
outputs
Into the pipe I go...
Pretty simple right? Notice that I had to close the writer after I wrote to the pipe? That’s because when the reader calls IO#read it will continue trying to read data until it sees an EOF (aka. end-of-file marker, http://en.wikipedia.org/wiki/End-of-file). This tells the reader that no more data will be available for reading.
So long as the writer is still open the reader might see more data, so it waits. Closing the writer before reading puts an EOF on the pipe, so the reader stops reading once it has the initial data. If you skip closing the writer, the reader will block and keep trying to read indefinitely.
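If you’d like to see that waiting behaviour without actually hanging your irb session, one option is IO#read_nonblock, which raises instead of blocking. This is only a sketch: the variable names are mine, and :would_block is just a marker symbol I made up for the example.

```ruby
reader, writer = IO.pipe
writer.write("Into the pipe I go...")

# Data is sitting in the pipe, so a non-blocking read returns it right away.
first = reader.read_nonblock(100)

# The pipe is now empty but the writer is still open. A plain #read would
# block here; the non-blocking variant raises IO::WaitReadable instead.
state = begin
  reader.read_nonblock(100)
rescue IO::WaitReadable
  :would_block
end

# Once the writer is closed the reader sees EOF, so #read returns immediately.
writer.close
rest = reader.read

puts first
puts state.inspect
puts rest.inspect
```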
Pipes Are One-Way Only
reader, writer = IO.pipe
reader.write("Trying to get the reader to write something")
outputs
>> reader.write("Trying to get the reader to write something")
IOError: not opened for writing
from (irb):2:in `write'
from (irb):2
The IO objects returned by IO.pipe can only be used for uni-directional communication. So the reader can only read and the writer can only write.
Now let’s introduce processes into the mix.
Sharing Pipes
In the chapter on forking I described how open resources are shared, or copied, when a process forks a child. Pipes are considered a resource (they get their own file descriptors and everything), so they are shared with child processes.
Here’s a simple example of using a pipe to communicate between a parent and child process. The child indicates to the parent that it has finished an iteration of work by writing to the pipe:
reader, writer = IO.pipe

fork do
  reader.close

  10.times do
    # heavy lifting
    writer.puts "Another one bites the dust"
  end
end

writer.close
while message = reader.gets
  $stdout.puts message
end
outputs Another one bites the dust ten times.
Notice that, like above, the unused ends of the pipe are closed so as not to interfere with EOF being sent. There’s actually one more layer when considering EOF now that two processes are involved. Since the file descriptors were copied, there are now four instances floating around. Only two of them will be used to communicate, so the other two must be closed. Hence the extra instances of closing.
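Here’s a rough sketch of what goes wrong when the parent forgets to close its copy of the writer. I’m using IO.select with a short timeout as a stand-in for "would a read block?"; the 0.2 second timeout and the variable names are arbitrary choices of mine.

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close
  writer.puts "done"
  # The child's copy of the writer is closed when the child exits.
end
Process.wait(pid)

first = reader.gets

# The child is gone, but the parent still holds its own copy of the writer,
# so no EOF has been delivered. IO.select times out and returns nil.
ready_before_close = IO.select([reader], nil, nil, 0.2)

# Closing the last remaining copy of the writer delivers the EOF.
writer.close
ready_after_close = IO.select([reader], nil, nil, 0.2)
at_eof = reader.eof?

puts first
puts ready_before_close.inspect
puts at_eof
```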
Since the ends of the pipe are IO objects we can call any IO methods on them, not just #read and #write. In this example I use #puts and #gets to read and write a String delimited with a newline. I actually used those here to simplify one aspect of pipes: pipes hold a stream of data.
Streams vs. Messages
When I say stream I mean that when writing and reading data to a pipe there’s no concept of beginning and end. When working with an IO stream, like pipes or TCP sockets, you write your data to the stream followed by some protocol-specific delimiter. For example, HTTP uses a blank line (two consecutive line breaks) to delimit the headers from the body.
Then when reading data from that IO stream you read it in one chunk at a time, stopping when you come across the delimiter. That’s why I used #puts and #gets in the last example: they use a newline as the delimiter for me.
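A quick way to convince yourself that a pipe has no built-in message boundaries is to mix #write and #puts and see what comes out the other end. This is just an illustrative sketch with strings I picked myself:

```ruby
reader, writer = IO.pipe

# Two writes with no delimiter simply run together in the stream.
writer.write("first")
writer.write("second")

# #puts appends a newline, which gives the reader something to split on.
writer.puts("third")
writer.puts("fourth")
writer.close

# Reading line by line splits only where a newline delimiter was written.
lines = reader.each_line.map(&:chomp)

puts lines.inspect
```

The first three writes come back as one chunk because only the newline tells the reader where a "message" ends.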
As you may have guessed it’s possible to communicate via messages instead of streams. We can’t do it with pipes, but we can do it with Unix sockets. Without going into too much detail, Unix sockets are a type of socket that can only communicate on the same physical machine. As such they’re much faster than TCP sockets and are a great fit for IPC.
Here’s an example where we create a pair of Unix sockets that can communicate via messages:
require 'socket'
Socket.pair(:UNIX, :DGRAM, 0) #=> [#<Socket:fd 15>, #<Socket:fd 16>]
This creates a pair of UNIX sockets that are already connected to each other. These sockets communicate using datagrams, rather than a stream. In this way you write a whole message to one of the sockets and read a whole message from the other socket. No delimiters required.
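For contrast with the stream behaviour of pipes, here’s a minimal sketch showing that each send on a datagram socket arrives as its own message. The names left and right and the 1000-byte maximum length are arbitrary choices for the example.

```ruby
require 'socket'

left, right = Socket.pair(:UNIX, :DGRAM, 0)

# Two separate sends, with no delimiter added by us.
left.send("first", 0)
left.send("second", 0)

# Each recv returns exactly one whole message, up to the given maximum length.
message_one = right.recv(1000)
message_two = right.recv(1000)

puts message_one
puts message_two
```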
Here’s a slightly more complex version of the pipe example where the child process actually waits for the parent to tell it what to work on, then it reports back to the parent once it’s finished the work:
require 'socket'

child_socket, parent_socket = Socket.pair(:UNIX, :DGRAM, 0)
maxlen = 1000

fork do
  parent_socket.close

  4.times do
    instruction = child_socket.recv(maxlen)
    child_socket.send("#{instruction} accomplished!", 0)
  end
end
child_socket.close

2.times do
  parent_socket.send("Heavy lifting", 0)
end
2.times do
  parent_socket.send("Feather lifting", 0)
end

4.times do
  $stdout.puts parent_socket.recv(maxlen)
end
outputs:
Heavy lifting accomplished!
Heavy lifting accomplished!
Feather lifting accomplished!
Feather lifting accomplished!
So whereas pipes provide uni-directional communication, a socket pair provides bi-directional communication. The parent socket can both read and write to the child socket, and vice versa.
Remote IPC?
IPC implies communication between processes running on the same machine. If you’re interested in scaling up from one machine to many machines while still doing something resembling IPC there are a few things to look into. The first one would simply be to communicate via TCP sockets. This option would require more boilerplate code than the others for a non-trivial system. Other plausible solutions would be RPC (remote procedure call), a messaging system like ZeroMQ, or the general body of distributed systems.
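To give a feel for the TCP option, here’s a rough sketch of the same instruction-and-reply exchange over a TCP socket. Both processes run on one machine here just to keep it runnable; I’m assuming the loopback address 127.0.0.1 and letting the OS pick a free port by asking for port 0. Since TCP is a stream, newlines act as the delimiter again.

```ruby
require 'socket'

# Port 0 asks the OS for any free port; we read back which one it chose.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

pid = fork do
  server.close
  socket = TCPSocket.new('127.0.0.1', port)
  instruction = socket.gets.chomp
  socket.puts "#{instruction} accomplished!"
  socket.close
end

# The parent accepts the child's connection and talks over it.
connection = server.accept
connection.puts "Heavy lifting"
reply = connection.gets.chomp

connection.close
server.close
Process.wait(pid)

puts reply
```

On real separate machines the worker would connect to the server’s actual host and port, and you’d have to deal with connection failures and retries yourself, which is where the extra boilerplate comes from.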
In the Real World
Both pipes and socket pairs are useful abstractions for communicating between processes. They’re fast and easy. They’re often used as a communication channel instead of a more brute force approach such as a shared database or log file.
As for which method to use: it depends on your needs. Keep in mind that pipes are uni-directional and socket pairs are bi-directional when weighing your decision.
For a more in-depth example have a look at the Spyglass Master class in the included Spyglass project. It uses a more involved example of the code you saw above where many child processes communicate over a single pipe with their parent process.
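To be clear, the following is not the Spyglass code, just a toy sketch of the general idea of several children reporting to their parent over one shared pipe. The message text and the choice of three workers are made up for the example. Each child issues a single small #write, relying on the fact that writes smaller than PIPE_BUF bytes to a pipe are atomic, so messages from different children won’t be interleaved.

```ruby
reader, writer = IO.pipe

pids = 3.times.map do |i|
  fork do
    reader.close
    # One small write per message, newline included as the delimiter.
    writer.write("worker #{i} finished\n")
  end
end

# The parent only reads, so it closes its copy of the writer.
writer.close

# EOF arrives once every child has exited and closed its copy of the writer.
messages = reader.each_line.map(&:chomp)
pids.each { |pid| Process.wait(pid) }

# Arrival order depends on scheduling, so sort before printing.
puts messages.sort
```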
System Calls
Ruby’s IO.pipe maps to pipe(2) and Socket.pair maps to socketpair(2). Socket#recv maps to recv(2) and Socket#send maps to send(2).