Sockets Can Read
Thus far we’ve talked a lot about connections. Now we get to the really interesting part: how to pass data across socket connections. Unsurprisingly there is more than one way to read/write data when working with sockets, and on top of that, Ruby provides nice convenience wrappers for us.
This chapter will dive into the different ways of reading data and when they’re appropriate.
Simple Reads
The simplest way to read data from a socket is using the read method:
require 'socket'
Socket.tcp_server_loop(4481) do |connection|
  # Simplest way to read data from the connection.
  puts connection.read
  # Close the connection once we're done reading. Lets the client
  # know that they can stop waiting for us to write something back.
  connection.close
end
If you run that example in one terminal and the following netcat command in another terminal, you should see the output at the Ruby server this time:
$ echo gekko | nc localhost 4481
If you’ve worked with Ruby’s File API then this code may look familiar. Ruby’s various socket classes, along with File, share a common parent in IO. All IO objects in Ruby (sockets, pipes, files, etc.) share a common interface supporting methods like read, write, flush, etc.
Indeed, this isn’t an innovation on Ruby’s part. The underlying read(2), write(2), etc. system calls all function similarly with files, sockets, pipes, etc. This abstraction is built right into the core of the operating system itself. Remember, everything is a file.
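To make that shared interface concrete, here is a minimal sketch that reads from a file, a pipe, and a socket through the same method. The slurp helper and the sample strings are made up for illustration, and UNIXSocket.pair is used only so the example runs in a single process (it assumes a Unix-like system).

```ruby
require 'socket'
require 'tempfile'

# Hypothetical helper: it only relies on IO#read, so any IO object works.
def slurp(io)
  io.read
end

# A regular file.
file_data = Tempfile.create('gekko') do |file|
  file.write('from a file')
  file.rewind
  slurp(file)
end

# A pipe. Closing the write end lets the reader see EOF.
pipe_reader, pipe_writer = IO.pipe
pipe_writer.write('from a pipe')
pipe_writer.close
pipe_data = slurp(pipe_reader)
pipe_reader.close

# A connected pair of sockets, standing in for a client and a server.
sock_reader, sock_writer = UNIXSocket.pair
sock_writer.write('from a socket')
sock_writer.close
sock_data = slurp(sock_reader)
sock_reader.close

puts file_data, pipe_data, sock_data
```

The helper never asks what kind of object it was given. That is the practical payoff of File and the socket classes sharing IO as a parent.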
It’s Never That Simple
This method of reading data is simple, but brittle. If you run the example code again against this netcat command and leave it alone, the server will never finish reading the data and never exit:
$ tail -f /var/log/system.log | nc localhost 4481
The reason for this behaviour is something called EOF (end-of-file). It’s covered in detail later in this chapter. For now we’ll play naive and look at the simplest fix.
The gist of the issue is that tail -f never finishes sending data. If there is no data left to tail, it waits until there is some. Since it leaves its pipe open to netcat, netcat too will never finish sending data to the server.
The server’s call to read will continue blocking until the client finishes sending data. In this case the server will wait…and wait…and wait… meanwhile it’s buffering whatever data it does receive in memory and not returning it to your program.
Read Length
One way around the above issue is to specify a length to be read. That way, instead of continuing to read data until the client is finished, you can tell the server to read a certain amount of data, then return.
require 'socket'
one_kb = 1024 # bytes
Socket.tcp_server_loop(4481) do |connection|
  # Read data in chunks of 1 kb.
  while data = connection.read(one_kb) do
    puts data
  end
  connection.close
end
The above example, when run along with the same command:
$ tail -f /var/log/system.log | nc localhost 4481
will actually have the server printing data while the netcat command is still running. The data will be printed in one-kilobyte chunks.
The difference in this example is that we passed an integer to read. This tells it to stop reading and return what it has only once it has read that amount of data. Since we still want to get all the data available, we just loop over that read method, calling it until it doesn’t return any more data.
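If you want to watch the chunking happen without a server and netcat, the following sketch does it in one process. It assumes a Unix-like system (for UNIXSocket.pair), and the 2500-byte payload is an arbitrary number picked so that the last chunk comes up short.

```ruby
require 'socket'

one_kb = 1024 # bytes

# A connected pair of sockets stands in for the client and the server.
reader, writer = UNIXSocket.pair

writer.write('x' * 2500)
# Closing the writer tells the reader that no more data is coming,
# which lets the final, shorter chunk be returned.
writer.close

chunk_sizes = []
# read(one_kb) returns nil once there is nothing left, ending the loop.
while data = reader.read(one_kb)
  chunk_sizes << data.bytesize
end
reader.close

p chunk_sizes # => [1024, 1024, 452]
```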
Blocking Nature
A call to read will always want to block and wait for the full length of data to arrive. Take our above example of reading one kilobyte at a time. After running it a few times, it should be obvious that if some data has arrived, but less than one kilobyte, then read will continue to block until one full kilobyte can be returned.
It’s actually possible to get yourself into a deadlock situation using this method. If a server attempts to read one kilobyte from the connection while the client sends only 500 bytes and then waits, the server will continue waiting for that full kilobyte!
This can be remedied in two ways: 1) the client sends an EOF after sending its 500 bytes, 2) the server uses a partial read.
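The stuck read can be observed safely with a thread. This is only a sketch: the socket pair stands in for a real client and server (assuming a Unix-like system), and the 0.2 second wait is an arbitrary timeout chosen for the demonstration.

```ruby
require 'socket'

reader, writer = UNIXSocket.pair

# The "client" sends only 500 bytes and then goes quiet.
writer.write('x' * 500)

# The "server" asks for a full kilobyte in a separate thread, so the
# main thread can observe that the call does not return.
read_thread = Thread.new { reader.read(1024) }

# Thread#join returns nil if the thread is still running after the timeout.
blocked = read_thread.join(0.2).nil?

# Closing the writer releases the blocked read with whatever has arrived.
writer.close
data = read_thread.value
reader.close

puts "blocked: #{blocked}, received #{data.bytesize} bytes"
```

In a real deadlock nobody closes the writer, so the read simply never returns.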
The EOF Event
When a connection is being read from and receives an EOF event, the reader can be sure that no more data will be coming over the connection and can stop reading. This is an important concept to understand for any IO operation.
But first, a quick bit of history: EOF stands for ‘end of file’. You might say “but we’re not dealing with files here…”. You’d be mostly right, but you need to keep in mind that everything is a file.
You’ll sometimes see reference to the ‘EOF character’, but there’s really no such thing. EOF is not represented as a character sequence; it is more like a state event. When a socket has no more data to write, it can shutdown or close its ability to write any more data. This results in an EOF event being sent to the reader on the other end, letting it know that no more data will be sent.
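Here is a rough sketch of shutting down only the writing side. It uses close_write, which closes the write half of a socket while leaving the read half open, so the client can signal EOF and still wait for a reply. The loopback address, the OS-assigned port (port 0), and the upcasing reply are assumptions made purely for the demonstration.

```ruby
require 'socket'

# Port 0 asks the OS for any free port; we look up which one we got.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

server_thread = Thread.new do
  connection = server.accept
  # read returns once the client has shut down its writing side (EOF).
  request = connection.read
  connection.write(request.upcase)
  connection.close
end

client = TCPSocket.new('127.0.0.1', port)
client.write('gekko')
# Shut down only the writing side. The server sees EOF, but the
# client can still read the server's reply.
client.close_write
reply = client.read
client.close

server_thread.join
server.close

puts reply # => GEKKO
```

Calling close instead of close_write would also deliver the EOF, but the client would give up its chance to read the reply.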
So let’s bring this full circle and fix the issue we had where the client sent only 500 bytes of data while the server expected one kilobyte.
A remedy for this situation would be for the client to send its 500 bytes, then send an EOF event. The server receives this event and stops reading, even though it hasn’t reached its one kilobyte limit. EOF tells it that no more data is coming.
That’s the reason that this example works:
require 'socket'
one_kb = 1024 # bytes
Socket.tcp_server_loop(4481) do |connection|
  # Read data in chunks of 1 kb.
  while data = connection.read(one_kb) do
    puts data
  end
  connection.close
end
given this client connection:
require 'socket'
client = TCPSocket.new('localhost', 4481)
client.write('gekko')
client.close
The simplest way for a client to send an EOF is to close its socket. If its socket is closed, it certainly won’t be sending any more data!
A quick reminder of the fact that EOF is aptly named. When you call File#read it behaves just like Socket#read. It will read data until there’s no more to read. Once it’s consumed the entire file, it receives an EOF event and returns the data it has.
Partial Reads
A few paragraphs back I mentioned the term ‘partial read’. That’s something that could have gotten us out of that last situation as well. Time to look at that.
The first method of reading data we looked at was a lazy method. When you call read it waits as long as possible before returning data, either until it has received the requested length or gets an EOF. There is an alternative method of reading that takes the opposite approach. It’s readpartial.
Calls to readpartial, rather than wanting to block, want to return available data immediately. When calling readpartial you must pass an integer argument, specifying the maximum length. readpartial will read up to that length. So if you tell it to read up to one kilobyte of data, but the client sends only 500 bytes, then readpartial will not block. It will return that data immediately.
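The 500-byte scenario can be sketched in a single process. As before, the socket pair is just a stand-in for a real client and server and assumes a Unix-like system.

```ruby
require 'socket'

reader, writer = UNIXSocket.pair

# The "client" sends 500 bytes and keeps its connection open.
writer.write('x' * 500)

# Ask for up to one kilobyte. Only 500 bytes are available, so
# readpartial hands those over right away instead of blocking.
data = reader.readpartial(1024)
puts data.bytesize # => 500

writer.close
reader.close
```

Notice that the writer is still open when readpartial returns. No EOF was needed to get the data out.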
Running this server:
require 'socket'
one_hundred_kb = 1024 * 100
Socket.tcp_server_loop(4481) do |connection|
  begin
    # Read data in chunks of 1 hundred kb or less.
    while data = connection.readpartial(one_hundred_kb) do
      puts data
    end
  rescue EOFError
    # The client finished sending data; there is nothing more to read.
  end
  connection.close
end
along with this client:
$ tail -f /var/log/system.log | nc localhost 4481
will show that the server is actually streaming each bit of data as it becomes accessible, rather than waiting for one hundred kilobyte chunks. readpartial will happily return less than its maximum length if that is all the data available.
In terms of EOF, readpartial behaves differently than read. Whereas read simply returns when it receives EOF, readpartial actually raises an EOFError exception. Something to watch out for.
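A small side-by-side sketch of the two behaviours at EOF follows. Both helper methods are hypothetical names invented for this comparison, and each one closes the writing side before any data is sent so that the reader hits EOF straight away.

```ruby
require 'socket'

# Hypothetical helper: what does read give back at EOF?
def eof_with_read
  reader, writer = UNIXSocket.pair
  writer.close
  # Without a length, read returns an empty string at EOF.
  # With a length, it returns nil.
  [reader.read, reader.read(1024)]
ensure
  reader.close if reader
end

# Hypothetical helper: what does readpartial do at EOF?
def eof_with_readpartial
  reader, writer = UNIXSocket.pair
  writer.close
  reader.readpartial(1024)
  :no_error
rescue EOFError
  :eof_error
ensure
  reader.close if reader
end

p eof_with_read        # => ["", nil]
p eof_with_readpartial # => :eof_error
```

This is why a read loop can test the return value, while a readpartial loop needs a rescue.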
To recap, read is lazy, waiting as long as possible to return as much data as possible back to you. Conversely, readpartial is eager, returning data to you as soon as it’s available.
After we look at the basics of write we’ll turn to buffers. At that point we get to answer some interesting questions like: How much should I try to read at once? Is it better to do lots of small reads or one big read?
System Calls From This Chapter
Socket#read -> read(2). Behaves more like fread(3).
Socket#readpartial -> read(2).