Exchanging Data

The previous section was all about establishing connections between two endpoints. While that’s interesting in itself, you can’t do anything useful without exchanging data over a connection. This section gets into that. By the end we’ll be able to wire up a server and client and have them talking to each other!

Before we dive in I’ll just stress that it can be very helpful to think of a TCP connection as a series of tubes connecting a local socket to a remote socket, along which we can send and receive chunks of data. The Berkeley Sockets API was designed so that we could model the world like this and have everything work out.

In the real world all of the data is encoded as TCP/IP packets and may visit many routers and hosts on the way to its destination. It’s a bit of a crazy world and it’s good to keep that in mind when things aren’t working out, but thankfully, that crazy world is one that a lot of people worked very hard to cover up so we can stick to our simple mental model.

Streams

One more thing I need to drive home: the stream-based nature of TCP, something we haven’t talked about yet.

Way back at the start of the book when we created our first socket we passed an option called :STREAM which said that we wanted to use a stream socket. TCP is a stream-based protocol. If we had not passed the :STREAM option when creating our socket it simply wouldn’t be a TCP socket.
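As a quick refresher, here is a minimal sketch of what asking for a stream socket looks like with Ruby's standard socket library. The variable names are my own, and the :DGRAM socket is included only for contrast. Asking the operating system for each socket's type shows that the option we pass is what decides the kind of socket we get.

```ruby
require 'socket'

# :INET selects IPv4; :STREAM asks for a stream socket (TCP over IP).
stream_socket = Socket.new(:INET, :STREAM)

# For contrast, :DGRAM asks for a datagram socket (UDP over IP).
datagram_socket = Socket.new(:INET, :DGRAM)

# Ask the operating system what type each socket actually is.
stream_type = stream_socket.getsockopt(Socket::SOL_SOCKET, Socket::SO_TYPE).int
datagram_type = datagram_socket.getsockopt(Socket::SOL_SOCKET, Socket::SO_TYPE).int

puts stream_type == Socket::SOCK_STREAM
puts datagram_type == Socket::SOCK_DGRAM

stream_socket.close
datagram_socket.close
```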

So what does that mean exactly? How does it affect the code?

First, I hinted at the term packets above. At the underlying protocol level TCP sends packets over the network.

But we’re not going to talk about packets. From the perspective of your application code a TCP connection provides an ordered stream of communication with no beginning and no end. There is only the stream.

Let’s illustrate this with some pseudo-code examples.

# This code sends three pieces of data over the network, one at a time.
data = ['a', 'b', 'c']

for piece in data
  write_to_connection(piece)
end

# This code consumes those three pieces of data in one operation.
result = read_from_connection #=> ['a', 'b', 'c']

The moral of the story here is that a stream has no concept of message boundaries. Even though the client sent three separate pieces of data, the server received them as one piece of data when it read them. It had no knowledge of the fact that the client sent the data in three distinct chunks. The stream could just as easily have delivered the data split up some other way; reads on one end are simply not tied to the writes on the other.

Note that, although the message boundaries weren’t preserved, the order of the content on the stream was preserved.
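To make this concrete, here is a minimal runnable sketch of the same idea using Ruby's standard socket library. The loopback address, the OS-assigned port, and the client thread are my own choices for keeping everything in one file; they are not something the pseudo-code specifies. The server reads until the client closes its end, so it sees everything the client wrote as a single string.

```ruby
require 'socket'

# Listen on the loopback interface; port 0 lets the OS pick a free port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

# The client sends three pieces of data, one write at a time, then closes.
client_thread = Thread.new do
  client = TCPSocket.new('127.0.0.1', port)
  ['a', 'b', 'c'].each { |piece| client.write(piece) }
  client.close
end

connection = server.accept

# With no arguments, read blocks until the other end closes (EOF)
# and returns everything that arrived on the stream.
result = connection.read

client_thread.join
connection.close
server.close

puts result.inspect #=> "abc"
```

The three writes come out the other end as one string, with no trace of where one write stopped and the next began, but with the bytes in the order they were sent.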