Processes Have File Descriptors

In much the same way as pids represent running processes, file descriptors represent open files.

Everything is a File

Part of the Unix philosophy is that ‘everything is a file’. Devices are treated as files, sockets and pipes are treated as files, and files are treated as files.

Since all of these things are treated as files, I’ll use the word ‘resource’ when talking about files in a general sense (including devices, pipes, sockets, etc.) and the word ‘file’ when I mean the classical definition (a file on the file system).
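To see ‘everything is a file’ from the Ruby side, here is a small sketch in which a device (/dev/null) and a pair of connected Unix sockets are all IO objects, so one method that only uses the IO interface can write to any of them. The helper name send_line is hypothetical, chosen for illustration, and UNIXSocket.pair assumes a Unix-like system.

```ruby
require 'socket'

# Hypothetical helper: it relies only on the IO interface (puts/flush),
# so it does not care what kind of resource it is given.
def send_line(io, text)
  io.puts(text)
  io.flush
end

file = File.open('/dev/null', 'w')   # a device, opened like a file
sock_a, sock_b = UNIXSocket.pair     # two connected sockets

[file, sock_a, sock_b].each do |io|
  puts "#{io.class} is an IO: #{io.is_a?(IO)}"
end

send_line(file, 'discarded by /dev/null')
send_line(sock_a, 'hello over a socket')

# The other end of the socket pair reads the line like it would from a file.
received = sock_b.gets
puts received

[file, sock_a, sock_b].each(&:close)
```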

Descriptors Represent Resources

Any time you open a resource in a running process, it is assigned a file descriptor number. File descriptors are NOT shared between unrelated processes; they live and die with the process they are bound to, just as any resources a process has open are closed when it exits. There are special semantics for file descriptor sharing when you fork a process; more on that later.

In Ruby, open resources are represented by the IO class. Any IO object can have an associated file descriptor number. Use IO#fileno to get access to it.

passwd = File.open('/etc/passwd')
puts passwd.fileno

outputs:

3

Any resource that your process opens gets a number that uniquely identifies it within that process. This is how the kernel keeps track of the resources your process is using.
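The number is all the kernel hands back. As a sketch, IO.sysopen opens a file and returns only the raw descriptor as an Integer, and IO.for_fd wraps an existing descriptor number in a Ruby IO object. The example assumes /etc/passwd exists, as in the snippets above.

```ruby
# IO.sysopen asks the kernel to open the file and returns only the raw
# file descriptor number, not an IO object.
fd = IO.sysopen('/etc/passwd')
puts fd.class   # Integer
puts fd

# IO.for_fd wraps an existing descriptor number in a Ruby IO object.
passwd = IO.for_fd(fd)
same = (passwd.fileno == fd)
puts same       # true: both refer to the same kernel resource

puts passwd.gets   # the first line of /etc/passwd
passwd.close       # autoclose is on by default, so this closes the descriptor
```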

What happens when we have multiple resources open?

passwd = File.open('/etc/passwd')
puts passwd.fileno

hosts = File.open('/etc/hosts')
puts hosts.fileno

# Close the open passwd file. This frees up its file descriptor
# number to be used by the next opened resource.
passwd.close

null = File.open('/dev/null')
puts null.fileno

outputs:

3
4
3

There are two key takeaways from this example.

  1. File descriptor numbers are assigned the lowest unused value. The first file we opened, passwd, got file descriptor #3, the next open file got #4 because #3 was already in use.

  2. Once a resource is closed, its file descriptor number becomes available again. When we closed the passwd file, its number was freed, so when we opened the file at /dev/null it was assigned the lowest unused value, which was then #3.
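Both rules can be checked together with a small sketch: open three resources, close the middle one, then open another. Because the kernel hands out the lowest unused number, the freed slot is reused before any number above the third resource. This assumes a single-threaded script where nothing else opens or closes descriptors in between.

```ruby
first  = File.open('/etc/passwd')
second = File.open('/etc/hosts')
third  = File.open('/dev/null')

# Remember the middle descriptor number, then free it.
freed = second.fileno
second.close

# The next open gets the lowest unused number: the one just freed.
reused = File.open('/dev/null')
puts "second had #{freed}, reused got #{reused.fileno}"
puts reused.fileno == freed          # true
puts reused.fileno < third.fileno    # true

[first, third, reused].each(&:close)
```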

It’s important to note that file descriptors keep track of open resources only. Closed resources are not given a file descriptor number.

Stepping back to the kernel’s viewpoint again, this makes a lot of sense. Once a resource is closed it no longer needs to interact with the hardware layer, so the kernel can stop keeping track of it.

Given the above, file descriptors are sometimes called ‘open file descriptors’. This is a bit of a misnomer since there is no such thing as a ‘closed file descriptor’. In fact, trying to read the file descriptor number from a closed resource will raise an exception:

passwd = File.open('/etc/passwd')
puts passwd.fileno
passwd.close
puts passwd.fileno

outputs:

3
-e:4:in `fileno': closed stream (IOError)
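If code might be handed a resource that is already closed, a minimal sketch of two ways to avoid the crash: check IO#closed? first, or rescue the IOError that IO#fileno raises.

```ruby
passwd = File.open('/etc/passwd')
passwd.close

# IO#closed? reports whether the resource has been closed, so you can
# check it before asking for a descriptor number.
if passwd.closed?
  puts 'passwd is closed, so it has no file descriptor'
else
  puts passwd.fileno
end

# Alternatively, attempt the call and rescue the IOError it raises.
error_message = nil
begin
  passwd.fileno
rescue IOError => e
  error_message = e.message
end
puts error_message   # closed stream
```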

You may have noticed that when we open a file and ask for its file descriptor number the lowest value we get is 3. What happened to 0, 1, and 2?

Standard Streams

Every Unix process comes with three open resources. These are your standard input (STDIN), standard output (STDOUT), and standard error (STDERR) resources.

These standard resources exist for a very important reason that we take for granted today. STDIN provides a generic way to read input from keyboard devices or pipes, while STDOUT and STDERR provide generic ways to write output to monitors, files, printers, etc. This was one of the innovations of Unix.

Before STDIN existed, your program had to include a keyboard driver for all the keyboards it wanted to support! And if it wanted to print something to the screen, it had to know how to manipulate the pixels required to do so. So let’s all be thankful for standard streams.

puts STDIN.fileno
puts STDOUT.fileno
puts STDERR.fileno

outputs:

0
1
2

That’s where the first three file descriptor numbers went.
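Since descriptor 1 is standard output, any IO object built on fd 1 writes to the same place as STDOUT. A sketch using IO.for_fd; passing autoclose: false is a deliberate choice here so that closing or garbage-collecting the wrapper never closes the process’s real standard output.

```ruby
# Wrap descriptor 1 in a second IO object. autoclose: false stops Ruby
# from closing fd 1 when this wrapper is closed or garbage collected.
out = IO.for_fd(1, 'w', autoclose: false)
out.sync = true   # write immediately instead of buffering

STDOUT.puts 'written through STDOUT'
STDOUT.flush
out.puts 'written through a second IO object for fd 1'

puts out.fileno == STDOUT.fileno   # true
```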

In the Real World

File descriptors are at the core of network programming using sockets, pipes, etc. and are also at the core of any file system operations.

Hence, they are used by every running process and are at the core of most of the interesting stuff you can do with a computer. You’ll see many more examples of how to use them in the following chapters or in the attached Spyglass project.
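Pipes are a quick way to see this. In the sketch below, IO.pipe asks the kernel for a pipe and returns two IO objects, a read end and a write end, each backed by its own file descriptor.

```ruby
reader, writer = IO.pipe

# Record the descriptor numbers while both ends are still open.
reader_fd = reader.fileno
writer_fd = writer.fileno
puts "reader fd: #{reader_fd}, writer fd: #{writer_fd}"

writer.puts 'hello through a pipe'
writer.close   # closing the write end lets the reader see end-of-file

message = reader.read   # reads until end-of-file
puts message
reader.close
```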

System Calls

Many methods on Ruby’s IO class map to system calls of the same name. These include open(2), close(2), read(2), write(2), pipe(2), fsync(2), stat(2), among others.
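A sketch that touches several of these calls from Ruby. The mapping is approximate: not every method shares the system call’s name (open(2) is reached here through IO.sysopen and stat(2) through File.stat), and the sys* methods are used because they skip Ruby’s own buffering. The file name example.txt is hypothetical and lives in a temporary directory.

```ruby
require 'tmpdir'

bytes_written = nil
size = nil
contents = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'example.txt')   # hypothetical file name

  # open(2): IO.sysopen returns a bare descriptor number.
  fd = IO.sysopen(path, 'w')
  io = IO.for_fd(fd, 'w')

  # write(2): IO#syswrite bypasses Ruby's userspace buffer.
  bytes_written = io.syswrite("hello\n")

  # fsync(2): ask the kernel to flush the file's data to storage.
  io.fsync

  # close(2): release the descriptor.
  io.close

  # stat(2): File.stat returns the file's metadata, including its size.
  size = File.stat(path).size

  # read(2): IO#sysread reads up to the given number of bytes.
  contents = File.open(path) { |f| f.sysread(100) }
end

puts bytes_written   # 6
puts size            # 6
puts contents        # hello
```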