Primer
This section provides background on some key concepts used in the book. I recommend reading it before moving on to the meatier chapters.
Why Care?
The Unix programming model has existed, in some form, since 1970. It was then that Unix was famously invented at Bell Labs, along with the C programming language. In the decades that have elapsed since then, Unix has stood the test of time as the operating system of choice for reliability, security, and stability.
Unix programming concepts and techniques are not a fad, and they’re not the latest popular programming language. These techniques transcend programming languages. Whether you’re programming in C, C++, Ruby, Python, JavaScript, Haskell, or [insert your favourite language here], these techniques will be useful.
This stuff has existed, largely unchanged, for decades. Smart programmers have been using Unix programming to solve tough problems with a multitude of programming languages for the last 40 years, and they will continue to do so for the next 40 years.
Harness the Power!
I’ll warn you now, the concepts and techniques described in this book can bring you great power. With this power you can create new software, understand complex software that is already out there, even use this knowledge to advance your career to the next level.
Just remember, with great power comes great responsibility. Read on and I’ll tell you everything you need to know to gain the power and avoid the pitfalls.
Overview
This book is not meant to be read as a reference manual. It’s more of a walkthrough. To get the most out of it you should read it sequentially, since each chapter builds on the last. Once you’re finished you can use the chapter headings to find information if you need a refresher.
This book contains many code examples. I highly recommend that you follow along with them by actually running them yourself in a Ruby interpreter. Playing with the code yourself and making tweaks will help the concepts sink in that much more.
Once you’ve read through the book and played with the examples, I’m sure you’ll want to get your hands on a real-world project that’s a little more in depth. At that point, have a look at the included Spyglass project.
Spyglass is a web server that was created specifically for inclusion with this book. It’s designed to teach Unix programming concepts. It takes the concepts you learn here and shows how a real-world project would put them to use. Have a look at the last chapter in this book for a deeper introduction.
System Calls
Understanding system calls first requires a quick explanation of the components of a Unix system, specifically userland vs. the kernel.
The kernel of your Unix system sits atop the hardware of your computer. It’s a middleman for any interactions that need to happen with the hardware. This includes things like reading from and writing to the filesystem, sending data over the network, allocating memory, or playing audio over the speakers. Given its power, programs are not allowed direct access to the kernel. Any communication is done via system calls.
The system call interface connects the kernel to userland. It defines the interactions that are allowed between your program and the computer hardware.
Userland is where all of your programs run. You can do a lot in your userland programs without ever making use of a system call: do mathematics, string operations, control flow with logical statements. But I’d go as far as saying that if you want your programs to do anything interesting then you’ll need to involve the kernel via system calls.
If you were a C programmer this stuff would probably be second nature to you. System calls are at the heart of C programming.
But I’m going to expect that you, like me, don’t have any C programming experience. You learned to program in a high-level language. When you learned to write data to the filesystem you weren’t told which system calls make that happen.
The takeaway here is that system calls allow your user-space programs to interact indirectly with the hardware of your computer, via the kernel. We’ll be looking at common system calls as we go through the chapters.
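To make the split between userland and the kernel a little more concrete, here is a minimal Ruby sketch. The file prefix and variable names are made up for illustration, and the system calls named in the comments are the typical ones; the exact calls Ruby makes can vary by platform and Ruby version.

```ruby
require "tempfile"

# Userland work: arithmetic and string manipulation are computed inside
# the process itself, without asking the kernel to touch any hardware.
sum = (1..10).reduce(:+)
greeting = "hello".upcase

# Work that needs the kernel: bytes can only reach the filesystem through
# system calls such as open(2), write(2), and close(2). Ruby makes those
# calls on our behalf (assumption: the exact set varies by platform).
file = Tempfile.new("primer")   # "primer" is just a hypothetical prefix
file.write("#{greeting} #{sum}")
file.close

# Reading the data back involves the kernel again.
contents = File.read(file.path)
puts contents

# Remove the temporary file now that we are done with it.
file.unlink
```

The first half of the script would run the same way on a machine with no disk at all. The second half only works because the kernel carries out the request.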
Nomenclature, wtf(2)
One of the roadblocks to learning about Unix programming is where to find the proper documentation. Want to hear the kicker? It’s all available via Unix manual pages (manpages), and if you’re using a Unix-based computer right now it’s already on your computer!
If you’ve never used manpages before you can start by invoking the command man man from a terminal.
Perfect, right? Well, kind of. The manpages for the system call API are a great resource in two situations:
- you’re a C programmer who wants to know how to invoke a given system call, or
- you’re trying to figure out the purpose of a given system call
I’m going to assume we’re not C programmers here, so the first isn’t so useful, but the second is very useful.
You’ll see references throughout this text to things like this: select(2). This bit of text is telling you where you can find the manpage for a given system call. You may or may not know this, but there are many sections to the Unix manpages.
Here’s a look at the most commonly used sections of the manpages for FreeBSD and Linux systems:
- Section 1: General Commands
- Section 2: System Calls
- Section 3: C Library Functions
- Section 4: Special Files
So Section 1 is for general commands (a.k.a. shell commands). If I wanted to refer you to the manual page for the find command I would write it like this: find(1). This tells you that there is a manual page for find in section 1 of the manpages.
If I wanted to refer to the manual page for the getpid system call I would write it like this: getpid(2). This tells you that there is a manual page for getpid in section 2 of the manpages.
Why do manpages need multiple sections? Because the same name may appear in more than one section, i.e. as both a shell command and a system call.
Take stat(1) and stat(2) as an example.
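From Ruby, the system call side of that pair is what you usually end up touching. The sketch below is only illustrative: it assumes File.stat is backed by stat(2) or a close relative of it, which is typical but can differ between platforms. The shell command described in stat(1) reports similar information from a terminal.

```ruby
# File.stat returns metadata about a path. To get it, Ruby has to ask the
# kernel, typically through stat(2) or a closely related system call
# (assumption: the exact call depends on the platform).
info = File.stat(Dir.pwd)

puts info.directory?   # is the path a directory?
puts info.size         # size in bytes, as reported by the kernel
puts info.mtime        # time of last modification
```

Reading stat(2) tells you what the kernel reports and why; the Ruby method is a thin, friendlier layer over that.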
To read a page from a specific section, pass the section number to man on the command line:
$ man 2 getpid
$ man 3 malloc
$ man find # same as man 1 find
This nomenclature was not invented for this book. It’s a convention that’s used everywhere when referring to the manpages, so it’s a good idea to learn it now and get comfortable with seeing it.
Processes: The Atoms of Unix
Processes are the building blocks of a Unix system. Why? Because any code that is executed happens inside a process.
For example, when you launch ruby from the command line a new process is created for your code. When your code is finished that process exits.
$ ruby -e "p Time.now"
The same is true for all code running on your system. You know that MySQL server that’s always running? That’s running in its own process. The e-reader software you’re using right now? That’s running in its own process. The email client that’s desperately trying to tell you you have new messages? You should ignore it by the way and keep reading! It also runs in its own process.
Things start to get interesting when you realize that one process can spawn and manage many others. We’ll be taking a look at that over the course of this book.
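As a small preview, here is a rough sketch of one process starting another and waiting for it to finish. It is one possible approach rather than the book's prescribed one, and the exit code 7 is an arbitrary value chosen for illustration.

```ruby
require "rbconfig"

# Every process has a numeric ID. Process.pid reports the ID of the
# process running this script, as described in getpid(2).
parent_pid = Process.pid

# Start a second Ruby interpreter as a child process. RbConfig.ruby is the
# path to the interpreter that is running this script. The child does
# nothing except exit with an arbitrary status code.
child_pid = Process.spawn(RbConfig.ruby, "-e", "exit 7")

# Block until the child exits, then collect its exit status.
_, status = Process.wait2(child_pid)
exit_code = status.exitstatus

puts "parent #{parent_pid} started child #{child_pid}"
puts "child exited with status #{exit_code}"
```

The parent and the child each get their own process ID, and the parent learns how the child finished only by asking the kernel.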