

How to write a (very basic) UNIX shell

UNIX shells are pretty much unique in the ease with which they let you manipulate the state of your system. Doing things which are trivial in a shell script requires a fair amount of code in most other languages, and that's probably in part why the shell has been such an enduring tool for the past few decades.

But such historical appreciation simply breeds complacency; it's time for something better than /bin/bash. Before writing it, though, it's probably useful to understand what bash is doing internally. That's what this guide is about.

What happens when you run a command

OK, before we get into writing anything, let's talk about what bash (or sh) does when you run a simple command like ls.

$ ls                    # <- what's really happening here?
README.md               # <- and how did this get to the screen?
$

The terminal emulator and PTY devices

The first thing to talk about is how your keystrokes end up getting sent to bash. You're most likely running a terminal emulator, which is a fairly normal graphical program that owns the shell process; internally, it's set up like this:

                                              +----------------------------+
   +---------------------------+              |                            |
   |                           |  pty device  +-fd 0 (stdin)               |
   |  terminal emulator   fd N-+--------------+-fd 1 (stdout)   /bin/bash  |
   |  (e.g. xterm, iterm)      |              +-fd 2 (stderr)              |
   |                           |              |                            |
   +---------------------------+              +----------------------------+

The PTY device behaves like a bidirectional data pipe (a socket): if you write into one end, you can read from the other end. It has some other attributes that define its state, but fundamentally it's just a two-way pipe. So the terminal emulator listens for both keyboard events and for available data on the PTY device, and either writes or updates the screen accordingly.

bash, or whichever shell you're running, is started up with file descriptors 0, 1, and 2 mapped to the other end of the PTY device. UNIX convention dictates that processes read from fd 0 and write to fds 1 and 2, but the kernel really doesn't care what you do with any of them since they all point to the same IO resource. For example, we can read from stderr and write to stdin:

$ cat >&0 <&2
foo                     # I typed this
foo                     # ...and got this
$

As far as the kernel is concerned, cat is reading from the PTY device and writing to it and it's business as usual.

fork and exec

Booting a modern Linux or OS X system doesn't feel minimalistic at all, but internally the mechanism is surprisingly elegant. The kernel starts off by creating a single process, usually called init, with PID 1; this process then clones itself a number of times using the fork system call. The clones call exec to turn themselves into other programs, which themselves fork and exec until you've got terminal emulators, shells, and subprocesses.

Let's start by looking at fork:

// fork.c
#include <stdio.h>
#include <unistd.h>
int main() {
  printf("starting up...\n");
  fflush(stdout);
  printf("fork returned %d\n", fork());
  return 0;
}

(I'll explain fflush in a minute...)

$ c99 -o fork fork.c
$ ./fork
starting up...                  # started once...
fork returned 11182             # and returned
fork returned 0                 # ...twice!

fork is defined to clone the calling process and return from each independently. The parent process gets the nonzero PID of the child process, and the child gets the return value of 0. The two processes are linked from the kernel's point of view: the child's "parent process ID" refers to the parent, which means the parent can (and is expected to) ask about its exit status.

An important aspect of fork is that the parent and child don't share any memory; any modifications made inside one won't be visible to the other. This is why we need fflush in the code above: printf() writes into a userspace buffer before issuing the write call to the kernel, and if that buffer isn't empty when we fork, its contents are copied along with the rest of memory and eventually written out twice. (Exactly when this bites depends on stdio's buffering mode: stdout is typically line-buffered when attached to a terminal and fully buffered otherwise, e.g. when redirected to a file or pipe.) For example:

// fork-broken.c
#include <stdio.h>
#include <unistd.h>
int main() {
  printf("starting up (we'll get this message twice)...\n");
  printf("fork returned %d\n", fork());
  return 0;
}
$ c99 -o fork-broken fork-broken.c
$ ./fork-broken
starting up (we'll get this message twice)...
fork returned 28164
starting up (we'll get this message twice)...
fork returned 0

Another important detail is that the child process inherits the parent's file descriptors. The FD table itself is copied -- closing or duping a descriptor in one process, just like a memory write, isn't visible in the other -- but each copied descriptor still refers to the same underlying open file description, including its file offset. This is why both printf calls went to the same terminal.

exec, address spaces, and executables

exec means "turn into another program," which brings up a more fundamental question: "what does it mean to 'be' a program?"

Let's start with the executable file format. Linux uses the ELF format for binaries -- other OSes use files with similar concepts but different implementation details. The basic pieces are (see man elf):

e_ident                 # magic number for ELF files
e_machine...            # some bytes specifying the machine code architecture
e_entry                 # entry point (start address) within virtual memory
...
program_headers...      # a list of program memory mappings
  memory_address        # location within virtual memory
  file_address          # offset within this file
...

There are other fields used for things like debugging symbols, but the fields above are what governs the executable process itself. (For a more concrete example of how this works, check out tinyelf, which reduces compiled C programs down to minimal ELF files.)

So when you call exec, the kernel first resets the virtual memory space for the process, then goes through the program headers and sets up memory mappings into the executable file. Once that's done, it points the processor at e_entry to transfer control to the program.

exec also does a few other things to manage state, like closing file descriptors marked with the FD_CLOEXEC flag and resetting signal handlers. I'll cover the relevant details as we get into the shell logic.

Ok, shell time

With that background the basic structure of a shell should be a little clearer. We can start by writing a "shell" hard-coded to run ls:

// ls-shell.c
#include <stdio.h>
#include <unistd.h>
int main() {
  char *const ls_argv[] = { "/bin/ls", NULL };
  if (!fork()) {
    // we're the child: execute ls
    execv("/bin/ls", ls_argv);
    perror("uh oh, execv() returned");  // perror appends the errno message
    return 1;
  }

  // we're the parent
  printf("hope that worked\n");
  return 0;
}
$ c99 -o ls-shell ls-shell.c
$ ./ls-shell
hope that worked
README.md
fork
fork-broken
fork-broken.c
fork.c
ls-shell
ls-shell.c

argv support

argv is specified during the exec() call, and those arguments are forwarded by the kernel as char* argv[] to the new program. The exact mechanism is specified by the operating system's ABI: the kernel copies the argument strings onto the new process's stack, and libc's startup code hands them to main().

We could execute ls -l by specifying an argument after the program name. The program name is always the first entry in argv because otherwise a program wouldn't know where it existed in the filesystem. Most programs don't rely on you to set argv[0] accurately, though some things like BusyBox use it to figure out which program you intended to run (since they're all linked to the same file).

While we're at it, let's go ahead and handle user input too.

// argv-shell.c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
  char   *words[256];
  char   *line      = NULL;
  size_t  line_size = 0;
  ssize_t n;
  pid_t   child;
  int     child_status;

  while ((n = getline(&line, &line_size, stdin)) > 0) {
    // Erase ending newline by shortening the string by one
    line[n - 1] = '\0';

    // Split the line into argv words on spaces
    words[0] = line;
    for (int i = 1; (words[i] = strchr(words[i - 1], ' ')) != NULL; ++i)
      *(words[i]++) = '\0';

    if ((child = fork())) {
      // Parent process: wait for the child
      waitpid(child, &child_status, 0);
      fprintf(stderr, "child exited with status %d\n", WEXITSTATUS(child_status));
      fflush(stderr);
    } else {
      // Child process: exec or complain
      execv(words[0], words);
      perror("execv() failed");
      return 1;
    }
  }
}

Now we have a functioning, if awful, shell:

$ c99 -o argv-shell argv-shell.c
$ ./argv-shell
/bin/ls
README.md  argv-shell.c  fork-broken    fork.c    ls-shell.c
a.out      fork          fork-broken.c  ls-shell
child exited with status 0
/bin/ls -l
total 76
-rw-r--r-- 1 spencertipping spencertipping 9313 Apr 23 16:10 README.md
-rwxr-xr-x 1 spencertipping spencertipping 9120 Apr 23 16:11 a.out
-rw-r--r-- 1 spencertipping spencertipping  964 Apr 23 16:11 argv-shell.c
-rwxr-xr-x 1 spencertipping spencertipping 8800 Apr 22 11:09 fork
-rwxr-xr-x 1 spencertipping spencertipping 8712 Apr 22 12:02 fork-broken
-rw-r--r-- 1 spencertipping spencertipping  184 Apr 22 12:02 fork-broken.c
-rw-r--r-- 1 spencertipping spencertipping  164 Apr 22 11:35 fork.c
-rwxr-xr-x 1 spencertipping spencertipping 8816 Apr 23 09:21 ls-shell
-rw-r--r-- 1 spencertipping spencertipping  324 Apr 23 09:21 ls-shell.c
child exited with status 0

Awful things at this point include:

  1. The shell doesn't search through $PATH entries, which means we have to type out the full path of the program we want to execute.
  2. Multiple spaces in a row will produce empty argv entries, due to the way we're parsing stuff.
  3. It provides no support for redirection of any kind.

(1) and (2) are beyond the scope of this tutorial -- but you can see the solution to (1) by running strace on /bin/sh:

$ strace /bin/sh -c ls 2>&1 | grep stat
stat("/usr/local/sbin/ls", 0x7ffc8df82ce0) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/ls", 0x7ffc8df82ce0) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/ls", 0x7ffc8df82ce0)    = -1 ENOENT (No such file or directory)
stat("/usr/bin/ls", 0x7ffc8df82ce0)     = -1 ENOENT (No such file or directory)
stat("/sbin/ls", 0x7ffc8df82ce0)        = -1 ENOENT (No such file or directory)
stat("/bin/ls", {st_mode=S_IFREG|0755, st_size=126584, ...}) = 0

Basically, the shell is looking at each $PATH entry and calling stat() to see if those files exist and are executable. When it finds one that is, it backfills the path and calls exec.

Why we wait for child processes

The common answer is that "you'll accumulate zombies and this is bad," but how bad can it really be? By the time a program exits, it can't possibly use much memory. Why do we care?

For short-lived code it really doesn't matter much. Once you exit, any child processes you've created will be adopted by init, which will wait for them to exit and nothing bad happens. The problem is what happens if you keep running.

The PID returned by fork is a promise from the kernel that until you collect the child process, the PID will refer to that process. That is, the kernel isn't at liberty to reuse the PID until you've called wait or waitpid to free it. If the kernel did reuse PIDs without acknowledgement, you'd run into race conditions where you could send a signal "to your child process" -- but that process had exited and been replaced by something else.

So zombie processes aren't a memory issue, they're a PID issue. Most systems don't have that many:

$ cat /proc/sys/kernel/pid_max
32768

Pipelines

Right now we can run /bin/ls and /usr/bin/wc -l, but we can't pipe one into the other. To fix this we'll need to create a pipe, then hook the reading end up to stdin of the second process and the writing end up to stdout of the first.

Here's what this looks like:

// simple-pipe.c
#include <stdio.h>
#include <unistd.h>
int main() {
  int pipe_fds[2];              // (read_end, write_end)
  pipe(pipe_fds);
  char *ls_args[] = { "/bin/ls", NULL };
  char *wc_args[] = { "/usr/bin/wc", "-l", NULL };
  if (fork()) {
    // /bin/ls: replace stdout (fd 1) with the write end of the pipe
    dup2(pipe_fds[1], 1);       // alias pipe_fds[1] to fd 1
    close(pipe_fds[1]);         // remove pipe_fds[1] from fd table
    close(pipe_fds[0]);         // explained below
    execv("/bin/ls", ls_args);
  } else {
    // /bin/wc: do the same thing for fd 0
    dup2(pipe_fds[0], 0);
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    execv("/usr/bin/wc", wc_args);
  }
}
$ c99 -o simple-pipe simple-pipe.c
$ ./simple-pipe
11

Why we close file descriptors (and what this does)

When I was first learning about this stuff I assumed close() must have some effect on the underlying IO device, but this isn't true. This will make sense if you look at it from the kernel's point of view.

  /bin/ls               /usr/bin/wc -l
    fd 0 -----------+     fd 0 -----------+
    fd 1 --------+  |     fd 1 --------+  |
    fd 2 -----+  |  |     fd 2 -----+  |  |
              |  |  |               |  |  |        userspace
============================================================
              |  |  |               |  |  |     kernel space
              |  |  |               |  |  |
              |  /  |               |  |  /
              +-(---+---------------+--+-(-----  pty device (4 references)
                 \                        \
                 |                        |
                 +------------------------+---- pipe device (2 references)

Closing a file descriptor will cause the kernel to remove an entry from a process's FD table, but the underlying device won't know the difference. The device is impacted only when no program refers to it; at that point the kernel will deallocate it. dup2 is the same kind of thing: it creates a new FD table reference within a program, but otherwise doesn't change the IO picture.

TODO

  • Pipelines in a shell (like more than one of them)
  • File redirection example
  • Network redirection example
  • Background jobs, process groups, and weird SIGTTOU stuff
  • Conditionals
  • How to parse this stuff without using strchr?
  • Suggestive commentary about how JIT for a shell would be the best thing ever
