Oct 3, 2023·9 min read

xv6 Lab: Utilities

xv6 Lab: Utilities

I have been recently watching lecture videos from MIT's Operating Systems class, and doing the xv6 labs. I took Operating Systems in college, but my professor was not very good at explaining things. He does not use slides, but simply draws some very abstract diagram on the blackboard, without much annotation. I thought taking the class again following the MIT lecture videos would help me understand Operating Systems better, and maybe someday allow me to write my own.

I finished the first lab "Utilities" in a bit over 2 hours. The lab mainly consists of writing 5 short user-space programs for xv6. These are "utility" programs that can be found in UNIX, and are implemented using existing system calls in xv6. Most tasks aren’t challenging and should be very easy for someone with some C experience, but the primes task took me over 1.5 hours.

The Sieve of Eratosthenes

The primes program implements the Sieve of Eratosthenes algorithm using fork and pipe. The main idea is described in this article.

Briefly, the primes program has a root process that sends numbers 2 through 35 to a sieve process. The sieve process can spawn more sieve processes, and all sieve processes do the same thing:

  1. Receive a number pp from its parent.
  2. Print pp as a prime.
  3. Continue receiving numbers from the parent. Spawn a child, and pass any number that is not a multiple of pp to the child.
  4. Exit when the parent is no longer sending numbers (closes the pipe).

In the end, each sieving process will print out exactly one prime, and rule out all multiples of that prime for future processes.

The root process

Things seem straightforward so far. Since the root process is different from the sieve processes, we would need to handle it separately. We can write the following skeleton pretty quickly:

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  int pp[2];
  pipe(pp);

  if (fork() == 0) {
    // sieveing processes
  } else {
    // root process simply sends 2-35 to the first sieving process
    close(pp[0]);
    for (int i = 2; i <= 35; i++) {
      write(pp[1], &i, sizeof(i));
    }
    close(pp[1]);
  }

  exit(0);
}

The first sieve process

The article describes what each sieve process should do in pseudocode. We can translate it to C code and try to plug it into our skeleton:

int
main(int argc, char *argv[])
{
  int p, n;
  int pp[2];
  pipe(pp);

  if (fork() == 0) {
    // sieving processes
    if (read(pp[0], &p, sizeof(p)) <= 0) {
      close(pp[0]);
      exit(0);
    }
    printf("prime %d\n", p);

    int newpipe[2];
    pipe(newpipe);
    // spawn a child so it can read from newpipe[0]
    while (read(pp[0], &n, sizeof(n)) > 0) {
      if (n % p != 0) {
        write(newpipe[1], &n, sizeof(n));
      }
    }
  } else {
    // root process
    // omitted
  }

  exit(0);
}

At this point, we realize that the first sieving process is working with 2 pipes: a pipe created by its parent (pp) and a pipe created by itself (newpipe). The parent process writes to pp[1]. The sieving process itself reads from pp[0] and writes to newpipe[1]. The sieving process will spawn another child, which will read from newpipe[0].

Let's try to add the fork() call to the above code:

int
main(int argc, char *argv[])
{
  int p, n;
  int pp[2];
  pipe(pp);

  if (fork() == 0) {
    // sieving processes
    if (read(pp[0], &p, sizeof(p)) <= 0) {
      close(pp[0]);
      exit(0);
    }
    printf("prime %d\n", p);

    int newpipe[2];
    pipe(newpipe);
    if (fork() == 0) {
      // next sieving process
      // ???
    } else {
      // current sieving process
      while (read(pp[0], &n, sizeof(n)) > 0) {
        if (n % p != 0) {
          write(newpipe[1], &n, sizeof(n));
        }
      }
    }
  } else {
    // root process
    // omitted
  }

  exit(0);
}

After forking(), our current sieving process will start forwarding n to the next sieving process. But what does the next sieving process do (the ???)?

The next sieving process

Recall that fork() starts a child process in the exact same state as the parent process. Therefore, the next sieving process starts right at the ??? comment in the above code. The newly spawned process wants to read p from newpipe which was created in the parent process. We have already implemented this logic (although we were reading from pp), but we have already gone past it! How do we go back up to reading p again?

The goto statement comes to mind, but we can also use a while loop and the continue statement as a replacement.

int
main(int argc, char *argv[])
{
  int p, n;
  int pp[2];
  pipe(pp);

  if (fork() == 0) {
    // sieving processes
    while (1) {
      if (read(pp[0], &p, sizeof(p)) <= 0) {
        close(pp[0]);
        exit(0);
      }
      printf("prime %d\n", p);

      int newpipe[2];
      pipe(newpipe);
      if (fork() == 0) {
        // next sieving process
        continue;
      } else {
        // current sieving process
        while (read(pp[0], &n, sizeof(n)) > 0) {
          if (n % p != 0) {
            write(newpipe[1], &n, sizeof(n));
          }
        }
        exit(0);
      }
    }
  } else {
    // root process
    // omitted
  }

  exit(0);
}

In the above code, we wrapped the entire logic for sieving processes in a permanent loop. When the second sieving process spawns, it will go back to the beginning of the loop and start anew, just like our first sieving process did. Now, we just need to make it read from the pipe that our first sieving process created. This can be done by simply renaming newpipe to pp!

To make things easier to understand, let's do some renaming. In a process's context, let's call the pipe that it reads from (created by its parent) leftpipe, and the pipe that it writes to (created by itself) rightpipe. We end up with the following code:

int
main(int argc, char *argv[])
{
  // See https://swtch.com/~rsc/thread/
  int p, n;

  int leftpipe[2];
  int rightpipe[2];
  pipe(rightpipe);

  if (fork() == 0) {
    // first sieving process
    while (1) {
      // move rightpipe to leftpipe for reading
      leftpipe[0] = rightpipe[0];
      leftpipe[1] = rightpipe[1];
      rightpipe[0] = 0;
      rightpipe[1] = 0;

      if (read(leftpipe[0], &p, sizeof(p)) <= 0) {
        exit(0);
      }
      printf("prime %d\n", p);

      pipe(rightpipe);
      if (fork() == 0) {
        // child is created as a new sieving process
        continue;
      } else {
        // parent sends sieved numbers to next sieving process
        while (read(leftpipe[0], &n, sizeof(n)) > 0) {
          if (n % p != 0) {
            write(rightpipe[1], &n, sizeof(n));
          }
        }
        exit(0);
      }
    }
  } else {
    // root process omitted
  }

  exit(0);
}

Phew! Everything is looking good now. The root process sends numbers to the first sieving process through its rightpipe.

For the first sieving process, the root process's rightpipe is its leftpipe. Here, we are "moving" the fd from rightpipe to leftpipe, so the two fd slots in rightpipe can be reused! The first sieving process reads p = 2 from leftpipe, prints it, and creates a new rightpipe for its child. Then it passes sieved numbers 3, 5, 7, ... to its child before exiting.

The second process does the exact same thing. Its leftpipe is the first sieving process's rightpipe. It reads p = 3 from leftpipe, prints it, passes 5, 7, 11, 13, ... to its child, and exits.

Looks like things are all working! The last thing we need to do is to make sure that everything is cleaned up properly.

Final code

#include "kernel/types.h"
#include "kernel/stat.h"
#include "user/user.h"

int
main(int argc, char *argv[])
{
  // See https://swtch.com/~rsc/thread/
  int p, n;

  int leftpipe[2];
  int rightpipe[2];
  pipe(rightpipe);

  if (fork() == 0) {
    // first sieving process
    while (1) {
      // move rightpipe to leftpipe for reading
      leftpipe[0] = rightpipe[0];
      leftpipe[1] = rightpipe[1];
      rightpipe[0] = 0;
      rightpipe[1] = 0;

      // we are not going to write to leftpipe again
      close(leftpipe[1]);

      if (read(leftpipe[0], &p, sizeof(p)) <= 0) {
        close(leftpipe[0]);
        exit(0);
      }
      printf("prime %d\n", p);

      pipe(rightpipe);
      if (fork() == 0) {
        // child is created as a new sieving process
        continue;
      } else {
        // parent sends sieve numbers to next sieving process
        // we can close now since the parent won't need to read from rightpipe
        close(rightpipe[0]);
        while (read(leftpipe[0], &n, sizeof(n)) > 0) {
          if (n % p != 0) {
            write(rightpipe[1], &n, sizeof(n));
          }
        }
        close(leftpipe[0]);
        close(rightpipe[1]);
        wait(0);
        exit(0);
      }
    }
  } else {
    // root process simply sends 2-35 to the first sieving process
    close(rightpipe[0]);
    for (int i = 2; i <= 35; i++) {
      write(rightpipe[1], &i, sizeof(i));
    }
    close(rightpipe[1]);
    wait(0);
  }

  exit(0);
}

The final code looks a bit messy, but it works! We can run make qemu and run primes in the shell to make sure it works.

$ make qemu
<omitted>

xv6 kernel is booting

hart 2 starting
hart 1 starting
init: starting sh
$ primes
prime 2
prime 3
prime 5
prime 7
prime 11
prime 13
prime 17
prime 19
prime 23
prime 29
prime 31
$

What I Learned

I learned a lot about pipes file descriptors from this lab. There were a few things that I didn't know before doing this lab.

For example,

  • After a process with an open fd is forked, does the fd in the parent and child point to the same file?
  • If the parent closes the fd, can the child still use its own fd?
  • What's the difference between dup(fd) and simply assigning fd to another variable?
  • When exactly do we want to close the read/write fds for a pipe?

If you are still a bit unsure about the questions above (like I am), I would encourage you to try this lab yourself!