Rethinking the filesystem as global mutable state, the root of all evil

This is a follow-on from Rethinking files; read that article first.

When you invoke a function in an imperative programming language, you pass arguments to that function. But that function may also operate on global mutable state, meaning that factors other than the arguments you pass influence its operation. To some extent this kind of contextual, implicit passing of state is a necessary practicality. But overuse of global mutable state is also a menace, and good programming practice dictates avoiding excessive use of global variables.

It occurs to me, however, that this is a lesson that operating systems could take to heart. Whenever you invoke an executable, you are implicitly passing that executable a massive amount of global mutable state: namely, the filesystem.

This has its problems. Firstly, it's hard to execute a program without passing it your whole filesystem. You can do it with chroots or containers, but both are remarkably cumbersome and awkward. In fact, the development of “containers” is arguably largely a product of this deficiency of modern operating system design.

Consider the alternative. Suppose that whenever you spawn another process, that process must explicitly be passed some sort of handle to the filesystem, or otherwise (by default) have no access to the filesystem at all. (To be clear, I'm talking at the level of OS APIs here. A shell could certainly be designed to pass the filesystem by default to any program you execute, as a matter of convenience.)

I'm also importing some of the concepts from Rethinking files here, namely the idea that handles can be held by a shell as shell variables, shells can pass handles as command line arguments to programs, and programs can return handles in turn. In other words, they're capabilities.
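The analogy between the filesystem and global mutable state can be made concrete with a small sketch. The following Python fragment (purely illustrative; a dict stands in for an OS-level filesystem handle, and all names are my own) contrasts a function that reaches into a global with one that is handed its view of the world explicitly:

```python
# Implicit style: the function depends on global mutable state,
# just as a process today implicitly receives the whole filesystem.
GLOBAL_FS = {"/etc/hosts": "…", "/tmp/a": "…"}

def count_files_implicit():
    return len(GLOBAL_FS)          # hidden dependency on a global

# Explicit, capability-style: the caller decides what the callee sees.
def count_files_explicit(fs):
    return len(fs)                 # operates only on what it was handed

print(count_files_implicit())                   # sees everything
print(count_files_explicit({}))                 # an empty handle: sees nothing
print(count_files_explicit({"/tmp/a": "…"}))    # a restricted view
```

The point is that with the explicit style, the caller is always in control of what the callee can observe, which is exactly the property the shell examples below exploit.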

So here are some examples of how this might work:

$ ls /
bin
usr
etc

We execute ls and it shows us some directories at the root of the filesystem.

$ FS=$(fs-empty) ls /
$

This time when we executed ls it showed us an empty directory. This is because we told our shell to invoke the ls program with a specific filesystem passed to it; namely, the handle to a filesystem returned by the program fs-empty, which spawns a virtual filesystem which contains no files and returns the handle to it. (The shell handles the FS environment variable specially.)

$ touch a.txt
$ touch b.txt
$ tar cf foo.tar a.txt b.txt
$ FS=$(fs-tar foo.tar) ls /
a.txt
b.txt

This time we created a tar file containing files a.txt and b.txt. We then invoked ls, telling the shell to pass to the ls program the handle to a filesystem returned by fs-tar. The hypothetical fs-tar program takes the path to a tar file and yields a filesystem providing read-only access to the files in that tar file. Thus, the invoked ls thinks that the root of the filesystem contains a.txt and b.txt.
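A rough model of what an fs-tar-like helper does can be sketched in Python (illustrative only; the real program would return an OS-level filesystem handle, for which a read-only mapping stands in here):

```python
import io
import tarfile

def fs_tar(tar_bytes):
    """Return a read-only {path: bytes} view of a tar archive."""
    view = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        for member in tf.getmembers():
            if member.isfile():
                view[member.name] = tf.extractfile(member).read()
    return view

# Build a small archive in memory containing a.txt and b.txt,
# mirroring the `tar cf foo.tar a.txt b.txt` step above.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tf:
    for name in ("a.txt", "b.txt"):
        data = name.encode()
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))

fs = fs_tar(buf.getvalue())
print(sorted(fs))   # what the invoked ls would see at /
```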

$ FS=$(fs-filter /bin /etc) ls /
bin
etc

This time the filesystem is provided by a hypothetical fs-filter program, which takes the filesystem it is invoked with and proxies it so that only the specified subtrees are revealed.

$ FS=$(FS=$(fs-tar foo.tar) fs-filter /a.txt) ls /
a.txt

This time, we passed the filesystem from fs-filter, but overrode the filesystem passed to that program in turn, so that the filesystem fs-filter saw was the filesystem revealed by fs-tar. This shows that filesystems can be composed, purely for the invocation of a specific program.

Of course, as per Rethinking files, the filesystem needn't be restricted to just files. Instead, it can serve as the principal namespace for named resources. Network access, for example, can be provided via this namespace, which means that fs-filter can actually be used to restrict network access. In many ways, the above idea effectively creates what you really want from containers — without the need for a container runtime. Most importantly of all, executing a “container” works just like invoking a normal process. There are no daemons and no central repositories of container images.

The underlying idea here is that, somewhat akin to a functional programming environment, the only resources passed to a process that you execute should be those resources that are explicitly passed (or at least, which you configure your shell to automatically pass).

Recall that in Rethinking files, I proposed that resource handles could be passed as command line arguments. The common stdin/stdout/stderr can also be made explicit:

$ FS= ls
error: no filesystem

$ FS= STDIN= STDOUT= STDERR= ls
$

Here we execute ls with no filesystem, no stdin handle, no stdout handle and no stderr handle! Because it doesn't even have a stderr handle, it can't output anything, although it probably tried to output the message complaining that it's been given no filesystem.

Note that this is distinct from the following in a conventional shell:

$ ls </dev/null &>/dev/null

In this case, stdin/stdout/stderr handles are still passed, they're just connected to /dev/null. In the above case however, the process receives no handles at all — not to a filesystem, nor to stdin/stdout/stderr.

Of course, we can take this approach further. By bringing in the idea that when we execute a process, we should get to declare what resources it inherits, we make sandboxing untrusted processes very easy. For example, a shell could use a pseudo-environment variable PRIV to communicate desired privileges in more beginner-friendly terms:

# Execute untrusted binary which you only trust to use stdio (no filesystem access)
$ PRIV=stdio ./sketchy_binary_from_the_internet