Coroutines in C

Suppose that in C, you wish to implement a recursive descent parser with the following requirements:

Not all input to the lexer is available immediately.
As much work as possible should be done as soon as possible.
The parser should not block when more input is not yet available.

In other words, a non-blocking parser which must rely on a non-blocking lexer which receives some input stream (from the network, say.)

A classic “pull” parser is typically structured something like this:

        def parse():
          until EOF:
            token := get_token()
            parsing logic

In this model the state of the parser is partially or wholly encoded into the call stack. get_token must block if input is not yet available. If get_token is non-blocking, and returns an error code when input is not yet available, how should the parser proceed? It cannot return, because to do so would destroy the current state of the parser. Thus this approach appears fundamentally ill-suited to non-blocking semantics.

One approach that suggests itself is to implement some sort of abstract machine which accomplishes the parsing task, and whose execution is moderated by C code. This is approximately the approach used by common parser generators such as bison; appropriate tables are generated, and these tables are processed by an execution rule to accomplish parsing.

Because the state of the parser in this case is entirely explicit (declared variables representing the machine state) instead of partially implicit (the C call stack), the execution rule can be designed to interrupt execution at any time and return control to its caller. A parser can thus be designed to pause its execution when it runs out of tokens, such that it can be resumed at a later time when more tokens are available. The parser caller retains control of the call stack. This approach is offered by bison's “push” API, which is possible in the first place because of bison's table-driven approach.

However, this solution merely avoids the use of a recursive descent parser, and so only begs the question as to how to combine the recursive descent method with non-blocking semantics.

The use of table-based parsers demonstrates the flexibility of defining an abstract machine; one could, for example, define a virtual machine, and implement a parser using recursive descent in a language targeting that machine. The machine would be interpreted and thus all state would be explicit and thus under control of C code. But this is an absurd amount of effort, and all but requires that you already have a functioning parser for some language. This approach is likely to collapse under its own severe weight.

What is really wanted here is a way to convert the implicit state of the C call stack into an explicit state, so that it can be saved and restored arbitrarily. The most direct solution to this problem would be to obtain some way to create multiple C stacks and switch between them; essentially, a coroutine facility.

Simon Tatham shows that Duff's device can be used to implement a limited form of coroutines. But these are not really sufficient for a recursive-descent parser, because this technique only allows the restoration of the point of execution within a single function, rather than an entire chain of function calls and their points of execution within. Thus this technique offers only a subset of the functionality a complete coroutine facility would.

What is desired here is a complete coroutine system for C. Googling “C coroutines” reveals an array of solutions, but the common strategy seems clear, consisting of the development of the following functions:

A function to create a new stack, primed to call some entrypoint function.
A function to restore the CPU registers (the context), including the stack pointers, from some structure.
A function to save the current stack pointers and other CPU registers to a structure.

Surprisingly, it turns out that POSIX provides for such functionality, via the functions makecontext(3), getcontext(2) and setcontext(2). These three functions are sufficient to implement coroutines in C. The swapcontext(3) function is also offered, which conveniently combines the functions of getcontext(2) and setcontext(2).

This is only half the story, of course, since this does not offer a solution for Windows. But it turns out Windows also offers its own coroutine facility, in the form of “Fibers”.

Method	Platforms	Create	Switch
`makecontext(3)`	POSIX	`makecontext(3)`	`swapcontext(3)`
Fibers	Windows	`CreateFiber`	`SwitchToFiber`

The remaining concern is that there might be some form of impedence discontinuity between the two mechanisms. Perhaps most famously, process creation on POSIX systems is built upon fork(2), which duplicates a process; whereas Windows has no facility to duplicate a process and all processes are created separately of one another. Can makecontext(3) and the Fibers API be unified in a reasonable manner?

Using the makecontext(3) API is straightforward. The Fiber API is a little more quirky. First, the current thread must be primed for use with fibers using the confusingly named ConvertThreadToFiber call. (CreateFirstFiber might have been a better name.) The first fiber is created and a handle to it is returned. Additional fibers are then created via CreateFiber, which is similar to makecontext(3), except that you do not allocate the stack yourself. A fiber is scheduled using SwitchToFiber, which is similar to swapcontext(3). Unlike the makecontext(3) functions, you do not need to manage context structures yourself.

Curious as to how Fibers are implemented? See the ReactOS source code for a rough idea. It's pretty much exactly what you would expect — a “fiber” structure is used to contain the CPU context, a stack is allocated in the same way as in CreateThread, and NtCurrentTeb is used to obtain certain context information. SwitchToFiber is implemented using assembly in i386/fiber.S.

Wikipedia lists a variety of coroutine implementations for C and C++. Some use the above methods and automatically abstract the use of makecontext(3)/Fibers on different platforms. Others appear to try and guess the structure of jmp_buf at runtime and rig setjmp/longjmp to perform context switching.

Boost Coroutine and its companion Boost Context have recently been released. The API is rather ugly — I think a C API would have been more elegant for this sort of thing — but the library otherwise looks promising.