This article is an unpublished draft.

Language Research

Source-to-source translation. Free extensibility of language. Non-textual normative forms for source code. Semantic patching. Macro languages. Preprocessors. Representation of ASTs. “Lossless” parsing and AST serialization cycles. Parsing of complicated languages (e.g. C++) for transformation purposes. AST transformations and lossless dumping.

C Skins

A “C-skin” language is a language which

  a. translates to C, and
  b. at least for basic constructs, translates to C code which someone might actually write and be willing to maintain (“natural” C).

Using S-Expressions: Represent “C” code in S-expression form for ease of lossless parsing and transformation. Take advantage of Lisp's powerful macro facilities. S-expression form code is then translated to C after all transformations are complete. This mapping is often trivial.

There are two ways to implement this:

  • Type SXX: “C” code in S-expression form is interpreted by a Lisp interpreter in an environment with appropriate symbols. Execution of the program constructs an AST on which further transformations can be performed, or causes C code to be output.

  • Type SXP: “C” code in S-expression form is parsed and analyzed by a program, but the program is not executed as a Lisp program. The parser may still be implemented in Lisp to take advantage of Lisp's affinity for S-expressions and macro processing.
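
As a rough illustration of the Type SXP approach, the sketch below reads a prefix expression and emits the corresponding infix C expression without ever evaluating it. It is written in C only to keep it self-contained. The function name sx_to_c, the restriction to binary operators and bare atoms, and the fixed output buffer are all assumptions made for this example, not part of any existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical SXP-style translator: the input is parsed, never executed. */
struct sx_out { char *buf; size_t cap, len; int failed; };

/* Append n bytes, keeping the buffer NUL-terminated; flag overflow. */
static void put(struct sx_out *o, const char *s, size_t n) {
    if (o->len + n >= o->cap) { o->failed = 1; return; }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

static const char *skip_ws(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

static int is_delim(char c) {
    return c == '\0' || c == '(' || c == ')' || isspace((unsigned char)c);
}

/* Translate one expression starting at p.
   Returns a pointer just past it, or NULL on a syntax error. */
static const char *expr(const char *p, struct sx_out *o) {
    p = skip_ws(p);
    if (*p == '(') {                       /* (op lhs rhs) */
        const char *op = p = skip_ws(p + 1);
        while (!is_delim(*p)) p++;
        size_t oplen = (size_t)(p - op);
        if (oplen == 0) return NULL;
        put(o, "(", 1);
        if (!(p = expr(p, o))) return NULL;   /* left operand */
        put(o, " ", 1); put(o, op, oplen); put(o, " ", 1);
        if (!(p = expr(p, o))) return NULL;   /* right operand */
        p = skip_ws(p);
        if (*p != ')') return NULL;
        put(o, ")", 1);
        return p + 1;
    }
    const char *start = p;                 /* atom: identifier or literal */
    while (!is_delim(*p)) p++;
    if (p == start) return NULL;
    put(o, start, (size_t)(p - start));
    return p;
}

/* Returns 0 on success, -1 on malformed input or a too-small buffer. */
int sx_to_c(const char *src, char *buf, size_t cap) {
    assert(buf != NULL && cap > 0);
    struct sx_out o = { buf, cap, 0, 0 };
    buf[0] = '\0';
    const char *end = expr(src, &o);
    if (!end || o.failed) return -1;
    return *skip_ws(end) == '\0' ? 0 : -1;
}
```

Under these assumptions, sx_to_c("(+ a (* b c))", buf, sizeof buf) returns 0 and leaves (a + (b * c)) in buf. A Type SXX implementation would instead bind + and * to Lisp functions that build or print C when the form is evaluated.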

Using a Syntax-Tree-Based C-Like Language: A C-like language is parsed to a full syntax tree and then dumped as ordinary C.

Using a Syntax-Tree-Based C-Like Language and Semantic Macros: A C-like language is parsed to a full syntax tree and transformed via macros described in the language alongside the code transformed. Finally ordinary C is dumped.
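
To make the semantic-macro idea concrete, here is a minimal sketch in which a tree node for an invented "unless" statement is rewritten into an ordinary "if" with a negated condition before the tree is dumped as C. The node kinds, the unless construct, and every function name are hypothetical. A real implementation would parse the macro definitions from the source and walk the whole tree rather than rewrite a single hand-built node.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for a tiny C-like syntax tree. */
enum kind { N_RAW, N_NOT, N_IF, N_UNLESS };

struct node {
    enum kind kind;
    const char *text;     /* N_RAW: verbatim C fragment */
    struct node *a, *b;   /* N_NOT: a; N_IF/N_UNLESS: a = condition, b = body */
};

/* Fixed-size output buffer; enough for a sketch. */
struct out { char buf[128]; size_t len; };

static void emit(struct out *o, const char *s) {
    size_t n = strlen(s);
    assert(o->len + n < sizeof o->buf);
    memcpy(o->buf + o->len, s, n + 1);   /* copies the terminating NUL too */
    o->len += n;
}

/* "Macro expansion" on the tree: unless (c) s  becomes  if (!(c)) s.
   The caller supplies storage for the new negation node. */
static void expand_unless(struct node *n, struct node *spare) {
    if (n->kind != N_UNLESS) return;
    spare->kind = N_NOT;
    spare->text = NULL;
    spare->a = n->a;
    spare->b = NULL;
    n->kind = N_IF;
    n->a = spare;
}

/* Dump the (already expanded) tree as ordinary C. */
static void dump(const struct node *n, struct out *o) {
    switch (n->kind) {
    case N_RAW:    emit(o, n->text); break;
    case N_NOT:    emit(o, "!("); dump(n->a, o); emit(o, ")"); break;
    case N_IF:     emit(o, "if ("); dump(n->a, o); emit(o, ") ");
                   dump(n->b, o); break;
    case N_UNLESS: assert(!"unless must be expanded before dumping"); break;
    }
}

/* Builds  unless (p == NULL) use(p);  by hand, expands it, and dumps it. */
const char *unless_demo(struct out *o) {
    struct node cond = { N_RAW, "p == NULL", NULL, NULL };
    struct node body = { N_RAW, "use(p);", NULL, NULL };
    struct node stmt = { N_UNLESS, NULL, &cond, &body };
    struct node spare;
    expand_unless(&stmt, &spare);
    o->len = 0;
    o->buf[0] = '\0';
    dump(&stmt, o);
    return o->buf;
}
```

With these definitions unless_demo produces the string if (!(p == NULL)) use(p);. Because the rewrite operates on nodes rather than tokens, the condition is negated as a unit regardless of how it was written.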

Using Lexical Preprocessing: Lexical preprocessing is usually less powerful than syntactic techniques. Examples include (of course) cpp, but also gpp, m4, etc. See Alternate Preprocessors.
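
One familiar symptom of that weaker power, sketched below: a cpp macro pastes its argument tokens wherever the parameter appears, so an argument with side effects is evaluated once per occurrence, while a construct that understands expressions evaluates it once. The helper names here are invented for the illustration.

```c
#include <assert.h>

/* Token-level macro: the argument text is substituted twice. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Ordinary function: the argument expression is evaluated exactly once. */
static int square_fn(int x) { return x * x; }

static int calls;                        /* how many times next() has run */
static int next(void) { return ++calls; }

/* Number of times next() runs when passed through the macro. */
int count_macro_evaluations(void) {
    calls = 0;
    (void)SQUARE_MACRO(next());          /* expands to ((next()) * (next())) */
    return calls;
}

/* Number of times next() runs when passed to the function. */
int count_function_evaluations(void) {
    calls = 0;
    (void)square_fn(next());
    return calls;
}
```

Here count_macro_evaluations() returns 2 and count_function_evaluations() returns 1. A syntax-tree-based macro could bind the argument to a temporary before expanding, which a purely lexical tool has no general way to do.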

Using Compiler-Supported Extensions to C: In some rare cases, compilers themselves may support a particular extension.

Existing Research on Representing C in S-Expressions

Preprocessor-based C Skins

  • COS (C Object System), an implementation of CLOS in the C99 preprocessor with minimal support provided by additional utilities
  • Countless projects that implement their own object systems on top of C or C++: GObject, MFC, Qt, etc.

Compiler-Supported Extensions to C

  • The Plan 9 extensions (GCC: -fplan9-extensions). Supports anonymous fields in structs, including implicit casts:

    struct A {
      int x,y;
    };
    struct B {
      struct A;
      int z;
    };
    int f(struct B *b) {
      /* x is found through the anonymous struct A field of struct B */
      return b->x;
    }
  • Microsoft: C++/CLI, WinRT, extensions for COM, etc.

  • HTML Generation in Lisp: LML
  • HTML Generation in Lisp: htmlgen
  • HTML Generation in Haskell: blaze-html
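
Returning to the Plan 9 extensions listed above: the “implicit casts” mean that a struct B * may be passed where a struct A * is expected. As a sketch of what a C skin might emit for compilers without the extension, the following standard C spells both conveniences out. The member name base and the helper sum_xy are hypothetical names chosen for this example.

```c
#include <assert.h>

struct A {
    int x, y;
};

struct B {
    struct A base;   /* named stand-in for the anonymous struct A field */
    int z;
};

/* Any function that expects a pointer to the embedded type. */
static int sum_xy(const struct A *a) { return a->x + a->y; }

int f(struct B *b) {
    /* With -fplan9-extensions this could read: b->x + sum_xy(b).
       Standard C needs the member path and the address taken explicitly. */
    return b->base.x + sum_xy(&b->base);
}
```

The translation is mechanical and stays close to code someone might write by hand, which is the “natural” C property a C skin aims for.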

Plugin Facilities for C/C++ Compilers

  • GCC: Plugins. The page includes an interesting list of plugins which have been developed, including...

  • clang: Plugin mechanism.

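    • clang lacks the ideological firewalling of GCC's plugin mechanism. (clang is permissively licensed: originally under the University of Illinois/NCSA license, now under Apache 2.0 with LLVM exceptions.)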
    • TODO: investigate.
  • The C++11 [[...]] attribute syntax makes AST annotation easy. This facility is exposed well in GCC's plugin mechanism.

Semieffective C/C++ Parsing Code Generation Tools

“Semieffective” here means that these parsers almost certainly can't parse arbitrary C++, since parsing C++ is an unreasonably gargantuan challenge. These tools are likely to parse C++ only to the minimum level necessary to achieve some end.

  • Generation of reflection metadata
    • Qt's moc
    • COS (above) relies on some scanning tools.
  • Unreal Engine has some sort of mcpp-based C++ parsing code generation tool.

Miscellaneous

  • PostgreSQL has its “Embedded SQL in C” preprocessor (ECPG). Statements of the form EXEC SQL ...; can be used like C statements.

See also:

TODO: http://zl-lang.org/