The DevEver Documentation and Specification Language (DEDOC)

Introduction

This document specifies the DevEver Documentation and Specification Language (DEDOC). DEDOC is an XML schema and markup language intended for writing technical specifications, standards and documentation. For more information on DEDOC and its supporting tooling, see the DEDOC website.

How DEDOC is specified. This specification is itself written as a Guile Scheme program which, when executed, outputs XHTML with embedded RELAX NG schema definitions. Thus, this document is both human-readable and machine-readable. Generally, the RELAX NG schema will be extracted automatically from this document so that it can be used for validation.

Because the RELAX NG schema is written as part of this document, this document is the canonical source for the RELAX NG schema definitions for DEDOC. Thus, this document constitutes the normative specification for DEDOC for the purposes of both human-readable and machine-readable expressions.

Moreover, the expression of DEDOC herein, in which narrative is interwoven with RELAX NG definitions shown inline, applies the “literate programming” methodology to XML schema definition. This is directly inspired by other efforts to apply literate programming to XML schema definition while keeping a single source of truth for the schema, most notably TEI's “One Document Does It All” (TEI ODD) model.
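In literate-programming terms, the automatic extraction mentioned above is a “tangling” step. The following Python sketch illustrates the idea. It assumes, hypothetically, that each RELAX NG fragment appears in the generated XHTML inside a `<pre>` element with the class `rnc`. This document does not specify the actual markup emitted by the Guile program, and the function name `tangle` is likewise illustrative.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def tangle(xhtml: str, marker_class: str = "rnc") -> str:
    """Concatenate every schema fragment found in an XHTML document.

    Hypothetical convention: fragments live in <pre class="rnc"> elements.
    Fragments are returned in document order, separated by blank lines.
    """
    root = ET.fromstring(xhtml)
    chunks = []
    for pre in root.iter(f"{{{XHTML_NS}}}pre"):
        classes = (pre.get("class") or "").split()
        if marker_class in classes:
            # itertext() also collects text nested in inline markup (e.g. <span>).
            chunks.append("".join(pre.itertext()).strip())
    return "\n\n".join(chunks) + "\n"


page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>An entire document.</p>
<pre class="rnc">doc = element doc { universal-attributes, docctl, docbody }</pre>
<pre class="example">&lt;doc/&gt;</pre>
<pre class="rnc">docbody = element docbody { (sec)* }</pre>
</body></html>"""

print(tangle(page))
```

Only the two `rnc` fragments are emitted, and the illustrative `example` block is skipped. The resulting text could then be handed to any RELAX NG compact-syntax validator.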

XHTML, rather than DEDOC itself, was chosen as the format for this document to avoid a circular dependency on DEDOC.

Purposes of DEDOC. DEDOC is intended to support writing documentation once and producing multiple production-quality output formats, including:

XHTML (multiple pages);
XHTML (single page);
EPUB;
PDF via ConTeXt XML, for the highest quality typesetting for print output;
PDF via XSL-FO (low-quality output; ConTeXt preferred);
man pages (in mdoc format); and
plain text (RFC-style).

DocBook, an existing XML-based markup language, aims to cover much of this ground, but it suffers from several issues:

it is an expansive and extremely complex taxonomy, with a massive number of features unlikely to be fully supported by any given transformation toolchain;
the available open source tooling for typesetting DocBook into print cannot produce output of as high a quality as TeX can (for production typesetting, commercially available XSL-FO implementations such as Antenna House's appear to be used instead); and
its support for the high-fidelity yet ergonomic expression of figures and diagrams is limited, an area in which DEDOC aims to do better.

When designing a universal source format for documentation which aims to target the production of multiple output forms, the means by which diagrams are to be expressed becomes a potentially complicated question. In particular:

production-quality PDF outputs require diagrams be incorporated in vector (PDF) form for acceptable quality;
modern XHTML outputs require diagrams in SVG form if they are to be resolution-independent;
older web or EPUB devices might require diagrams be provided in raster (e.g. PNG) form.

What complicates this matter further is that diagrams may contain text. Where a specific typesetting system such as TeX is used, text in a diagram that is rendered differently from the main body text is likely to look jarring, because the two renderings can differ noticeably. Conversely, a diagram generated via TeX, whose text is therefore typeset by TeX, may look jarring inside a web page.

Thus, the following determinations are made:

Firstly, that it may be unavoidable to use different input representations to generate different output formats. For example, a diagram might be provided as two files, one to be used for XHTML output and one to be used for PDF output.
This is not constrained to just diagrams. For example, a piece of mathematics might need to be expressed both as TeX code (for use when generating PDF output) and as MathML code (for web use). (Though ConTeXt does support MathML input, it is anticipated that there will be cases where its MathML support is inadequate for complex formulas.)
Thus a general solution of forks is adopted. A fork is a construct inside a DEDOC document that contains alternative representations of the same content. A processor consuming the document chooses exactly one of those representations and is free to choose the one most appropriate to it. For example, a math formula might be expressed in a fork containing both MathML and TeX representations, or a diagram might be expressed in a fork containing SVG, PDF and PNG representations.
Secondly, support for diagrams receives specific attention. A diagram is essentially expressed as a fork (though it may be a degenerate fork containing only one representation). Diagrams may be expressed in a variety of formats, such as external SVG, PDF or raster files, or as some kind of program or program fragment which, when executed, generates the desired diagram. An example of the latter is the TikZ DSL for drawing diagrams, which is implemented on top of TeX, but there are also countless other examples of non-TeX programs which are designed to consume some kind of textual input and generate diagrammatic outputs, such as Asymptote. This constitutes a very convenient and time-saving way for developers to express (and version control) diagrams which might otherwise have to be created manually inside a graphical editor and versioned as opaque binary files created by graphical editing tools.
A diagram fork thus specifies one or more representations; each representation specifies its format, and either its text or a filename containing the representation data. If multiple representations are provided, an output generator is free to choose the one best suited to its output format.
In some cases, the desired output format may be especially “aligned” with a provided representation. For example, if a diagram is provided as TikZ code, and the output is being produced using a TeX processor, rather than generating the diagram as a vector file and embedding it, the TikZ code can be directly executed inside the TeX environment during the typesetting process. This has the advantage that the diagram inherits any font and other settings applied to the TeX document and thus matches the look and feel of the rest of the output as closely as possible.
In other cases, the only provided representations may be “unaligned”. If the provided representation is a raster or vector image, it is simply included directly. Another example of an unaligned representation is TikZ input code where the desired output is XHTML; in this case, TeX must be invoked for each such diagram to produce SVG output for that diagram alone. Compare this with when TeX is being used for typesetting, where the TikZ code is simply included into the document and does not result in a separate invocation of TeX. This process should be managed automatically, so that TikZ can be used to generate diagrams for both XHTML and TeX (PDF) output methods.
Numerous other text-input diagram generators are available and it is anticipated that these will generally require the invocation of some external program for both the TeX and XHTML output pathways; thus the diagram support in DEDOC must be extensible to arbitrary external processing tools.
Although diagrams are modelled as forks, and thus allow an author to write or provide multiple input representations where truly necessary to maintain acceptable fidelity across all output formats, it is anticipated that in the majority of cases it will be possible to generate acceptable-quality diagrams from a single source, and DEDOC focuses on ensuring that this is the case.
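The fork selection rule described above can be sketched in Python. Everything in this sketch is hypothetical and illustrative, not DEDOC syntax or tooling: the `Representation` class, the format names and the per-output preference orders. It shows a processor picking exactly one representation from a fork and preferring an “aligned” one when it is available.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Representation:
    """One alternative inside a fork (hypothetical model, not DEDOC syntax)."""
    fmt: str                        # e.g. "tikz", "svg", "pdf", "png"
    text: Optional[str] = None      # inline representation data ...
    filename: Optional[str] = None  # ... or a file containing it

    def __post_init__(self) -> None:
        # A representation supplies either its text or a filename, not both.
        if (self.text is None) == (self.filename is None):
            raise ValueError("give exactly one of text or filename")


# Hypothetical preference orders per output pathway; earlier is better.
# "tikz" comes first for ConTeXt because it is aligned with a TeX pipeline.
# It comes second for XHTML because it needs a separate TeX run to become SVG.
PREFERENCES = {
    "context-pdf": ("tikz", "pdf", "png"),
    "xhtml": ("svg", "tikz", "png"),
    "legacy-epub": ("png",),
}


def choose(fork: Sequence[Representation], output: str) -> Representation:
    """Select exactly one representation of a fork for the given output."""
    for fmt in PREFERENCES[output]:
        for rep in fork:
            if rep.fmt == fmt:
                return rep
    raise LookupError(f"no representation in this fork suits {output!r}")


diagram = [
    Representation("tikz", text=r"\draw (0,0) -- (1,1);"),
    Representation("png", filename="arrow.png"),
]
print(choose(diagram, "context-pdf").fmt)  # tikz
print(choose(diagram, "legacy-epub").fmt)  # png
```

Keeping the preferences in a table, outside the document, matches the principle that each processor is free to decide which representation suits it best.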

Structural Constructs

doc

An entire document.

doc = element doc { universal-attributes, docctl, docbody }

docctl

The control information comprises metadata which does not appear in the document body itself, and which need not be rendered.

docctl = element docctl { title & buildinfo? }

buildinfo

Contains information about a build process which produced a DEDOC XML file.

buildinfo = element buildinfo { vcsrevsummary & vcsrev? & vcstime? }

vcsrevsummary

Contains a single-line VCS revision summary. The form attribute indicates whether a full or abbreviated revision summary is used. For example, a short VCS revision summary might contain only an abbreviated prefix of the revision's cryptographic hash, whereas the long summary may contain the full hash. Note that there is no set form for either of these strings, and they are not required to contain only hexadecimal characters.

vcsrevsummary = vcsrevsummary.long? & vcsrevsummary.short?
vcsrevsummary.long = element vcsrevsummary { attribute form {"long"}, text }
vcsrevsummary.short = element vcsrevsummary { attribute form {"short"}, text }

vcsrev

Contains a full-length cryptographic identifier for the VCS revision from which the DEDOC XML file was built. If the VCS being used does not have a suitable cryptographic identifier, the best available unambiguous identifier should be used. A “+” should be appended if the working tree was “dirty” when building, meaning that it contained changes not present in the referenced revision.

vcsrev = element vcsrev { text }

vcstime

Contains a timestamp for the VCS revision from which the DEDOC XML file was built.

vcstime = element vcstime { xsd:dateTime }
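As an illustration of how a build process might fill in these elements, here is a Python sketch. The helper `make_buildinfo` and its parameters are hypothetical, and using a hash prefix as the short summary is only one possible convention. The revision identifier in the usage example is made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def make_buildinfo(full_rev: str, short_len: int, dirty: bool,
                   committed: datetime) -> ET.Element:
    """Build a <buildinfo> element (hypothetical helper, not DEDOC tooling)."""
    info = ET.Element("buildinfo")
    # Long and short single-line summaries: here simply the full revision
    # identifier and a prefix of it, which is one possible convention.
    ET.SubElement(info, "vcsrevsummary", form="long").text = full_rev
    ET.SubElement(info, "vcsrevsummary", form="short").text = full_rev[:short_len]
    # Full identifier, with "+" appended when the working tree was dirty.
    ET.SubElement(info, "vcsrev").text = full_rev + ("+" if dirty else "")
    # An xsd:dateTime value: isoformat() of an aware datetime includes the offset.
    ET.SubElement(info, "vcstime").text = committed.isoformat()
    return info


info = make_buildinfo("3f2a9c0d5e7b8a1c", 7, True,
                      datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
print(ET.tostring(info, encoding="unicode"))
```

Because buildinfo is defined with interleave (`&`), the order in which the child elements are emitted does not affect validity.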

docbody

The document body contains structural constructs.

docbody = element docbody { (sec)* }

sec

A section. A section may begin with block-level constructs that are not inside any of its subsections, but block-level constructs directly within a given section may not come after a subsection of that section. Sections may be nested to any depth, but specific output systems may limit the depth supported.

sec = element sec { universal-attributes, attribute man-section {text}?, attribute man-os {text}?, attribute man-volume-title {text}?, attribute secno {text}?, titledHdr, BLOCK*, sec* }
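The `titledHdr, BLOCK*, sec*` content model already enforces the ordering rule above. A tool might nevertheless check it procedurally, for example to produce friendlier error messages. The following Python sketch does that. The function name is hypothetical, and members of ESCAPE are omitted from the block list because ESCAPE is not defined in this section. The sample fragments omit other required content (such as `lint` in titledHdr) because only ordering is checked.

```python
import xml.etree.ElementTree as ET

# Block elements defined in this section; ESCAPE members are not listed
# because ESCAPE is not defined here.
BLOCK_TAGS = {"p", "ul", "ol", "dict", "figure", "table", "equation", "listing"}


def section_order_problems(sec: ET.Element) -> list:
    """Report block constructs that follow a subsection, recursively."""
    problems = []
    seen_subsection = False
    for child in sec:
        if child.tag == "sec":
            seen_subsection = True
            problems.extend(section_order_problems(child))
        elif child.tag in BLOCK_TAGS and seen_subsection:
            problems.append(f"<{child.tag}> follows a subsection")
    return problems


ok = ET.fromstring("<sec><hdr><title>A</title></hdr><p/>"
                   "<sec><hdr><title>B</title></hdr><p/></sec></sec>")
bad = ET.fromstring("<sec><hdr><title>A</title></hdr>"
                    "<sec><hdr><title>B</title></hdr></sec><p/></sec>")
print(section_order_problems(ok))   # []
print(section_order_problems(bad))  # ['<p> follows a subsection']
```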

titledHdr

Container of header and metadata information for structural (and formal float) constructs which have a title.

titledHdr = element hdr { secno?, lint, title }

title

The title of a structural construct, such as a section or document.

title = element title { text }

secno

The number of a section.

secno = element secno { text }

Block Constructs

Block constructs contain other block constructs, inline constructs or text, and generally relate to the vertical layout of text in a document.

BLOCK = (p | LIST | FORMAL_FLOAT | ESCAPE | VERBATIM)

Paragraphs

Denotes a paragraph, which contains only inline constructs and is the most commonly used construct for placing inline constructs within a block context.

p = element p { INLINE* }

Lists

LIST = (ul | ol | dict)

ul

An unordered list. May contain only <li> elements, which constitute the elements of the list.

ul = element ul { li* }

ol

An ordered list. May contain only <li> elements, which constitute the elements of the list.

ol = element ol { li* }

li

A list item in an ordered or unordered list.

li = element li { BLOCK* }

dict

A dictionary list, which maps keys to values.

dict = element dict { dice* }

dice

A dictionary list item.

dice = element dice { dick, dicb }

dick

The key of a dictionary list item.

dick = element dick { INLINE* }

dicb

The body of a dictionary list item.

dicb = element dicb { BLOCK* }
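The nesting of dict, dice, dick and dicb can be easy to misremember. The following Python sketch builds a dictionary list from a plain mapping. The helper `make_dict` is hypothetical. Each body is wrapped in a `<p>` because dicb contains block constructs, whereas dick contains inline content directly.

```python
import xml.etree.ElementTree as ET


def make_dict(entries: dict) -> ET.Element:
    """Turn a {key: body text} mapping into a DEDOC <dict> (hypothetical helper)."""
    d = ET.Element("dict")
    for key, body in entries.items():
        dice = ET.SubElement(d, "dice")
        ET.SubElement(dice, "dick").text = key   # key: inline content
        dicb = ET.SubElement(dice, "dicb")
        ET.SubElement(dicb, "p").text = body     # body holds blocks, so wrap in <p>
    return d


out = ET.tostring(make_dict({"ul": "An unordered list."}), encoding="unicode")
print(out)
# <dict><dice><dick>ul</dick><dicb><p>An unordered list.</p></dicb></dice></dict>
```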

Formal Floats

Formal floats are numbered containers such as “figures” and “tables”. These form separate numbering namespaces, independent of section numbering. Despite its name, a table float does not necessarily contain an actual table.

FORMAL_FLOAT = (figure | table | equation)

figure

A figure is a formal float numbered with the prefix word “Figure”. Figures are generally used to show diagrams but need not be. They contain block constructs.

figure = element figure { titledHdr, BLOCK* }

table

A table is a formal float numbered with the prefix word “Table”. Tables are generally used to show tabular data but need not be. They contain block constructs.

table = element table { titledHdr, BLOCK* }

equation

An equation is a formal float. They are used for display math and contain math code directly; they cannot be used for other purposes. This is a forking construct.

equation = element equation { tex?, mmlmath? }

Verbatims

Verbatims are blocks of text which are laid out in monospaced, verbatim form with no elision of spaces. They are typically used for displaying source code fragments. Note that unlike e.g. LaTeX verbatims, they can contain other markup.

VERBATIM = (listing)

listing

A generic code listing verbatim. This should be your default choice of verbatim if in doubt.

listing = element listing { INLINE* }
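Because a listing may contain nested markup, a backend must preserve whitespace while handling that markup. A plain-text (RFC-style) backend, for instance, might flatten a listing as in the following Python sketch. The sketch keeps every space and newline, including text that follows a nested element, and drops the tags themselves. This is an illustrative approach, not a prescribed algorithm.

```python
import xml.etree.ElementTree as ET

listing = ET.fromstring(
    "<listing>int <kw>main</kw>(void) {\n"
    "    return 0;\n"
    "}</listing>"
)

# itertext() yields element text and the tail text after each child, in
# document order; joining it neither collapses nor trims whitespace.
plain = "".join(listing.itertext())
print(plain)
```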

Diagrams

TODO

Inline Constructs

Inline constructs contain text and other inline constructs, and generally relate to the horizontal formatting of text within a given paragraph.

Semantic Phrases

“Semantic phrases” are one or a few words annotated with their semantic meaning so that they can, where appropriate, be specially typeset. Examples of semantic phrases that appear in many manuals include typed commands, class names and RFC 2119 keywords.

SEMANTIC_PHRASE = (proword | procn | kw)

proword

A proword is a word or phrase with normative power in the context of a standard or specification. Examples include RFC 2119 capitalized words in RFCs, and the phrases “shall”, “shall not”, “should”, “should not”, “may”, “may not”, “must” and “must not” in ISO standards.

proword = element proword { INLINE* }

procn

A procedure name. Used to refer to a procedure by name in prose.

procn = element procn { INLINE* }

kw

A keyword. Usually typeset in monospace.

kw = element kw { INLINE* }

Mathematics

MATH_INLINE = (math)

math

Inline mathematics. This is also a fork construct, and can therefore contain multiple representations of the same mathematics.

math = element math { tex?, mmlmath? }
mmlmath = element mml:math { attribute * {text}*, (mmlany | text)* }
mmlany = element mml:* { attribute * {text}*, (mmlany | text)* }
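A processor resolves a math fork (and likewise an equation, which has the same content) by taking exactly one of its representations. The following Python sketch makes two assumptions: the `tex` pattern, which is not defined in this section, is an element named `tex` in no namespace, and the `mml` prefix maps to the standard MathML namespace. The function name and the backend labels are hypothetical.

```python
import xml.etree.ElementTree as ET

MML = "{http://www.w3.org/1998/Math/MathML}"


def pick_math(fork: ET.Element, backend: str) -> ET.Element:
    """Return the one representation of a <math> or <equation> fork to use."""
    tex = fork.find("tex")
    mml = fork.find(MML + "math")
    # A TeX-based backend prefers TeX source; any other backend prefers MathML.
    order = (tex, mml) if backend == "tex" else (mml, tex)
    for rep in order:
        if rep is not None:  # Element truthiness reflects children, so test None
            return rep
    raise ValueError("fork contains no representation")


fork = ET.fromstring(
    '<math><tex>x^2</tex>'
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML">'
    '<m:msup><m:mi>x</m:mi><m:mn>2</m:mn></m:msup></m:math></math>'
)
print(pick_math(fork, "tex").text)   # x^2
print(pick_math(fork, "xhtml").tag)  # {http://www.w3.org/1998/Math/MathML}math
```

Both representations are optional, so a fork containing only one of them still resolves. A fork containing neither is reported as an error.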

Breakouts

A breakout is a construct which is considered a block, and which can contain blocks, yet which is allowed to appear in an inline context.

BREAKOUT = (footnote)

footnote

A footnote defined inline. A footnote contains block constructs.

footnote = element footnote { attribute label {text}?, BLOCK* }

Cross-Referencing

Inline constructs which reference other documents, or other constructs in the same document. Some of these are also considered semantic phrases.

REFERENCE = (term | link | cite)

term

Marks a use in prose of a previously defined term. Use this element so that the use of the term properly references the item of terminology at its definition site.

The optional attribute “sp” specifies whether this use of the term is singular or plural.

term = element term { attribute xlink:href {xsd:anyURI}, attribute sp {("singular" | "plural")}?, INLINE* }

link

Inline reference to another construct in the same or another document, generating a hyperlink where possible. The text is manually specified.

link = element link { attribute xlink:href {xsd:anyURI}, INLINE* }

cite

Inline citation. This differs from link in that the text of the hyperlink is generated automatically.

cite = element cite { attribute xlink:href {xsd:anyURI} }
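The difference between link and cite lies in where the text comes from. For a cite, the processor generates the text itself. The following Python sketch shows one way a processor might do that. The lookup table, the “Section N, Title” format and the fallback to the raw URI are all hypothetical choices. The `xlink` prefix is assumed to map to the standard XLink namespace.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def cite_text(cite: ET.Element, sections: dict) -> str:
    """Generate link text for a <cite> (hypothetical format and lookup table).

    sections maps a fragment identifier to a (secno, title) pair.
    """
    href = cite.get(XLINK_HREF)
    if href is None:
        raise ValueError("<cite> requires xlink:href")
    if href.startswith("#") and href[1:] in sections:
        secno, title = sections[href[1:]]
        return f"Section {secno}, {title}"
    return href  # unknown or external target: fall back to the URI itself


c = ET.fromstring('<cite xmlns:xlink="http://www.w3.org/1999/xlink" '
                  'xlink:href="#lists"/>')
print(cite_text(c, {"lists": ("3.2", "Lists")}))  # Section 3.2, Lists
```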

Nonsemantic Formatting

Some elements, though their use is discouraged, are defined for expressing a specific typesetting request. They should be used only if no suitable alternative exists.

NONSEMANTIC_INLINE = (em | tt)

em

Request emphasis (generally represented as italics). Avoid where possible.

em = element em { INLINE* }

tt

Request typesetting in monospace. Avoid using this if an appropriate semantic phrase element is available.

tt = element tt { INLINE* }