The Normativity Manifesto

I previously wrote about Normativity in Configuration Management. I think the concept of normativity as I define it therein is sufficiently important to bang its drum a little, so here I attempt to distill the concept into a succinct form.

The concepts I articulate here aren't really new. You already know what I'm talking about if you're familiar with any of the following:

  • the aversion of Linux distros to the bundling of library source trees;
  • the idea that binaries shouldn't be placed into source control;
  • the sometimes-held preference that generated source files shouldn't be placed into source control;
  • having to run autoconf/automake after cloning a repository in order to generate configure;
  • running make or particularly make clean.

Essentially the idea is this: Any data which is derived from other data is non-normative. Data which is not derived from any other data is normative.

There is generally an expectation that the process of derivation will be automatable, though in some cases it might necessarily be a manual process. For example, it could be argued that source code implementing a business policy is non-normative because it is directly derived from a natural language document setting out the policy. In this case it is the document that is truly authoritative, and if the document is updated the source code implementing it will need to be updated. However, since derivation of the source code from the document is not automatable, the source code must be considered quasi-normative for the purposes of build and operations automation.
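
To make the distinction concrete, here is a minimal sketch of an automated derivation step in Python; the file names and the trivial transformation are hypothetical stand-ins for a real generation step such as a code generator or template expansion:

```python
from pathlib import Path

NORMATIVE = Path("policy.txt")            # hand-maintained, authoritative
DERIVED = Path("policy.generated.txt")    # never edited by hand

def derive() -> None:
    """Regenerate the derived (non-normative) file from its normative source."""
    # The derived file is a pure function of the normative input, so it can
    # be destroyed and recreated at any time without loss of information.
    DERIVED.write_text(NORMATIVE.read_text().upper())

if __name__ == "__main__":
    derive()
```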

The Normativity Manifesto

  • Only normative data should be valued.

  • If destroying non-normative data has consequences, or if you have developed a psychological aversion to destroying it because doing so is likely to be inconvenient or risky, something is surely wrong with your automation for deriving non-normative data from normative data.

  • Keeping non-normative data around is a liability, because it can diverge from normative data, or one can neglect to update it when its normative origin is changed.

  • A source code repository should store only normative data. This means that, at the very least, the repository should contain no binaries, generated files or autotools configure scripts.

  • Any non-normative datasets which are required should be derivable using wholly automated processes.

  • Automated processes should be available to destroy non-normative datasets and, when required, re-derive them (a sketch follows this list).

  • Non-normative data must never be modified directly. There must be a unidirectional flow from normative data to non-normative data to progressively less normative data. If changes to non-normative data are required, they must be effected by changes at the root of the graph: changes to the normative data from which the non-normative data is derived.

  • Non-normative data the derivation of which cannot be automated may be treated as (quasi-)normative data for the purposes of this manifesto. Where possible, alternatives should be investigated to minimise this scenario.

  • In the field of deployment and operations, aim to keep a single “root normative” repository which contains only normative data, and as much of it as possible. This repository dictates the state of the entire system, for some definition of system.
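
As promised above, here is a sketch of the destroy-and-re-derive automation in Python; the build/ directory and the empty derive_all() hook are hypothetical, and a Makefile or shell script would serve just as well:

```python
import shutil
from pathlib import Path

DERIVED_ROOT = Path("build")   # everything under here is non-normative

def clean() -> None:
    # Destroying non-normative data must always be safe and consequence-free.
    shutil.rmtree(DERIVED_ROOT, ignore_errors=True)

def derive_all() -> None:
    # Re-derive every non-normative dataset from normative sources.
    DERIVED_ROOT.mkdir(exist_ok=True)
    # ...invoke the individual, fully automated derivation steps here...

if __name__ == "__main__":
    clean()
    derive_all()
```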

See also

This concept of normativity has a close affinity with the concept of Kolmogorov complexity. In some senses, it can be viewed as a particular instance of what I might term the “Kolmogorov problem”, which is of course the fundamental task of data compression: given some piece of data, generate a computer program that, when executed, produces that data as output, such that the size of that program is minimised. (Formats such as DEFLATE can be considered computer programs targeting a simple, non-Turing-complete virtual machine.)
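
As a toy illustration of that framing, the compressed byte string produced below can be read as a “program” whose execution (decompression) reproduces the original data; Python's zlib is used here simply as a convenient DEFLATE implementation:

```python
import zlib

data = b"the quick brown fox " * 100   # highly redundant input

# The compressed bytes act as a "program" for the DEFLATE virtual machine;
# decompression "executes" it, reproducing the original data exactly.
program = zlib.compress(data, 9)
assert zlib.decompress(program) == data

# For redundant input the program is far smaller than its output.
print(len(program), "<", len(data))
```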