On numbering hegemonies and namespace monopolies

“Numbering hegemonies” is a term I've come up with to describe a certain kind of artificial monopoly which is common in the IT industry, though by no means confined to it.

Broadly speaking, a numbering hegemony is an entity which exclusively controls the allocation of numbers in a namespace, and exploits the resulting monopoly to charge a pretty penny for allocations.

Of course, it's not inherently illegitimate for a namespace authority to charge some kind of fee to cover basic costs and discourage wasteful use. What sets numbering hegemonies apart is:

  • the seemingly artificial or deliberate scarcity created by a needlessly small namespace chosen by the authority; and/or

  • exorbitant fees in which the monopoly status of the numbering authority is exploited.

There are many examples of what appear to be numbering hegemonies:

  • USB Vendor IDs. This is a 16-bit namespace. In order to make a USB device you need to get a USB vendor ID.

    USB vendor IDs are assigned by USB-IF, the steward organisation for the USB standard. For a vendor ID, you have two choices:

    • join USB-IF and pay $5000 a year in annual membership fees, or
    • pay $6000 one-time for a vendor ID without joining USB-IF.

    These fees are clearly preposterous and have resulted in much ire from the maker community. There have been some initiatives launched to provide community access to certain existing vendor IDs, essentially whereby the holder of a vendor ID has offered to subdelegate product ID allocations to specific open source hardware projects.

    The questionable motives of the USB-IF have been demonstrated by their response to such initiatives — namely, by demanding that such practices be stopped and insisting that VID holders may not subdelegate their allocations to third parties.

    What is particularly egregious about USB's 16-bit vendor ID space is that there was absolutely no need for it. Even considering the fact that USB was originally designed to be implementable on 8-bit 8051-based microcontrollers, a larger vendor ID namespace (64-bit or perhaps a 128-bit UUID) would not actually have posed any issue when considering the overall size of USB descriptors. Thus, the scarcity of USB vendor IDs appears entirely artificial. Given the high fees charged by USB-IF, it is hard to shake the impression that the small namespace was chosen specifically to create a pretext for these fees.

    This numbering hegemony has been somewhat “solved”, or at least alleviated, for the open source hardware community by the creation of a site which allocates USB product IDs under the USB vendor ID of a company that has ceased trading. The site claims that the vendor ID it uses was legitimately transferred to it by that company, and that the company obtained the vendor ID from USB-IF before USB-IF changed its contract terms to prohibit the transfer of vendor IDs. Even if this were not the case, it is hard to see what USB-IF could do to stop them. (While the number of kinds of intellectual property seems to be gradually expanding, with such obscure examples as database right and semiconductor mask right that most people haven't heard of, to my knowledge no jurisdiction has yet invented a namespace right.) One blog amusingly described the site as playing “a legal game of chicken” with USB-IF. As the site explains:

    It is our belief that USB-IF has no legitimate right to prohibit this activity, and that their actions are limited to ‘revoking’ the original VID, a fairly meaningless pronouncement since they can never reassign it to anyone else.

  • PCI Vendor IDs, which are also 16-bit values. In fact, USB Vendor IDs seem to have been directly inspired by PCI Vendor IDs.

    PCI-SIG's vendor ID allocation policy is even more inflexible: you have to join PCI-SIG as a member, which requires a $4000 annual fee. (It is interesting that this fee is lower than that charged by USB-IF, especially when considering that the barrier to entry to creating a new USB device is a lot lower than that for creating a new PCI device.)

  • ISBN numbers. ISBN is interesting because it is a case where competition in numbering allocation would have been entirely feasible yet deliberate steps were taken to prevent this(!).

    In short, ISBN numbers are allocated to a country-specific ISBN authority, and then from that authority to publishers. In other words, each country has an organisation you have to buy ISBN numbers from. These organisations are generally private organisations.

    What is particularly egregious is that you are forbidden from buying ISBNs from anything other than your own country's ISBN authority. Competition between different countries' ISBN authorities would have arisen naturally had ISBN not specifically acted to prevent it. The system is deliberately designed so that wherever you are in the world, there is (with rare exceptions) only one organisation you can deal with to get an ISBN number.

    Making matters worse, the fee schedules of the national ISBN authorities differ very substantially, so how much you pay for an ISBN number depends heavily on the country in which you operate.

    Plus, many ISBN authorities seem to offer some quite steep bulk discounts, with people ordering ISBNs in the hundreds paying far less per ISBN than people ordering smaller numbers of ISBNs. It is hard to see the rationale for this kind of bulk pricing for what is ultimately just a database entry; yet this disproportionately favours large publishers and punishes self-publishing.

    Some examples: If you are in the US, you are forced to get your ISBNs from a private company called Bowker, who charge like this:

    Block size   Cost/ISBN    Total cost
    1            USD 125      USD 125
    10           USD 29.50    USD 295
    100          USD 5.75     USD 575
    1000         USD 1.50     USD 1500
    US ISBN fees

    If you are in the UK, you have to get your ISBNs from Nielsen, who have a similar, though slightly cheaper schedule:

    Block size   Cost/ISBN    Total cost   Cost/ISBN (USD)
    1            GBP 91       GBP 91       USD 112.84
    10           GBP 16.90    GBP 169      USD 20.96
    100          GBP 3.79     GBP 379      USD 4.70
    1000         GBP 0.98     GBP 979      USD 1.21
    UK ISBN fees

    From this, we can see that, averaged across the four block sizes, UK ISBNs are about 81% of the price of US ISBNs.

    Australia's ISBNs are even cheaper, unless you want 1000 ISBNs, in which case they are surprisingly more expensive:

    Block size   Cost/ISBN    Total cost   Cost/ISBN (USD)
    1            AUD 44       AUD 44       USD 29.48
    10           AUD 8.80     AUD 88       USD 5.90
    100          AUD 4.80     AUD 480      USD 3.22
    1000         AUD 3.04     AUD 3035     USD 2.03
    Australian ISBN fees

    From this, we can see that, averaged across the four block sizes, AU ISBNs are about 59% of the price of US ISBNs; but for a block of 10 ISBNs, the price is about 20% of the US price(!).

    All of the above is sidelined, however, by the fact that in France,[1] Canada, India, Finland, New Zealand, and probably many other countries I haven't investigated, ISBN allocation is completely free, often being handled by national libraries. Thus whether you have to pay for ISBNs at all depends on the rather arbitrary factor of your country of residence.[2]

    Even weirder is the fact that some countries, such as the UK, have ISBN issuance managed by a private organisation for a fee, yet at the same time issue ISSNs (for serials, magazines, etc.) for free. (Presumably this creates a perverse incentive to dubiously recategorise a series of books as a “serial”.)

  • GTINs. The GTIN, ultimately managed by the GS1 organisation, is the namespace for the EAN/UPC barcodes found on all retail products; with the introduction of ISBN-13, ISBNs are actually a subset of the GS1 namespace nowadays, managed by a different international organisation.

    A similar process of delegation to national monopolies is used. Amazingly, the terms of the UK authority here are even worse than those for ISBNs. You cannot simply buy a GTIN; no, you must join as a member and pay an annual membership fee. Not only that, they have the nerve to base their membership fees on your annual turnover, so you get to pay more for GTINs if you happen to be making a lot of money. By comparison, the US GTIN monopoly sells individual GTINs for a one-time fee.
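To put the USB descriptor-size point in concrete terms, here is a minimal sketch that packs a USB device descriptor with Python's struct module. The first layout follows the standard 18-byte device descriptor, including its 16-bit idVendor field. The second layout is purely hypothetical: it widens idVendor to a 16-byte UUID only for the sake of comparison, and no real host would accept it. All field values are arbitrary placeholders.

```python
import struct
import uuid

# Standard USB device descriptor layout (little-endian, no padding):
# bLength, bDescriptorType, bcdUSB, bDeviceClass, bDeviceSubClass,
# bDeviceProtocol, bMaxPacketSize0, idVendor, idProduct, bcdDevice,
# iManufacturer, iProduct, iSerialNumber, bNumConfigurations
STANDARD_FMT = "<BBHBBBBHHHBBBB"

# HYPOTHETICAL variant for comparison: idVendor widened to a 16-byte UUID.
UUID_VENDOR_FMT = "<BBHBBBB16sHHBBBB"


def pack_standard(vendor_id: int, product_id: int) -> bytes:
    """Pack a device descriptor with a 16-bit vendor ID (placeholder values)."""
    size = struct.calcsize(STANDARD_FMT)
    return struct.pack(STANDARD_FMT, size, 0x01, 0x0200, 0, 0, 0, 64,
                       vendor_id, product_id, 0x0100, 1, 2, 3, 1)


def pack_uuid_vendor(vendor: uuid.UUID, product_id: int) -> bytes:
    """Pack the hypothetical descriptor with a 128-bit UUID vendor ID."""
    size = struct.calcsize(UUID_VENDOR_FMT)
    return struct.pack(UUID_VENDOR_FMT, size, 0x01, 0x0200, 0, 0, 0, 64,
                       vendor.bytes, product_id, 0x0100, 1, 2, 3, 1)


standard = pack_standard(0x1234, 0x5678)
widened = pack_uuid_vendor(uuid.uuid4(), 0x5678)

print(len(standard))  # 18
print(len(widened))   # 32
print(2 ** 16)        # 65536: the entire 16-bit vendor ID space
```

The widened descriptor costs 14 extra bytes, sent once during enumeration rather than in every packet. That is the asymmetry the argument above relies on.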
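The "about 81%" and "about 59%" figures in the ISBN comparison can be reproduced by averaging the per-block price ratios against the US schedule. This is one plausible derivation, assumed here rather than stated in the text. The sketch below does that using the USD per-ISBN prices from the tables.

```python
# Per-ISBN prices in USD, copied from the tables above
# (block sizes 1, 10, 100 and 1000, in that order).
US = [125.00, 29.50, 5.75, 1.50]
UK = [112.84, 20.96, 4.70, 1.21]
AU = [29.48, 5.90, 3.22, 2.03]


def relative_prices(country, baseline):
    """Price ratio against the baseline, block size by block size."""
    return [c / b for c, b in zip(country, baseline)]


def mean(xs):
    return sum(xs) / len(xs)


for name, prices in (("UK", UK), ("AU", AU)):
    ratios = relative_prices(prices, US)
    print(name, [f"{r:.0%}" for r in ratios], f"mean {mean(ratios):.0%}")
```

The averages hide a wide spread. For Australia the per-block ratio ranges from 20% (blocks of 10) to 135% (blocks of 1000), which is why a single percentage is only a rough summary.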
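The GTIN entry notes that ISBN-13s are a subset of the GS1 namespace. Concretely, an ISBN-10 is moved into it by prefixing the GS1 "978" book prefix and recomputing the check digit with the EAN-13 rule (alternating weights 1 and 3). A minimal sketch, using a sample ISBN-10 chosen only because its check digit is valid:

```python
def gs1_check_digit(digits12: str) -> int:
    """EAN-13 / GTIN-13 check digit: weights 1, 3, 1, 3, ... from the left."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits12))
    return (10 - total % 10) % 10


def isbn10_is_valid(isbn10: str) -> bool:
    """ISBN-10 check: weighted sum (weights 10..1) divisible by 11; 'X' means 10."""
    s = isbn10.replace("-", "").upper()
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def isbn10_to_gtin13(isbn10: str) -> str:
    """Re-home an ISBN-10 under the GS1 '978' prefix as a 13-digit GTIN."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"invalid ISBN-10: {isbn10!r}")
    body = "978" + isbn10.replace("-", "")[:9]  # drop the old ISBN-10 check digit
    return body + str(gs1_check_digit(body))


print(isbn10_to_gtin13("0-306-40615-2"))  # 9780306406157
```

The resulting number is simultaneously an ISBN-13 and an EAN/UPC-family barcode value, which is what makes ISBN allocation a delegated slice of the GS1 space.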

Non-examples. Despite the above, I don't believe that every use of a narrow numeric namespace is dubious. While there is not really any valid rationale for using a mere 16-bit namespace for USB vendor IDs, things are different when allocating identifiers which are, for example, constantly transmitted over a network. IEEE OUIs, for instance, don't strike me as an obvious numbering hegemony, for several reasons:

  • MAC addresses are transmitted across the network in every Ethernet frame, thus there is a need to keep them small — it would be inappropriate to spend more bits in order to avoid the need for centralised allocation (such as by using a UUID);

  • MAC addresses have a universal/local bit which means that half of the MAC address namespace is actually reserved for local, administrator-defined use. This is not necessarily globally useful but it doesn't appear that the intention behind the design of MAC addresses was to restrict free usage.
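The two flag bits mentioned above live in the first octet of a MAC address. Bit 0 (0x01) is the individual/group bit, and bit 1 (0x02) is the universal/local bit. The sketch below reads them and mints a random locally administered unicast address, which needs no IEEE allocation at all. Treating the first three octets as "the OUI" is a simplification assumed for this sketch; the address literals are arbitrary examples.

```python
import os


def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 bytes."""
    parts = mac.replace("-", ":").split(":")
    if len(parts) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)  # bytes() rejects values > 255


def is_locally_administered(mac: str) -> bool:
    # U/L bit: bit 1 (0x02) of the first octet; set means locally administered.
    return bool(parse_mac(mac)[0] & 0x02)


def is_group(mac: str) -> bool:
    # I/G bit: bit 0 (0x01) of the first octet; set means multicast/broadcast.
    return bool(parse_mac(mac)[0] & 0x01)


def oui(mac: str) -> str:
    """First three octets of a universally administered address (simplified)."""
    octets = parse_mac(mac)
    if octets[0] & 0x02:
        raise ValueError("locally administered addresses carry no OUI")
    return ":".join(f"{b:02x}" for b in octets[:3])


def random_local_mac() -> str:
    """A random unicast address from the locally administered half of the space."""
    octets = bytearray(os.urandom(6))
    octets[0] = (octets[0] & 0xFC) | 0x02  # clear I/G, set U/L
    return ":".join(f"{b:02x}" for b in octets)


mac = random_local_mac()
print(mac, is_locally_administered(mac), is_group(mac))  # <random> True False
```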

In the same way, IPv4 and IPv6 addresses clearly don't inherently constitute numbering hegemonies, since the alternatives are impractical for the same reason. (Using string identifiers is a readily available way to avoid the need for a numbering hegemony in a technical standard, but imagine if IP headers stored addresses as variable-length strings!) On the other hand, you're required to request IP addresses from the RIR for your region, which means that, as with ISBNs, competition between RIRs is deliberately precluded, so access is not as open as it could be. There are definitely aspects that can be criticised.

This suggests a basic principle:

  • A namespace should have enough space to support decentralised allocation (e.g. UUIDs), or allocation via piggybacking on an existing namespace with very low barriers to entry (e.g. strings containing domain names, or URIs).

  • But the exception to this is where there is a compelling technical reason to minimise the size of an identifier.

    A common such case is where the identifier is in the header of a network protocol and is thus repeatedly sent across the network, so that the size of the identifier affects the efficiency of the network.
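Whether a space has "enough space to support decentralised allocation" can be made concrete with the birthday bound. If n identifiers are chosen independently and uniformly from N = 2^b values, the chance of at least one duplicate is approximately 1 - exp(-n(n-1)/(2N)). The sketch below compares self-assigned 16-bit IDs with version 4 UUIDs, which carry 122 random bits. The uniform-choice assumption is optimistic for humans picking "pirate" IDs by hand, since they tend to favour memorable values.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation: probability that n IDs drawn uniformly
    at random from a space of 2**bits values contain at least one duplicate."""
    pairs = n * (n - 1) / 2
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2 ** bits)


for n in (300, 1000):
    print(f"{n} self-assigned 16-bit IDs: {collision_probability(n, 16):.4f}")
print(f"10**9 random v4 UUIDs: {collision_probability(10**9, 122):.1e}")
```

With only 300 self-assigned 16-bit IDs, a collision is already roughly a coin flip, and by 1000 it is all but certain. A billion random UUIDs, by contrast, collide with a probability on the order of 10^-19.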

Ways to avoid numbering hegemonies

There are several ways to avoid numbering hegemonies, in roughly descending order of preference:

  • Use strings. This is a good solution in most cases.[3]

    Strings have the advantage that you can piggyback on existing string namespaces which have very low barriers to entry, such as domain names,[4] while still ensuring uniqueness. Reversed domain names as used by Java, such as com.example.foo, are also a good example of this. URIs can also be used. These are even more flexible, as they work with domain names but also with a variety of other namespaces via URNs, including UUIDs (urn:uuid:...). Thus, string-based schemes not only afford access to a low-cost namespace but can even support use of an unbounded plurality of namespaces. A user can choose whichever sub-namespace offers the terms of access they find most favourable.[5]

    The need for guaranteed string uniqueness is often overstated. In circumstances where a suitable guaranteed unique namespace is not readily available, the steward of a technical standard may either run their own registry of registered strings (IANA's protocol registry provides a limitless number of examples of these), or leave things as a free-for-all and hope collisions don't occur.

    In practice, collisions seem to happen surprisingly rarely in the latter case. A good example here is HTTP headers, RFC 822 headers and similar protocols. Paranoia about header names chosen ad hoc by users without IETF consent, and about their possible collision with standardised header names, led to the idea of the X- prefix. However, the IETF now considers the X- prefix to be a bad idea. The problem it was intended to solve does not appear to occur commonly, and if an X- header becomes popular enough to be retroactively standardised, it often has to be standardised with the X- still incorporated anyway, due to the need for backwards compatibility with existing implementations.

    As such, a “chaotic good” approach to a non-hierarchical string namespace in which people are simply encouraged to choose something sensible using their own judgement seems to actually work well most of the time.

    Where a formal registry is established for a string namespace, it is hard to operate a numbering hegemony over it to extract exorbitant fees. If the terms imposed by the registry seem exorbitant, a user may simply ignore the registry and make up their own string on the basis that it will probably be OK. By contrast, a made-up “pirate” 16-bit USB vendor ID is highly likely to collide at some point. The .onion pseudo-TLD used by Tor is a good example of this: it was simply appropriated without any permission from ICANN, the grant of such permission having been inconceivable. The use of strings for identifiers fundamentally encourages ad-hoc and permissionless evolution and usage of a standard.

  • Use a numeric space large enough to support safe distributed allocation (e.g., 128-bit UUIDs).

    This is the approach used by UEFI, which arguably even overuses it. UUIDs can be a bit unwieldy, however, and are less pleasant to work with than meaningful strings.

  • Use a hierarchically delegated numeric space.

    I list this for completeness but it should be relatively rare for this to be needed. In most cases, strings or UUIDs will suffice. In this model, a (for example) 64-bit space is hierarchically subdelegated to various authorities.

    This allows a smaller size of identifier than UUIDs or strings but can still be quite accessible if well designed. Since this is basically the model chosen by ISBNs, it can also be quite monopolistic; it depends entirely on how the namespace is designed and what subauthorities are delegated.

    For example, it is quite easy to get an IANA enterprise number. There are likely other kinds of globally unique (but low bit count) integer identifiers which may be more or less accessible to different kinds of user, corporate or natural. So a 64-bit namespace could allocate one prefix to IANA enterprise numbers, another to IEEE OUIs, and so on. Thus access to the namespace is guaranteed to anyone so long as they can access at least one of the subdelegated namespaces.

    This design is relatively elaborate and strings or UUIDs are surely a preferred design in most cases. But if you need 32-bit or 64-bit identifiers and don't want to create dependence on a central allocation authority, this may be an option.
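The string-based options from the first item in this list can be sketched with nothing more than Python's standard uuid module. The sketch covers a Java-style reversed domain name, a random UUID expressed as a urn:uuid: URI, and a name-based (version 5) UUID derived deterministically from a domain. The helper names here are illustrative, and example.com is a placeholder domain.

```python
import uuid


def reverse_dns_id(domain: str, *parts: str) -> str:
    """Build a Java-style reversed-domain identifier such as com.example.foo.

    Uniqueness is inherited from whoever controls `domain`; no registry
    beyond the DNS itself is involved.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(list(reversed(labels)) + list(parts))


def random_uuid_urn() -> str:
    """Mint an identifier with no registry at all: a random (v4) UUID as a URN."""
    return uuid.uuid4().urn


def name_based_uuid(domain: str) -> uuid.UUID:
    """Derive a stable UUID from a domain name (version 5, DNS namespace)."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, domain)


print(reverse_dns_id("example.com", "foo"))  # com.example.foo
print(random_uuid_urn())                     # urn:uuid:... (different every run)
print(name_based_uuid("example.com").urn)    # same value on every run
```

All three end up as plain strings, so a protocol that accepts any URI can accept every one of them side by side. This is the "plurality of namespaces" point made above.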
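A hierarchically delegated 64-bit space along the lines described above might look like the following sketch. The layout is entirely hypothetical and assumed for illustration. The top 8 bits select a sub-namespace. Tag 0x01 is followed by a 32-bit IANA enterprise number and leaves 24 bits to its holder; tag 0x02 is followed by a 24-bit IEEE OUI and leaves 32 bits. The registry numbers in the example are placeholders.

```python
from typing import Dict, Tuple

# HYPOTHETICAL sub-namespace tags occupying the top 8 bits of a 64-bit ID.
TAG_IANA_PEN = 0x01  # followed by a 32-bit IANA enterprise number
TAG_IEEE_OUI = 0x02  # followed by a 24-bit IEEE OUI

# tag -> (bits for the delegated registry number, bits left to its holder)
LAYOUT: Dict[int, Tuple[int, int]] = {
    TAG_IANA_PEN: (32, 24),
    TAG_IEEE_OUI: (24, 32),
}


def make_id(tag: int, registry_number: int, local: int) -> int:
    """Pack (tag, registry number, locally chosen value) into 64 bits."""
    reg_bits, local_bits = LAYOUT[tag]  # KeyError for unknown tags
    if not 0 <= registry_number < (1 << reg_bits):
        raise ValueError("registry number does not fit this sub-namespace")
    if not 0 <= local < (1 << local_bits):
        raise ValueError("local value does not fit this sub-namespace")
    return (tag << 56) | (registry_number << local_bits) | local


def split_id(ident: int) -> Tuple[int, int, int]:
    """Inverse of make_id: recover (tag, registry number, local value)."""
    tag = ident >> 56
    reg_bits, local_bits = LAYOUT[tag]
    registry_number = (ident >> local_bits) & ((1 << reg_bits) - 1)
    local = ident & ((1 << local_bits) - 1)
    return tag, registry_number, local


a = make_id(TAG_IANA_PEN, 32473, 7)              # placeholder enterprise number
b = make_id(TAG_IEEE_OUI, 0x001B63, 0xDEADBEEF)  # placeholder OUI
print(f"{a:016x} {split_id(a)}")
print(f"{b:016x} {split_id(b)}")
```

Because the tag keeps the sub-namespaces disjoint, anyone who holds either an enterprise number or an OUI gets a block of IDs without asking a new central authority.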

Notes on OIDs. As an aside, ASN.1 OIDs (e.g., iso.identified-organization.dod.internet) are a peculiar case, being an arbitrarily long list of integers, each of unbounded magnitude. As a technology these don't seem too compelling today, being consigned to standards originating from ISO OSI, like X.509 or LDAP. The only apparent advantage over a string based on a hierarchical namespace is a slight increase in conciseness, but these savings probably aren't worth the obtuseness of OIDs in most cases. Processing them is also more complex than it may first appear. Because each component is an integer of unbounded magnitude, OIDs must be processed with a bignum library.

The namespace, having been created by ISO, mostly focuses on delegation via international and national standards authorities, posing potential access issues. However, its hierarchical nature enables a fan-out of delegations which in practice is broad enough to alleviate these issues. For example, the OID prefix iso.identified-organization.dod.internet.private.enterprise is subdelegated by convention to IANA enterprise numbers, meaning that anyone with an IANA enterprise number (which is readily obtained) can get an OID prefix. Another route into the OID namespace is the 2.25 hierarchy (aka joint-iso-itu-t.uuid), which is subdelegated via UUIDs. The children of 2.25 are defined as 128-bit integers representing UUIDs. This ensures anyone can get an OID prefix by generating a UUID, converting it to a 128-bit integer and appending that integer as a component under 2.25. Thus, there are in principle no access issues with the OID space. However, it seems many OID processing libraries incorrectly assume an upper bound on integer magnitude, and thus cannot process these OIDs. Go's ASN.1 package in the standard library is one such example, being limited to 31-bit integer components. This may force some OID users to forgo use of the 2.25 hierarchy.
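The 2.25 mapping described above, and the reason a 31-bit component limit breaks it, can be shown in a few lines. The sketch formats a UUID's 128-bit value as a component under 2.25. It then encodes a component the way BER/DER does, as base-128 digits with the high bit set on every byte except the last. Python integers are arbitrary-precision, so the bignum requirement is met for free here; a decoder that stores components in 32-bit signed integers cannot hold these values.

```python
import uuid


def uuid_to_oid(u: uuid.UUID) -> str:
    """Place a UUID under the 2.25 arc: its 128-bit value is the next component."""
    return f"2.25.{u.int}"


def encode_oid_component(n: int) -> bytes:
    """Encode one OID component as BER/DER does: base-128 digits, most
    significant first, with the high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("OID components are non-negative")
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    return bytes(reversed(out))


u = uuid.uuid4()
component = u.int
print(uuid_to_oid(u))
print(len(encode_oid_component(component)), "bytes for the UUID component")
print("fits in a 31-bit integer:", component <= 2**31 - 1)
```

A version 4 UUID always has its version bits set high in the value, so the component is always far beyond 2^31 - 1. An implementation limited to 31-bit components therefore rejects every such OID, not just unlucky ones.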

1. France does charge 30 EUR initially per publisher, but appears to issue an unlimited number of ISBNs to a given publisher for free after this.

2. This model of an international organisation which delegates to national organisations is not found only in numbering systems. It appears common in other fields, such as “international” standards (for example ISO/IEC), yet as a model it feels completely antiquated and obsolete in the age of the internet. ISO seems quite proud of its authority to put the words “international standard” on the front of its standards documents, presumably with the insinuation that standards produced by other organisations are not worthy of the title. By this logic the Internet and its associated protocols are, presumably, not “international standards”. ISO's standards development process involves standards being voted on by delegates from each nation-state's standards body; famously, this is how Microsoft's OOXML standard got passed. Why nation-states are the relevant unit of division for reaching agreement on IT standards is beyond me, as this isn't remotely representative of the actual stakeholders and their divisions. All this process does is obstruct people with an actual interest in the standards process from participating. One has to wonder what is supposed to happen if a nation-state votes against adopting a new version of the ISO C++ standard, for example. This can be compared with modern international standards organisations like the IETF, which are open to all without regard to country of origin, and whose standards, lacking the force of an international treaty, are adopted by choice on their merits rather than by government mandate.

3. I have sometimes wondered about bypassing the USB vendor ID monopoly by constructing a new ad-hoc “standard” for putting unique product identifiers in one of the standard strings reported by all USB devices (such as Manufacturer Name, etc.), thus essentially replacing the Vendor ID with a string-based device identification system. The problem with this is that it would need to be supported by OSes so that they can automatically source and load the correct driver for a device. While this might be possible on platforms like Linux using appropriate udev rules, I suspect that Windows doesn't support string matching in its driver INF files and can only match on VID/PID.

4. While domain names under a TLD need to be purchased, these domain names can be used to offer arbitrarily many subdomains for free. Thus even if someone cannot pay for a domain name, they can obtain a subdomain of a friend's domain, or from a public service which offers subdomains for free, such as eu.org.

5. Some standards, such as iSCSI, use strings containing domain names to guarantee uniqueness, but take things a bit far by also putting the year and month in which the domain was registered into the string. This is out of paranoia that a domain might expire and then be registered by someone else, who then makes another completely unrelated usage of exactly the same string.


This seems a bridge too far; this level of paranoia is really not necessary.
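The iSCSI naming scheme this footnote refers to can be sketched as follows. The sketch assumes the "iqn." name format from the iSCSI specification: "iqn.", a yyyy-mm date, the reversed domain of the naming authority, and an optional ":"-separated suffix. The domain, dates and suffix below are placeholders. The usage shows exactly the case the date is meant to guard against: two holders of the same domain at different times produce different names.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class IQN(NamedTuple):
    year: int
    month: int
    authority: str          # naming authority's domain, in normal order
    suffix: Optional[str]   # everything after the first ':' (if present)


IQN_PATTERN = re.compile(r"^iqn\.(\d{4})-(\d{2})\.([A-Za-z0-9.-]+)(?::(.+))?$")


def make_iqn(domain: str, owned_since: date, suffix: str = "") -> str:
    """Build iqn.<yyyy-mm>.<reversed domain>[:<suffix>]."""
    reversed_domain = ".".join(reversed(domain.lower().rstrip(".").split(".")))
    name = f"iqn.{owned_since.year:04d}-{owned_since.month:02d}.{reversed_domain}"
    return f"{name}:{suffix}" if suffix else name


def parse_iqn(name: str) -> IQN:
    m = IQN_PATTERN.match(name)
    if m is None:
        raise ValueError(f"not an iqn-format name: {name!r}")
    year, month, rev_domain, suffix = m.groups()
    authority = ".".join(reversed(rev_domain.split(".")))
    return IQN(int(year), int(month), authority, suffix)


old = make_iqn("example.com", date(2001, 4, 1), "storage")
new = make_iqn("example.com", date(2024, 9, 1), "storage")
print(old)         # iqn.2001-04.com.example:storage
print(old != new)  # True: same domain, different owner epoch
```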