Binary formats and protocols: LTV is better than TLV

One of the most ubiquitous patterns found in the design of binary file formats and network protocols is the Type-Length-Value (TLV) construction:

     t bytes    Type
     l bytes    Length
Length bytes    Value

This is the most common order of these fields, as it is directly implied by the TLV acronym. However, I don't think it is the optimal construction.

In my view, LTV — Length-Type-Value — is the better construction. One reason is that, with the fields in this order, the LTV construction can be defined as a composition of two simpler constructions:

  • LV(data) (length-value), in which a sequence of bytes has its length indicated, but without any indication of its format or meaning, defined as:

    LV(data) = len(data) || data

  • TV(type, data) (type-value), in which a sequence of bytes has its type indicated but its length is assumed to be understood from context, defined as:

    TV(type, data) = type || data

Both of these are useful in different circumstances. If the type of some piece of data is contextually understood, LV suffices. If the length of some piece of data is contextually understood, TV suffices.
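
As a sketch, these two primitives might be encoded like this (the four-byte big-endian length and type fields are assumptions purely for illustration; real formats pick their own widths and endianness):

```python
import struct

def lv(data: bytes) -> bytes:
    """LV(data) = len(data) || data: prefix a byte string with its
    length (here an illustrative 4-byte big-endian length field)."""
    return struct.pack(">I", len(data)) + data

def tv(typ: int, data: bytes) -> bytes:
    """TV(type, data) = type || data: prefix a byte string with a type
    tag (here an illustrative 4-byte big-endian type field); the
    length must be known from context."""
    return struct.pack(">I", typ) + data
```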

By using these constructions, LTV is simply defined as the composition of the two:

  • LTV(type, data) = LV(TV(type, data))
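
This composition translates directly into code; a minimal sketch, again assuming illustrative four-byte big-endian length and type fields:

```python
import struct

def ltv(typ: int, data: bytes) -> bytes:
    """LTV(type, data) = LV(TV(type, data)): the outer length field
    covers the type tag plus the payload (4-byte big-endian fields
    are an assumption for illustration)."""
    tvb = struct.pack(">I", typ) + data        # TV(type, data)
    return struct.pack(">I", len(tvb)) + tvb   # LV(...)
```

Note that the length written is len(data) + 4, since the type field sits inside the length-demarcated area.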

On this basis I believe that LTV is a more “natural” construction than TLV. TLV is a strange construction because it has the type field outside of the length-demarcated area, but everything else inside of it. The type field is given special treatment but it is unclear to me that this is justified. Even with the LTV construction, it's no harder to read the first few bytes to skip over LTVs of a type which is not currently sought.
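
Skipping over unwanted records is indeed just a matter of reading the length field and jumping ahead; a sketch assuming four-byte big-endian length and type fields and well-formed input (defensive checks are omitted here for brevity):

```python
import struct
from typing import Optional

def find_ltv(buf: bytes, want_type: int) -> Optional[bytes]:
    """Return the payload of the first LTV record of the wanted type,
    or None. Assumes well-formed input with 4-byte big-endian length
    and type fields (widths are illustrative)."""
    off = 0
    while off < len(buf):
        (length,) = struct.unpack_from(">I", buf, off)
        (typ,) = struct.unpack_from(">I", buf, off + 4)
        if typ == want_type:
            return buf[off + 8 : off + 4 + length]
        off += 4 + length  # the length field alone takes us to the next record
    return None
```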

Disadvantages of LTVs?

Are there any disadvantages of the LTV construction? With LTVs, the length field naturally includes the length of any type field. This is fine, but it does create the issue that a malicious or buggy peer could send an LV which is not large enough to contain a TV type field. So you would have to check that the length is adequate before extracting a type field from an LTV. However, if skipping over TLVs/LTVs to search for a specific type, you need to process the length field anyway to get to the next TLV/LTV. Moreover, bounds checks with regard to the buffer length are essential when processing input anyway, so this is nothing new. Plus, this enforcement can be done in wire protocol deserializer routines rather than in higher-level libraries.
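
A defensive single-record parser along these lines might look like this (the four-byte big-endian field widths are illustrative assumptions); note the check that the length covers at least the type field, alongside the usual buffer bounds checks:

```python
import struct

def parse_ltv(buf: bytes, off: int = 0):
    """Defensively parse one LTV record at offset off, returning
    (type, value, next_offset). Assumes illustrative 4-byte
    big-endian length and type fields."""
    if off + 4 > len(buf):
        raise ValueError("truncated length field")
    (length,) = struct.unpack_from(">I", buf, off)
    if length < 4:
        # the length must at least cover the type field
        raise ValueError("length too small to contain a type field")
    end = off + 4 + length
    if end > len(buf):
        # the usual bounds check against the buffer, needed regardless
        raise ValueError("record overruns buffer")
    (typ,) = struct.unpack_from(">I", buf, off + 4)
    return typ, buf[off + 8 : end], end
```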

There is an argument to be made that it is better to make invalid values “unrepresentable” in a language (see also LANGSEC). With LTV, supposing for example a four-byte type field, the valid length values are [4, ∞), whereas with TLV, the valid length values are [0, ∞). So TLV seems to have an advantage here in making invalid values unrepresentable in the binary format. But the gain here is limited, as you still need to validate the lengths of untrusted input against the buffer length (which seems a much bigger hazard than any robustness LANGSEC might offer).

Anyway, I'm not making any big dogmatic point here. It's just long seemed to me that LTV is a more “natural” arrangement than TLV and I'm mildly surprised it's not more common as a construction. So I just thought I'd put that thought here.