The evolution of the web, and a eulogy for XHTML2

In 2009, the W3C cancelled the development of XHTML2, the proposed successor to XHTML.

As far as I can tell, I was approximately the only person saddened by this. However, its cancellation was a foregone conclusion after the events of the preceding years. A kind of coup had occurred in which the standards-setting authority of the W3C was effectively usurped by an upstart standards-setter, WHATWG, with its own alternative proposal for the future of HTML: HTML5. WHATWG and its proposal gained traction, and eventually the W3C was forced to effectively adopt HTML5. The industry fell in behind HTML5 and the death of XHTML2 became inevitable.

The WHATWG “HTML5” proposal appeared to be motivated by two main factors:

- a rejection of XHTML, with only a grudging “XHTML5” XML profile of HTML5; and
- a vision for the web as an applications platform rather than the semantic hypertext platform envisaged by the W3C.

The latter point is well demonstrated by a comparison between XHTML2 and HTML5. Whereas the W3C clearly hoped to procure semantic purity with their XHTML2 proposal, HTML5 was all about adding features. The term “HTML5” quickly came to be used to refer not just to HTML5 itself, but also to the countless JavaScript APIs and other augmentations that were being added to browsers in this period, which made the web more viable as an applications platform. Arguably, HTML5 appeared as an exciting explosion of possibility for web application developers, coming after years of stasis, lack of meaningful technological advancement and a dry obsession with the semantic web from the W3C.

By comparison, proposals in XHTML2 included removing the <img/> tag in favour of the <object> tag¹, and replacing <br/> with a <line>...</line> tag: a pursuit of the perfect schema, but hardly exciting. It's therefore not too hard to understand why HTML5 happened, and it was probably inevitable; the web may have been conceived as a semantic hypertext document platform, but it had quickly turned into a viable platform for applications, and innumerable parties had ever more motivation to dedicate standards-setting manpower to those ends, rather than to the academic contemplation of excellence in schema design (which only excites people like me).

WHATWG's evolution of (X)HTML dissatisfied me. Most people hailed the introduction of the <audio> and <video> tags as positive, because browser support for them did not require codec plugins of questionable quality and reliability but was instead built into the browser. They failed to realise that HTML standards already had an <object> tag for the embedding of arbitrary MIME types, and that these standards imposed no requirement that implementations be plugin-based. If browser vendors wished to build in their own codecs, they didn't need a new tag to do it. The addition of these tags created new, redundant ways to embed some MIME types, duplicating the functionality of <object>. At the time, it felt to me that the HTML5 authors either didn't understand the very standard they were extending, or at the very least lacked the interest in clean schema design that the W3C possessed.
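The fallback behaviour of <object> described in note 1 can be made concrete with a toy model. The Python sketch below is only an illustration under stated assumptions, not how any browser implements the element: the file names, the sample markup and the choose_rendering helper are all hypothetical, and the markup is parsed as XML (as XHTML would be).

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical markup: three encodings of the same content, nested so that
# each <object> is the fallback for the one enclosing it.
MARKUP = """\
<object xmlns="http://www.w3.org/1999/xhtml" data="clip.webm" type="video/webm">
  <object data="clip.mp4" type="video/mp4">
    <object data="poster.png" type="image/png">
      <p>A short clip; a <em>transcript</em> is available.</p>
    </object>
  </object>
</object>
"""


def choose_rendering(element, supported_types):
    """Toy model: return the data URL of the outermost supported <object>,
    or the text of the marked-up fallback content if none is supported."""
    while element.tag == XHTML + "object":
        if element.get("type") in supported_types:
            return element.get("data")
        nested = element.find(XHTML + "object")
        if nested is None:
            # No further alternatives: fall back to the element's contents.
            return "".join(element.itertext()).strip()
        element = nested
    return None


root = ET.fromstring(MARKUP)
print(choose_rendering(root, {"video/mp4", "image/png"}))
print(choose_rendering(root, set()))
```

A user agent with built-in codecs and one relying on plugins would both fit this model; only the contents of supported_types would differ.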

Another difference between the W3C and WHATWG was their approach to dictating standards. Whereas the W3C drafted standards in the conventional way, WHATWG took a different view: divergence between browsers and these standards was real, and hoping that browsers would one day live up to them was an exercise in futility. WHATWG's specifications sought to define not what should be, but what actually is, effectively becoming written documentation for the behaviour of existing web browsers.

Moreover, rather than articulating requirements and principles while leaving open how they are to be met, the WHATWG specifications tend to be written in a highly algorithmic and prescriptive style; they read like a web browser's source code, if web browsers were written in natural language.

WHATWG's primacy as a standards-setter turned the W3C into a peculiar sort of standards patsy: nominally setting its own standards, but in practice not at liberty to diverge from the output of WHATWG. This year, the W3C and WHATWG agreed to collaborate on creating a single version of HTML and DOM, making the W3C essentially a rubber stamp for WHATWG standards.

Though I agreed with the vision of the web as a federated semantic hypertext platform, this is not to say that I wholly disagree with the new web, which is designed first and foremost for applications. It had many logical motivations, allowed Flash to be rendered obsolete, and has enabled countless applications to be developed which simply could not have been realised as native applications. Nor has it been all good; the rise of lazily thrown together Electron apps as “native” apps is a plague, and the focus on the web as an applications platform seems to have led to the neglect and decay of the original hypertext use case. This is unsurprising given the willingness and business incentive of major players such as Google to drive the applications use case and their comparative disinterest² in the semantic hypertext use case, combined with the rise to primacy of a new standards-setting organisation dedicated primarily to the applications use case.

¶. The pursuit of the semantic web has changed in the era of HTML5, which represented a rejection of XHTML (to me, a seemingly bizarre rejection of having to write well-formed XML as somehow unreasonably burdensome). Though an XHTML profile of HTML5 was specified (“XHTML5”), the clear emphasis was on the use of traditional SGML-style HTML, ruling out the specification of any functionality which would depend on the use of XML: most obviously, XML namespaces. This left an awkward gap in the semantic capabilities of HTML5, and in its capability for arbitrary and permissionless extension and annotation. The replacements have been crude and kludgy; for example, HTML5 specifies an <svg> tag, because naming tags from different namespaces is obviously infeasible in HTML5. Thus, the HTML5 specification effectively has to import the SVG specification whole, as part of its own schema. In the same way, embedding markup from arbitrary other namespaces, such as RDF or Atom, becomes infeasible.
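To illustrate what XML namespaces offer here, the following Python sketch parses a small XHTML document that embeds SVG and Atom markup side by side. It is a minimal sketch under stated assumptions: the sample document and the namespaces_used helper are hypothetical and written for this illustration. The point is that a generic XML parser can tell whose vocabulary each element belongs to without the host schema knowing anything about SVG or Atom.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical document: XHTML hosting elements from two other namespaces.
DOCUMENT = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg"
      xmlns:atom="http://www.w3.org/2005/Atom">
  <body>
    <p>Some hypertext.</p>
    <svg:svg width="10" height="10"><svg:circle r="5"/></svg:svg>
    <atom:entry><atom:title>An embedded entry</atom:title></atom:entry>
  </body>
</html>
"""


def namespaces_used(xml_text):
    """Return the set of namespace URIs that elements in the document use."""
    root = ET.fromstring(xml_text)
    found = set()
    for element in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            found.add(element.tag[1:].split("}", 1)[0])
    return found


for uri in sorted(namespaces_used(DOCUMENT)):
    print(uri)
```

Any third party could mint a new namespace URI and embed its own vocabulary in the same way, which is the "permissionless extension" the paragraph above refers to.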

Since namespaced attributes cannot be specified, crude approximations of namespacing had to be found, such as the data- prefix. In practice, arbitrary third-party standards, such as Microdata, add their own unnamespaced and unprefixed attributes with their own semantic meaning. RDFa has subsequently had to invent a crude substitute for XML namespaces in the form of <html prefix="xyz: http://..."> to allow it to be embedded within HTML5.
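The flat attribute space can be seen by surveying a small HTML fragment with Python's standard html.parser module. This is a rough sketch, assuming a hypothetical sample page that mixes a data- attribute, Microdata attributes and an RDFa prefix declaration; the AttributeSurvey class and parse_rdfa_prefixes helper are invented for this illustration, and the prefix parsing is deliberately naive (whitespace-separated "name: IRI" pairs).

```python
from html.parser import HTMLParser

# Hypothetical page mixing three annotation conventions in one attribute space.
SAMPLE = """\
<html prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/">
  <body>
    <article data-reading-time="4" itemscope itemtype="http://schema.org/Article">
      <h1 itemprop="headline" property="dc:title">A eulogy</h1>
    </article>
  </body>
</html>
"""


def parse_rdfa_prefixes(value):
    """Naively split a prefix attribute value into a {prefix: IRI} mapping."""
    tokens = value.split()
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


class AttributeSurvey(HTMLParser):
    """Sort attribute names into data- prefixed ones and everything else."""

    def __init__(self):
        super().__init__()
        self.prefixes = {}
        self.data_attributes = set()
        self.other_attributes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "prefix" and value:
                self.prefixes.update(parse_rdfa_prefixes(value))
            elif name.startswith("data-"):
                self.data_attributes.add(name)
            else:
                # Microdata and RDFa attributes land here, unprefixed,
                # alongside any attribute HTML itself defines.
                self.other_attributes.add(name)


survey = AttributeSurvey()
survey.feed(SAMPLE)
survey.close()
print(sorted(survey.data_attributes))
print(sorted(survey.other_attributes))
print(survey.prefixes)
```

Nothing in the parsed result distinguishes a Microdata attribute from an RDFa one or from a native HTML attribute; the distinction exists only by convention, which is the gap that XML namespaces would otherwise fill.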

By comparison, the applications use case is a much better-tended garden, as the web platform evolves some truly eyebrow-raising functionality such as WebUSB. However, faced with the increasing engineering cost of keeping up with the torrent of new functionality, the number of web browser implementations keeping pace continues to dwindle. The number of web browsers capable of consuming plain (X)HTML massively exceeds the number of web browser engines capable of consuming the modern application platform; the latter now stands at approximately two. Since the document platform and the applications platform are almost two different worlds sharing a stage, it makes me wonder if they should even have the same name, “the web”, for they're infinitely different beasts.

In terms of functionality, even “document-like” websites such as news websites now host a wholly excessive and not wholly benign morass of JavaScript. Misguided web designers attempt to reinvent the web browser itself inside the web browser, implementing their own navigation logic in JavaScript as a “single-page app”³. Most people react with ambivalence or dread to the fact that PDFs can contain JavaScript, at least from a security perspective, because we understand that in PDFs we're interacting with something we expect to be static and inert. Yet any visit to a typical modern news website results in that website taking substantial liberties, and potentially availing itself of the entire array of functionality afforded to modern web applications. It is a bit as if every single website now used Flash in major part, and we all know how popular that was. But we have no specific word to refer to one web or the other, and web applications and ordinary websites getting carried away with bloat beyond their station are all lumped together.

In a way, this describes my own web browsing practice, which splits into two worlds; I browse with JavaScript disabled by default, and occasionally I am linked to what is apparently, without JavaScript, a blank page. I'm happy to enable it for sophisticated web applications, for things which rightfully need it — but if it's just an article I simply leave. A web page is a web page — an item of semantic hypertext; it is not an invitation to serve an (inevitably buggy) reimplementation of a web browser's own navigation logic in JavaScript which then itself loads the actual page, any more than a PDF is.

If anyone constructed a PDF which was itself blank but, via embedded JavaScript, loaded parts of itself from a remote server, people would rightly balk and wonder what on earth the creator of this PDF was thinking; yet this is precisely the design of many “websites”. To put it simply, websites and webapps are not the same thing, nor should they be. Yet the conflation of a platform for hypertext and a platform for applications has confused thinking, and led developers with a prodigious aptitude for JavaScript to mistakenly see mere websites of text as a nail for their applications hammer. But that subject is an article in its own right, and best left to another day.

This website is 100% XHTML5.

1. Most people's reaction to hearing this particular proposal was “WTF?”, but on reflection it was the right choice. <object> is probably (X)HTML's most underappreciated element, allowing the embedding of arbitrary images, video, audio, or Flash content, or more precisely anything with a MIME type. More importantly however, its functionality is a superset of <img/> because <img/> does not support marked-up alternate text, whereas since the alternate content of <object> is the contents of the tag, any alternate text can be formatted, and in fact multiple <object> elements can be nested to allow fallback to different encodings of the same content.

2. Though admittedly not total disinterest, as Google has pushed some semantic web technologies mainly for the benefit of their own search engine. These technologies are forced into the mould of HTML5, however; see paragraph ¶.

3. There are some rare cases where single-page apps (SPAs) are actually a good idea and, in fact, necessary. But they're rare.