The Document Outline Dilemma

By Amelia Bellamy-Royds

For the past few weeks there has been lots of talk about HTML headings in web standards circles. Perhaps you’ve seen some of the blog posts, tweets, and GitHub issue threads. Headings have been part of HTML since the very first websites at CERN, so it might be surprising to find them controversial 25 years later. I’m going to quickly summarize why they are still worth discussing, with plenty of links to other sources, before adding my own opinions to the mix. If you’re up-to-date on the debate, you can jump straight to the “Bigger Dilemma” section.

The Story So Far…

HTML uses headings (<h1>, <h2>, <h3>, and so on until <h6>) to mark up titles for a subsequent section of text. The numbers (or levels) of the heading elements are supposed to logically correspond to a tree-like structure of nested sections, like books that have chapters with sections and sub-sections.

However, HTML markup did not originally have a way to reflect this nested logical structure in a nested DOM structure. Unlike nested lists, nested headings weren’t actually nested in elements that defined the parent sections. Heading elements of different levels were all sibling elements, and also siblings to the paragraphs they provide a title for. The “sections” were a purely logical structure, not a DOM structure: each one contained all the markup that started with a heading and continued until you reached another heading of the same or higher level.
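
To make that rule concrete, here is a minimal JavaScript sketch of how a flat run of headings and paragraphs implies a tree of sections. The flatContent data and the impliedSections helper are my own illustrative names, not anything from a specification, and plain objects stand in for real DOM nodes so the snippet runs without a browser.

```javascript
// Hypothetical flat content stream: headings and paragraphs are all siblings.
const flatContent = [
  { tag: "h1", text: "Book title" },
  { tag: "p", text: "Intro paragraph" },
  { tag: "h2", text: "Chapter one" },
  { tag: "p", text: "Chapter text" },
  { tag: "h3", text: "A sub-section" },
  { tag: "h2", text: "Chapter two" },
];

// Returns the numeric level for h1..h6 tag names, or null for anything else.
function headingLevel(tag) {
  const match = /^h([1-6])$/.exec(tag);
  return match ? Number(match[1]) : null;
}

// Builds a tree of implied sections: each section starts at a heading and
// runs until the next heading of the same or a higher level.
function impliedSections(nodes) {
  const root = { level: 0, heading: null, content: [], children: [] };
  const stack = [root];
  for (const node of nodes) {
    const level = headingLevel(node.tag);
    if (level === null) {
      // Non-heading content belongs to the innermost open section.
      stack[stack.length - 1].content.push(node.text);
      continue;
    }
    // Close every open section whose heading is at the same or a deeper level.
    while (stack[stack.length - 1].level >= level) stack.pop();
    const section = { level, heading: node.text, content: [], children: [] };
    stack[stack.length - 1].children.push(section);
    stack.push(section);
  }
  return root;
}

const tree = impliedSections(flatContent);
console.log(JSON.stringify(tree, null, 2));
```

In this sketch the two chapters end up as children of the book title, and the sub-section is nested inside chapter one, even though nothing in the flat markup says so explicitly.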

As Brian Kardell points out, this made perfect sense in the “flat earth markup” of early HTML, where tags were just typographic instructions inserted into a flow of text. The concept of an HTML page as a tree structure came later, when so-called Dynamic HTML needed a document object model (DOM) to describe that flow of text and tags as a data structure that scripts could access.

Not to spoil the ending, but HTML now has a <section> element which can (optionally) be used to create a nested DOM structure to match your logical heading structure. The <main>, <header>, <footer>, <article>, <aside>, and <nav> elements all also help create a nested document structure that is reflected in DOM nesting.

But there was another problem with the original heading model: it couldn’t easily be remixed in template systems. Because the heading level is expressed by the tag name (<h1> versus <h4>), rather than by the context in which it’s used, you can’t easily re-use the same content in a different context where the level would be different. For example, a blog might use the same set of article headlines and intro paragraphs in many contexts: as stand-alone blog post pages; as abstracts on a main index page; or as abstracts on an archive page which also has headings dividing the list by month or year. What heading level should the article title be?
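
Here is a rough sketch of what that looks like in a template, assuming a made-up renderTeaser function and post object (neither comes from a real blog engine). Because the level lives in the tag name, every caller has to know its own context and pass the level in.

```javascript
// Minimal HTML escaping so titles and intros cannot inject markup.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
  return text.replace(/[&<>"]/g, (ch) => entities[ch]);
}

// Hypothetical template: the caller must supply the heading level,
// because the tag name itself encodes the level.
function renderTeaser(post, level) {
  if (!Number.isInteger(level) || level < 1 || level > 6) {
    throw new RangeError("HTML only defines heading levels 1 to 6");
  }
  const title = escapeHtml(post.title);
  const intro = escapeHtml(post.intro);
  return `<article><h${level}>${title}</h${level}><p>${intro}</p></article>`;
}

const post = { title: "Headings & outlines", intro: "Why levels are hard." };

console.log(renderTeaser(post, 1)); // stand-alone post page
console.log(renderTeaser(post, 2)); // main index page
console.log(renderTeaser(post, 3)); // archive page, under a month heading
```

The same content needs three different tags, and the template cannot work out which one to use without being told.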

Early proposals for sectioning elements also included a level-free <h> or <heading> element, that would be assigned a level based on context. (In fact, the idea goes back to the earliest discussions of HTML.) But when sectioning elements were finally added to HTML, they were designed to work with the existing heading elements. However, the specifications defined a “Document Outline Algorithm” which would re-calculate the heading levels for the existing numbered heading tags, based on section nesting.

With the Document Outline Algorithm, you could (theoretically) use an <h1> for all headings, and the browser would figure out the level of each heading based on its nesting within <article>, <section>, and related elements. The outline algorithm would ensure that the top heading in the page would be a level 1, and that all other headings would be nested in a consistent order, with no levels skipped. The WHATWG version of the outline also defines rules for dealing with multi-part headings in <hgroup> elements, so the sub-headings do not create sub-sections. (The W3C version of HTML 5 instead declared <hgroup> obsolete: multi-part headings should be marked up as paragraphs inside a section <header> or spans inside the main heading element.)
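
A heavily simplified sketch of the idea, in JavaScript: the level comes from how many sectioning elements surround a heading, not from its tag name. This is not the full algorithm from the specs (it ignores <hgroup>, sectioning roots other than the body, and the implied sub-sections created by lower-ranked headings inside one section), and the page object and outlineLevels function are hypothetical names of mine, with plain objects standing in for DOM nodes.

```javascript
// Elements treated as sectioning content in this sketch.
const SECTIONING = new Set(["article", "section", "aside", "nav"]);

// Walks the tree and records each heading with a level equal to
// 1 plus the number of sectioning elements that enclose it.
function outlineLevels(node, depth = 1, result = []) {
  for (const child of node.children ?? []) {
    if (/^h[1-6]$/.test(child.tag)) {
      // The tag's own number is deliberately ignored here.
      result.push({ text: child.text, level: depth });
    } else if (SECTIONING.has(child.tag)) {
      outlineLevels(child, depth + 1, result);
    } else {
      // Non-sectioning wrappers such as <div> do not add a level.
      outlineLevels(child, depth, result);
    }
  }
  return result;
}

// Hypothetical page that uses <h1> for every heading.
const page = {
  tag: "body",
  children: [
    { tag: "h1", text: "Site title" },
    {
      tag: "article",
      children: [
        { tag: "h1", text: "Post title" },
        { tag: "section", children: [{ tag: "h1", text: "A sub-topic" }] },
      ],
    },
  ],
};

console.log(outlineLevels(page));
```

Under these assumptions the three <h1> elements come out as levels 1, 2, and 3, which is the behavior the spec promised and browsers never exposed to accessibility APIs.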

Browsers modified their default styles so that headings inside of nested sections would have progressively smaller font sizes (just like how the default style for <h3> has smaller font than <h2>, which is smaller than <h1>). But they didn’t change the way they exposed heading levels to the accessibility APIs that are used by screen readers. And screen-reader users are the only ones who really experience heading levels as part of the user interface.

Screen readers announce the numbered level when reading headings, and they allow users to quickly scan through headings of a given level. According to a WebAIM survey, two-thirds of screen-reader users scan headings as the first step of trying to find information on a long web page. For these users, the only effect of the Document Outline Algorithm was that some new pages (eagerly adopting the new spec) were presented as flat lists of level-one headings, with no structure at all.

Why won’t browsers use the outline algorithm for accessible heading levels? Many arguments have been made, but the most compelling one is that it could alter the way existing web sites are presented to screen-reader users, and it’s not clear that those alterations would mostly be positive.

Adrian Roselli has compiled a good overview of the discussions about the problems caused by the unimplemented outline specification, in “There is No Document Outline”. The latest W3C HTML specs only use the document outline algorithm to suggest how authors should synchronize their numbered heading tags with their nested sectioning elements. The WHATWG HTML specs still have the full outline algorithm described as a normative requirement, although there is an open issue where many suggest removing it altogether. As WHATWG spec editor Domenic Denicola puts it:

At some point we cannot claim that user agents are broken. They are instead rejecting our change request.

The Current Debate

The latest flurry of debate was sparked when Jonathan Neal filed an issue on the W3C HTML spec re-proposing the elusive <h> element. The key to the proposal is that an <h> heading element could have a nesting level defined by sectioning elements, while still allowing the existing numbered heading tags to have the level determined by their tag name. Authors would opt in to the outline algorithm by using the new tag. Until browsers supported <h>, a JavaScript (or server-side) polyfill could calculate the heading levels and add them into the DOM with ARIA attributes. The attributes role="heading" and aria-level="3" tell the browser to treat an element as a level-3 heading for accessibility purposes, regardless of tag name or nesting, which leaves the page author fully responsible for any heading confusion.
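
As a rough illustration of what such a polyfill might do (this is my own sketch, not Neal’s or Kardell’s code), the function below walks a tree and gives every <h> element a role and an aria-level based on its sectioning ancestors. It uses plain objects with a hypothetical attrs property so that it runs without a DOM; in a browser the same values would be written with element.setAttribute().

```javascript
// Elements that increase the nesting level in this sketch.
const SECTIONING_TAGS = new Set(["article", "section", "aside", "nav"]);

// Adds role="heading" and a computed aria-level to every <h> node.
// Numbered headings (h1..h6) are left exactly as the author wrote them.
function applyHeadingLevels(node, depth = 1) {
  if (node.tag === "h") {
    node.attrs = { ...node.attrs, role: "heading", "aria-level": String(depth) };
  }
  // Children of a sectioning element sit one level deeper.
  const childDepth = SECTIONING_TAGS.has(node.tag) ? depth + 1 : depth;
  for (const child of node.children ?? []) {
    applyHeadingLevels(child, childDepth);
  }
  return node;
}

// Hypothetical document mixing the proposed <h> with a numbered heading.
const doc = {
  tag: "body",
  children: [
    { tag: "h", text: "Page title" },
    {
      tag: "section",
      children: [
        { tag: "h", text: "Nested title" },
        { tag: "h2", text: "Numbered heading, left untouched" },
      ],
    },
  ],
};

applyHeadingLevels(doc);
console.log(JSON.stringify(doc, null, 2));
```

The opt-in nature of the proposal shows up in the last node: the <h2> keeps whatever level its tag name implies, and only the new element is renumbered.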

There’s a lot of good discussion on that issue page, and in longer linked blog posts. The main argument in favour of adding a new element is that it wouldn’t change the meaning of existing content. In addition to Neal’s arguments on GitHub, Brian Kardell’s proposal of a custom element and polyfill approaches the issue from this point of view. On the other side, Jake Archibald argues for fixing the elements we already have:

The work needed to fix the existing web is a subset of creating a new element that does the same thing, but doesn’t fix the existing web.

In other words, if the outline algorithm is so great that it’s worth a new element, why not just implement the outline algorithm for existing elements instead?

If you’re still having a hard time understanding why no one can agree about what to do with something as seemingly simple as HTML headings, Brian Kardell has helpfully stripped away all the technical details in a second post.

The Bigger Dilemma

There is a hidden assumption below all the discussion of how to create a document outline for a web page. Discussing how to create the document outline assumes that the structure of a web page can be defined as an outline: as a tree where the nesting level of a heading defines its importance.

I personally don’t think a simple nested outline can capture all the levels of meaning that are conveyed by HTML heading levels, as they are used on the web. I’ll get to why in a bit. But there’s a reason that all the discussion has focused on this type of outline: because this is the type of outline screen readers expect.

For most web users, and web authors, the document outline is irrelevant. They do not know and do not care how the headings and sections are nested; they only see what’s on the screen. And what’s on the screen, in most web pages today, is a two-dimensional layout of content, some of it nested, but some of it independent, with each part given implied importance and relationships by layout, colors, and typography.

So, the question we should be debating isn’t “How should we assign outline levels to headings?” It’s: “How can we summarize the meaningful structure of a web page, so that people using assistive technology can easily find content?”

I’d personally love it if browsers added a feature, for all users, to show the outline as a table of contents, and make it possible to quickly navigate to headings with the keyboard. Maybe if they did, more web authors would pay attention to what their outline looked like. But the browsers don’t, and so most authors don’t.

If you do want to see what your website’s heading outline looks like (and how it would theoretically look using the document outline algorithm), you can use the W3C Nu HTML validator service, with the Show Outlines option checked.

As it currently stands, the document outline is only of daily importance to screen-reader users, and those users are currently used to dealing with the mess of erratic heading levels in web pages. I’m sure many screen-reader users would appreciate heading levels being fixed. But fixing headings for screen-reader users doesn’t just mean creating a tree of neatly nested headings with no skipped level numbers. It means creating a heading structure that accurately reflects the meaning intended by the creators of the web page, the meaning that visual users infer from style and layout. And in order to do that, we need to consider how meaning is communicated to all the users of web pages who aren’t hearing each heading announced with a numerical level.

A Language is Defined by Those Who Speak It

HTML is unique among computer code languages, because it defines so many constructs without assigning them specific behavior. Meaning in computer code is known as the semantic side of the language, as opposed to the syntactic structures of its grammar. But in most programming languages, the semantic aspects of built-in objects are still strongly tied to instructions for the computer. In JavaScript, new Date() and new Promise() have the same syntax—calling a constructor function—but your JS interpreter understands the semantic distinction between the two object names, and behaves very differently for each.
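
A small snippet shows the point. The variable names are arbitrary; Date and Promise are the standard built-in constructors.

```javascript
// Same syntax: both lines call a constructor with `new`.
const when = new Date(0);
const later = new Promise((resolve) => resolve("done"));

// Different built-in behavior is attached to each name.
// A Date knows how to format a point in time...
console.log(when.toISOString());

// ...while a Promise hands over a value asynchronously.
later.then((value) => console.log(value));

// Neither object understands the other's methods.
console.log(typeof when.then); // "undefined"
console.log(typeof later.toISOString); // "undefined"
```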

In contrast, an HTML <article> or a <section> doesn’t come with any instructions for what your web browser should do with it (other than the un-implemented outline algorithm). Instead, the difference between the two is all about the meaning of the content, a way to provide machine-readable annotations for the information communicated from one human being, the website author, to another: the reader.

Meaning in human communication is difficult to define, and never static. But most importantly of all, it is defined by the people who use the language. Dictionaries compile summaries of the meanings that are used, but they don’t restrict them. If people start using words in new and different ways, the dictionary (if it’s any good) will update its definitions.

When I was in grade school, a librarian showed off the multi-volume Oxford English Dictionary by introducing us to a selection of wild and crazy words. Google* was the name for the number that would be written as a 1 followed by 100 zeros (10^100, in scientific notation). Crazy, right? Who would ever need to know a word like that? But times change. In 2006, the OED added a new definition, google as a verb (meaning to use the Google search engine), which might be used a google times more often than the number quantity in modern English conversation.

*Correction: As Mark notes in the comments, the correct spelling of the word I was shown all those years ago is actually googol. And now I don’t know what to believe anymore.

When it comes to the meaning of HTML tags, the equivalent of dictionaries are the two competing HTML specifications (WHATWG and W3C). And just like dictionaries, they both started out as efforts to describe the language as it was currently used.

The fact that there are two different HTML specifications makes discussing changes more difficult, but it also strongly highlights the collective, consensus-based nature of HTML as a language. There is no one defining document setting the rules for HTML. HTML is defined by the people who write it and by the web browsers that interpret it.

But it’s not that simple, of course. HTML isn’t only used by human beings, it is also used by computers. And computers aren’t very good about handling fuzzy and shifting meaning.

Whenever you ask a computer to do something with your content—like, for example, tell the screen reader what headings there are in this website and how they are organized—it needs clear and explicit rules for how to do so. If some web authors are using heading tags in one way, and some authors are using the same tags with different meaning, your browser is going to need additional rules to figure out which is which—or else it’s going to get it wrong, at least some of the time.

The driving force of the web standards movement was a hope that all web browsers would react to web page code in (approximately) the same way. And that means defining new features in standards documents before they can be used on the web. Instead of being descriptive, like a dictionary (defining how things are), they are prescriptive, like a legal code (defining how things should be).

The slow pace of developing standards, with lots of input from browser teams, is supposed to ensure that the end result is both prescriptive and descriptive, at least for the parts of the language that describe browser behavior. But it doesn’t always work. There are lots of details in both specs that don’t match actual browser behavior. The W3C’s issue repo even has a comfortingly-named Match Reality Better label aimed at fixing these bits.

And that’s just for the features that describe what browsers should do. What about all the HTML elements that define the semantics of content? Shouldn’t those “match reality better,” too?

A few months ago Sara Soueidan suggested to the W3C HTML working group that maybe the <address> element should be valid for all addresses (and not just page-owner contact addresses). Many people before her have certainly made the same complaint. But this time, something happened. Following a little rough data scraping, which suggested that actual usage in the wild wasn’t restricted to the original definition, the definition in the W3C specs was updated.

Does it make any difference? Maybe not. Browsers don’t do anything with <address> except make it italic. And the WHATWG HTML specs still have the old definition. But it means the spec comes a little closer to describing the way code is actually used on the web, not how someone once imagined it might be.

Which brings us back, at last, to headings: How are they actually used on the web? And is it even possible to define a prescriptive set of instructions, for web authors and for web browsers, that ensure that the meaning of headings can be correctly communicated to screen readers (and potentially, other software)?

The Many Meanings of Headings

What is a heading? It’s a short title for a section of a document. The heading for this section is “The Many Meanings of Headings.” So far, so good.

But all headings are not created equal.

There are big headings:

A Big Heading

and there are much smaller headings:

A heading so small it’s barely a heading

If you inspect the code, you’ll see that one of those is an <h1> and the other is an <h6>. Both of them are wrapped in <figure> tags, which—according to the document outline algorithm—should encapsulate them and keep them from messing up the main document outline. But we all know by now that the document outline algorithm isn’t actually used by web browsers, so apologies to any screen reader users who ended up halfway down the article by mistake.

For anyone reading this article with their eyes in a modern browser, the difference between the two headings is communicated by the font size, and possibly the font style. The exact details will depend on whether you’re looking at the website’s CSS or your browser’s reading mode CSS, and on how recently Chris has changed CSS-Tricks’ styles. But unless Chris has really messed things up, it will be pretty clear to visual readers that the <h1> is bigger and more important than the <h6>. We could change the CSS so they looked identical, but at this point it is hard to understand why you would want to do that. If you wanted them to look the same, why not use the same tag name?

So let’s go a step further, and put those two headings together with some filler text in between. Here’s one way we could do that, with a main heading, some text, then a sub-heading and some more text:

See the Pen Heading outlines example #1 by Amelia Bellamy-Royds (@AmeliaBR) on CodePen.

Here’s another way to arrange the same headings and paragraphs:

See the Pen Heading outlines example #2 by Amelia Bellamy-Royds (@AmeliaBR) on CodePen.

And here’s a third:

See the Pen Heading outlines example #3 by Amelia Bellamy-Royds (@AmeliaBR) on CodePen.

If you’re only looking at the result tab of those pens, and using your eyes to do so, you might be forgiven for thinking the second and the third are identical, and very different from the first. Visually, both example #2 and example #3 have a main section with a big heading and a sidebar section with a minor heading. The difference is that one uses <div> elements to create the structure and the other uses HTML sectioning elements.

If you’ve read this far, you probably won’t be too surprised to discover that these two examples create different structures when processed by the HTML document outline algorithm. Under that algorithm, divs are ignored, so Example #2 would be treated the exact same as example #1: a main heading, some paragraph text, then a sub-heading and another paragraph. The outline does not indicate at all that the sidebar is a separate, parallel structure to the main article:

  1. A big heading

    1. A heading so small it’s barely a heading

In contrast, if I run the outline algorithm on Example #3, it tells me that there is an unlabelled main document (no top-level heading), with two equal sibling child elements (both headings treated as level-2). So now it clearly conveys the parallel structure, but not the difference in heading importance:

  1. [body element with no heading]

    1. A big heading
    2. A heading so small it’s barely a heading

I don’t think either of these outlines accurately describes that visual layout. Neither does the outline based on tag names, which not only treats the sidebar as nested in the main article, but also gets distracted by my use of <h6> in a page without any <h2/3/4/5> elements.

If I was asked to describe this layout to someone, I would tell them two things:

  • there are two, side-by-side sections;
  • one of those sections is more important than the other.

The relative importance of the components is a separate piece of information from the nesting structure—or lack thereof, in this simple example. In a more complex example, you’ll have some chunks of content with meaningful nested headings (like this article), and other chunks (like the sidebars, or the comment section below) that have parallel, independent outlines whose inner heading levels are un-related to the ones in the first chunk. Treating each parallel chunk as equal ignores the relative importance they were given in the markup. But tacking those extra headings onto the end of the main article, just because there isn’t a bigger heading in between, seems somehow worse.

Even when components are nested, they often have an importance level that is independent of the number of sections that surround them. I write books on SVG for O’Reilly. The markup we use to create the books is converted to HTML. The book (level-1 heading) has chapters (level-2 headings) with sections (level-3 headings) that sometimes have sub-sections or even sub-sub-sections (level-4 and 5). But it also has examples, and warning notes, and sidebars, all of which can have their own headings which will be styled identically irrespective of whether that component is in a regular section or a sub-sub-section. If we used the “correct” HTML heading elements, they would have different tag names, depending on the section depth, but would be styled identically.

In web design and in content management, we have two very different ways of talking about the level of a heading: the level of importance, or the level of nesting. I think that the main reason web standards folks can’t agree on an algorithm for turning headings into an outline is that people want an algorithm in which both agree, and they often don’t.

Maybe what’s really needed is to stop talking about outlines as if they re-number heading importance levels. Stop telling web developers they are wrong for using the heading levels that make sense for their content. Let context define the outline nesting, but don’t define outline nesting as if it was interchangeable with tag names. Ideally, find a way for browsers to communicate to screen readers both the nested structure of sections and the raw heading-level numbers, so the screen readers can let their users navigate by nesting structure, while still communicating the relative importance of each heading.
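
One way to picture that is to report two numbers for every heading instead of one. The sketch below is only an illustration of the idea: describeHeadings is a hypothetical helper, plain objects stand in for DOM nodes, and the layout object is my guess at the shape of a page like example #3 (an <article> next to an <aside>), not the pen’s actual markup.

```javascript
// Elements that add a level of nesting in this sketch.
const SECTION_TAGS = new Set(["article", "section", "aside", "nav"]);

// For each heading, records both its nesting depth (from sectioning
// elements) and its rank (the number in its tag name).
function describeHeadings(node, depth = 1, found = []) {
  for (const child of node.children ?? []) {
    const match = /^h([1-6])$/.exec(child.tag);
    if (match) {
      found.push({
        text: child.text,
        nestingDepth: depth,
        rank: Number(match[1]),
      });
    } else {
      const nextDepth = SECTION_TAGS.has(child.tag) ? depth + 1 : depth;
      describeHeadings(child, nextDepth, found);
    }
  }
  return found;
}

// Assumed structure: a main article and a sidebar, side by side.
const layout = {
  tag: "body",
  children: [
    { tag: "article", children: [{ tag: "h1", text: "A Big Heading" }] },
    {
      tag: "aside",
      children: [{ tag: "h6", text: "A heading so small it’s barely a heading" }],
    },
  ],
};

console.log(describeHeadings(layout));
```

With this assumed markup, both headings get the same nesting depth (they are parallel sections) but very different ranks (one is far more important), which are the two things I said I would tell someone about the layout.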

Then focus on the real question:

How can we summarize the meaningful structure of a web page, so that people using assistive technology can easily find content?

My instinct is that the outline that uses sectioning elements is usually better for navigation than sections based only on tag names, but that the details need to be improved. In particular:

  • There need to be better rules for collapsing un-named sections, maybe treating them as ARIA groups instead of as additional nesting levels in the outline.
  • There may need to be better rules for handling multi-part headings grouped by an <hgroup> or <header> element.
  • And there probably need to be better rules about which elements (if any) encapsulate their child headings from the main outline altogether.

Show Me the Data

But that’s just my opinion.

In order to get browsers or screen-readers to change their behavior—let alone to convince all the hundreds of thousands of web developers who are using headings in their content—we are going to need more than hunches and opinions. As I argued early on, we need some data. Both Jake and Brian have echoed that call.

But the kind of data we need isn’t the kind that can be collected by a web crawler. We need data about meaning, the kind of meaning that only real human brains can provide.

The HTML sectioning elements have been around for years now. They aren’t theoretical anymore. They are part of the language that you, web developers, use to communicate. If you’re using sectioning elements, hopefully you have a reason why. When you select a heading tag, hopefully you have a reason why. It’s time to review the HTML standards to make sure they reflect the reasons and meaning used by most developers.

So, I’m asking you: run your favorite websites (that you built or that you use) through the two outline builders in the HTML validator.

  • Does either of the outlines make sense?
  • Can you make them make sense, with reasonable tweaks to the markup that you can implement with your build systems or component frameworks?
  • Which outline is better?
  • What aspects of the document structure cause the most problems?

And while we’re at it, one more question:

How would you, as a web user, like to be able to access and navigate documents based on headings or outlines?