Minh Nguyễn
June 12th, 2005
Computing
#782

Reformed markup

The W3C, which writes the Web standards that everyone but Microsoft must follow, has released the seventh draft of XHTML 2.0. XHTML is the successor to HTML, in case you’ve been living in an offline cave for the last five years.

This is big stuff: if you want to keep up with the times, you may have to completely rewrite your pages to match the new version once it comes out. For those who’ve missed the boat, read on for a recap of what’s notable in XHTML 2.0, along with some of my commentary on the matter…

Here’s what’s new and notable:

The first major change is that the language is being mostly rewritten. That means your webpage will have to be rewritten as well if you want to follow the latest and greatest spec. Tags that you may have gotten used to, like b and i, are gone. Fortunately, there are many substitutes, such as the über-cool CSS.
Several new structural tags have been added:

blockcode: Useful for including big long chunks of programming code in a webpage; currently, we have to use the <blockquote><code> combo.
section and h: sections are for organizing – what else – sections of a page. They can be nested. Currently, we use lots and lots of meaningless divs (which haven’t been replaced). h will eventually replace the h1–h6 tags, and is to be used inside section tags.
separator: This element separates sections of a page, and essentially replaces hr. You can probably expect it to be rendered as a horizontal line, nonetheless.
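
To make the structural tags above a bit more concrete, here is a rough sketch of how they might fit together. It only assumes the element names described in this draft; the headings and text are made up for illustration, and the details could well change before the spec is final.

```xml
<!-- Illustrative fragment only: element names as described in the draft, content invented. -->
<section>
  <h>Reformed markup</h>
  <p>The nesting depth of a section, not a number in the tag name, sets the heading level.</p>
  <section>
    <h>A subsection</h>
    <!-- blockcode stands in for today's block-level code markup -->
    <blockcode>print("hello, world")</blockcode>
  </section>
  <!-- separator takes over from hr -->
  <separator/>
  <p>Text after the break.</p>
</section>
```

Note that the inner h needs no level number: moving the subsection elsewhere in the page would not require renaming any tags.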

Some changes regarding inline text tags have occurred, as well:

acronym no longer exists. Just use abbr, to avoid all that confusion about initialisms.
The meaning of cite has changed: in the past, you would mark up names of publications etc. with the element, but now you are asked to use it for any “citation or a reference to other sources.”
Instead of entering br tags whenever you want a line break, you now enclose lines in l tags (that’s a lowercase letter L, not a one).
The q tag (for inline quotes) has been renamed quote. Also, browsers are instructed not to add quotation marks around quotes by default, which Internet Explorer never did for q anyway.
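
As a small sketch of the inline changes listed above, assuming the tag names given in this draft (the sample text is invented, and the wrapping section is only there to keep the fragment in one piece):

```xml
<!-- Illustrative fragment only: sample text is made up. -->
<section>
  <p>
    <!-- each line gets its own l element instead of ending in a br -->
    <l>Roses are red,</l>
    <l>violets are blue.</l>
  </p>
  <p>
    <!-- cite now marks any reference to a source, here a person -->
    As <cite>a friend of mine</cite> put it,
    <!-- quotation marks are typed by hand, since browsers no longer add them -->
    <quote>"markup is never finished."</quote>
    The <abbr title="Extensible Hypertext Markup Language">XHTML</abbr> drafts seem to bear that out.
  </p>
</section>
```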

The a tag remains, but you’re now encouraged to use its trademark href attribute on any element you want.
Some tags were added for lists, too:

di: This optional tag has been introduced to group dt (definition list term) and dd (definition list definition) tags together. The grouping has always been implicit, but this option makes it easier to manipulate and style entries with their definitions.
nl: Many sites now code their navigation lists as lists – good news for fans of the Semantic Web. But now there’s a better way to code navigation lists: navigation lists. The nl tag contains li tags and can be nested, just like any other list.
label: You’ll now be able to label any kind of list by using this simple tag.
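
Here is roughly what the new list tags might look like in use. This is a sketch based on the descriptions above; the link targets and the glossary entries are hypothetical, and the href attributes on li rely on the draft letting href appear on any element.

```xml
<!-- Illustrative fragment only: URLs and list content are hypothetical. -->
<section>
  <!-- a navigation list, labelled and built from ordinary li items -->
  <nl>
    <label>Site navigation</label>
    <li href="/">Home</li>
    <li href="/archive">Archive</li>
  </nl>

  <!-- a definition list whose entries are grouped with the optional di -->
  <dl>
    <label>Glossary</label>
    <di>
      <dt>nl</dt>
      <dd>A navigation list.</dd>
    </di>
    <di>
      <dt>di</dt>
      <dd>A wrapper that groups a term with its definitions.</dd>
    </di>
  </dl>
</section>
```

With di in place, a stylesheet can target one whole entry (term plus definition) with a single selector, which is the styling benefit mentioned above.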

The del and ins tags, which only I ever use, have been replaced by the edit attribute. The datetime attribute has been freed to work with any tag.
The handler element has replaced the script element of yore. It can be nested, in case the outer handler element can’t be loaded for some reason, or in case the language it’s written in is disabled. This element may be split off into its own specification, however, since it isn’t semantic in nature.
Like a’s loss of its href monopoly, img and object have also been deemed largely irrelevant. The src attribute can now be applied to any element. In fact, the draft recommends providing src with paragraphs of all things!
The meta element is no longer empty. Instead of putting its contents in a content attribute (yay for redundancy) and providing a name or http-equiv attribute, you’ll now put the contents between the opening and closing meta tags, and you’ll provide a property attribute that’ll act the same way as name and http-equiv do now.
All quiet on the tabular front.
Your everyday form elements have been replaced by the core XForms tags and attributes.
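
A rough sketch of the attribute changes described above, as I read the draft. Treat the specifics as assumptions: the property value author, the edit value inserted, the timestamp, and the file name chart.png are all chosen for illustration.

```xml
<!-- Illustrative document skeleton only: attribute values and file names are assumptions. -->
<html>
  <head>
    <title>Reformed markup</title>
    <!-- the value goes between the tags; property stands in for name/http-equiv -->
    <meta property="author">Minh Nguyễn</meta>
  </head>
  <body>
    <!-- edit plus datetime on an ordinary paragraph, instead of wrapping it in ins -->
    <p edit="inserted" datetime="2005-06-12T09:00:00Z">This paragraph was added later.</p>
    <!-- src on a paragraph: the text is what readers get if the image can't be loaded -->
    <p src="chart.png">A chart comparing the drafts would appear here.</p>
  </body>
</html>
```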

And now for some commentary:

There are some cases in which there is absolutely no reason to format a bit of text in a special way (italicizing, for example), except that it’s typographical convention to do just that. Phrases like c’est la vie, for example, are only italicized because they’re French phrases and we want to look smart while using them. There are no semantics involved here, but it’s still important to indicate these formatting necessities, even with stylesheets disabled. Matthew Thomas discussed this well.
The section tag should’ve been shortened to sect. Likewise, separator should simply be sep. Some of us still code by hand, and we don’t want to get carpal tunnel just from typing up a webpage. Newbies can look it up.
The question of whether to keep abbr and acronym separate is a hard one. Acronyms are pronounced differently than most abbreviations: NATO is pronounced as a normal word, whereas Mr. is read out in full (“Mister”), not as “mrrr.” On the other hand, acronyms aren’t the only specialized abbreviations: we have initialisms, SI unit symbols, chemical symbols, and those maverick abbreviation/initialism combinations like JPEG (“jay-peg”). Not to mention other languages, like Vietnamese, which have totally different rules for pronouncing abbreviations, and even different types of abbreviations. So I suppose the W3C’s decision in this case is the right one, drawing a sensible line somewhere, rather than allowing a proliferation of various abbreviation tags.
I disagree with the change in cite’s meaning. These changes have occurred to make the language more semantically useful, but while a tag is supposed to be defined by its semantic meaning, it should also be grounded in some kind of visible difference in formatting. For example, strong exists because it can make text bold, even though that’s not its official purpose. The stylesheet that is included to suggest certain renderings in visual browsers still says to make any cited text italic, but what if I’m just quoting my friend? I don’t want their name italicized by default, because it doesn’t make sense to.
Many a newbie will confuse the l (lowercase L) tag with a 1 (one) tag and won’t understand what it means. It should be named the ln tag, since li is taken up by list items.
The W3C has bent over backwards for Microsoft, replacing the q tag with quote in the hopes that Microsoft will now support it, and requiring that browsers not put quotation marks around quotes as they currently do. Excuse me, but if we’re rewriting most of the language anyhow, why do we need to rename and lower the requirements for a tag that Microsoft has never supported? And now the quote element has no special formatting by default. Why use it, then? It’s more work to use the tag and specify quoting in the stylesheet than to simply use quotation marks in the document. This is a big mistake.
The W3C thinks that a is semantically equivalent to span, so they broke up the a monopoly on the href attribute. This makes things easier for us Web developers, but I think they’re giving a a bad rap. a does have semantics: the tag signifies the exact place in the webpage where it relates to – links to – another webpage. Fortunately, the W3C is keeping the a element around for now, mainly due to muscle memory, but it’ll probably disappear in future versions.
The di tag has long been needed, and I’m glad to see it now. I’m also glad that they made it optional, so that it isn’t too much of a hassle to hand-code a small definition list.
The addition of the nl tag is reminiscent of the banner tag that was once found in the ill-fated HTML 3.0 – a space-guzzling tag that can now be emulated using a bit of CSS2. nl, however, is the most sensible special-purpose tag introduced thus far, because navigation lists are so prevalent on the Web’s pages today.
For the most part, the authors of XHTML 2.0 changed the names of tags when they changed the tags’ behaviors, to avoid any kind of naming conflict. But in the case of label, they’ve simply transferred the name from an already-existing element to a completely unrelated one! They should’ve called it ll, in keeping with the tradition of using initialisms for list tags. (This tag was named name in earlier drafts, by the way.)
The W3C thinks that, like a, img and object are pointless and should be integrated into paragraphs of all things! Of course, since an image is worth a thousand words and an applet worth a million, you could use src on a paragraph, thereby replacing that paragraph with the image or applet. This use would eliminate the need for such hackish techniques as FIR and sIFR, making webpages look a whole lot cooler. But img and object still have a use – well, one of them does, at least. I figure that you might want to incorporate an illustration or figure (hah! get it?) somewhere in your webpage, and perhaps it can’t be represented as a paragraph smack-dab in the middle of the vignette you’re writing. Think of all those images that you currently use, where you’re forced to use a useless alt, just to fit the spec. Here’s object’s semantic value: an illustration that is simply used to enhance the text, but isn’t a part of it.
XForms is a bit harder to learn than the old form elements, but it makes a lot more sense, and it’s a ton more powerful.

Sjoerd Visscher showed three years ago that it’s possible to code an XHTML 2.0 webpage now and style it to work in both IE and Mozilla-based browsers like Firefox. If you can’t wait to start coding with this stuff, first keep in mind that this is only a Working Draft, and that it’ll change very significantly before it reaches the Recommendation stage. It already has changed a ton since the fourth draft a couple years ago.