Reformed markup
The W3C, which writes the Web standards that everyone but Microsoft must follow, released the seventh draft of XHTML 2.0. XHTML is the successor to HTML, in case you’ve been living in an offline cave for the last five years.
This is big stuff, since you may have to completely rewrite your pages to match this new version once it comes out, if you want to keep up with the times. For those who’ve missed the boat, then, read on for a recap of what’s new of note in XHTML 2.0, and some of my commentary on the matter…
Here’s what’s new and notable:
- First of all, the language is being mostly rewritten. That means your webpage will have to be rewritten as well if you want to follow the latest and greatest spec. Tags that you may’ve gotten used to, like `b` and `i`, are gone. Fortunately, there are many substitutes, such as the über-cool CSS (learn).
- Several new structural tags have been added:
  - `blockcode`: Useful for including big long chunks of programming code in a webpage; currently, we have to use the `<blockquote><code>` combo.
  - `section` and `h`: `section`s are for organizing – what else – sections of a page. They can be nested. Currently, we use lots and lots of meaningless `div`s (which haven’t been replaced). `h` will eventually replace the `h1` through `h6` tags, and is to be used inside `section` tags.
  - `separator`: This element separates sections of a page, and essentially replaces `hr`. You can probably expect it to be rendered as a horizontal line, nonetheless.
- Some changes regarding inline text tags have occurred, as well:
  - `acronym` no longer exists. Just use `abbr`, to avoid all that confusion about initialisms.
  - The meaning of `cite` has changed: in the past, you would mark up names of publications etc. with the element, but now you are asked to use it for any “citation or a reference to other sources.”
  - Instead of entering `br` tags whenever you want a line break, you now enclose lines in `l` tags (that’s a lowercase letter L, not a one).
  - The `q` tag (for inline quotes) has been renamed `quote`. Also, browsers are instructed not to add quotemarks around `quote`s by default, which Internet Explorer never did for `q` anyways.
  - The `a` tag remains, but you’re now encouraged to use its trademark `href` attribute on any element you want.
- Some tags were added for lists, too:
  - `di`: This optional tag has been introduced to group `dt`s (definition list terms) and `dd`s (definition list definitions) together. This has always been an implicit grouping, but this option makes it easier to manipulate and style entries with their definitions.
  - `nl`: Many sites now code their navigation lists as lists – good news for fans of the Semantic Web. But now there’s a better way to code navigation lists: navigation lists. The `nl` tag contains `li` tags and can be nested, just like any other list.
  - `label`: You’ll now be able to label any kind of list by using this simple tag.
- The `del` and `ins` tags, which only I ever use, have been replaced by the `edit` attribute. The `datetime` attribute has been freed to work with any tag.
- The `handler` element has replaced the `script` element of yore. It can be nested, in case the outer `handler` element can’t be loaded for some reason, or in case the language it’s written in is disabled. This element may be split off into its own specification, however, since it isn’t semantic in nature.
- Just as `a` lost its `href` monopoly, `img` and `object` have been deemed largely irrelevant. The `src` attribute can now be applied to any element. In fact, the draft recommends providing `src` with paragraphs of all things!
- The `meta` tag is no longer an empty element. Instead of containing its contents in a `content` attribute (yay for redundancy) and providing a `name` or `http-equiv` attribute, you’ll now put the contents of the tag between the opening and closing `meta` tags, and you’ll provide a `property` attribute that’ll act the same way as `name` and `http-equiv` do now.
- All quiet on the tabular front.
- Your everyday form elements have been replaced by the core XForms tags and attributes.
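To make the list above a little more concrete, here is a rough sketch of how a page fragment might look with the new structural, inline, and list tags. It is pieced together from the descriptions above rather than copied from the draft, and the text, file names, and link targets are all made up for illustration. Since this is only a Working Draft, any of these names could still change.

```xml
<!-- Hypothetical fragment: tag names follow the list above, content is invented. -->
<body>
  <section>
    <h>Reformed markup</h>
    <p>
      <!-- Each line goes in its own l element instead of ending with br. -->
      <l>Roses are red,</l>
      <l>violets are blue.</l>
    </p>
    <section>
      <!-- A nested section: its h plays the role that h2 plays today. -->
      <h>A code sample</h>
      <blockcode>print "Hello, world!"</blockcode>
    </section>
    <!-- Stands in for the old hr. -->
    <separator/>
    <!-- A navigation list with a label; href sits directly on each li. -->
    <nl>
      <label>Site navigation</label>
      <li href="index.html">Home</li>
      <li href="archive.html">Archive</li>
    </nl>
    <!-- The optional di groups a term with its definition. -->
    <dl>
      <di>
        <dt>XHTML</dt>
        <dd>The successor to HTML.</dd>
      </di>
    </dl>
  </section>
</body>
```

Note that the heading level is never spelled out: it falls out of how deeply the `section` tags are nested.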
And now for some commentary:
- There are some cases in which there is absolutely no reason to format a bit of text in a special way (italicizing, for example), except that it’s typographical convention to do just that. Phrases like c’est la vie, for example, are only italicized because they’re French phrases and we want to look smart while using them. There are no semantics involved here, but it’s still important to indicate these formatting necessities, even with stylesheets disabled. Matthew Thomas discussed this well.
- The `section` tag should’ve been shortened to `sect`. Likewise, `separator` should simply be `sep`. Some of us still code by hand, and we don’t want to get carpal tunnel just from typing up a webpage. Newbies can look it up.
- The question of whether to keep `abbr` and `acronym` separate is a hard one. Acronyms are pronounced differently than most abbreviations: NATO is pronounced as a normal word, whereas Mr. is read out in full (“Mister”), not as “mrrr.” On the other hand, acronyms aren’t the only specialized abbreviations: we have initialisms, SI unit symbols, chemical symbols, and those maverick abbreviation/initialism combinations like JPEG (“jay-peg”). Not to mention other languages, like Vietnamese, which have totally different rules for pronouncing abbreviations, and even different types of abbreviations. So I suppose the W3C’s decision in this case is the right one, drawing a sensible line somewhere, rather than allowing a proliferation of various abbreviation tags.
- I disagree with the change in `cite`’s meaning. These changes have occurred to make the language more semantically useful, but while tags are supposed to be defined by their semantic meaning, they should also be grounded in some kind of visible difference in formatting. For example, `strong` exists because it can make text bold, even though that’s not its official purpose. The stylesheet that is included to suggest certain renderings in visual browsers still says to make any `cite`d text italic, but what if I’m just quoting my friend? I don’t want their name italicized by default, because it doesn’t make sense.
- Many a newbie will confuse the `l` (L) tag for a `1` (1) tag and won’t understand what it means. It should be named the `ln` tag, since `li` is taken up by list items.
- The W3C has bent over backwards for Microsoft, replacing the `q` tag with `quote` in the hopes that Microsoft will now support it, and requiring that browsers not put quotemarks around quotes like they currently do. Excuse me, but if we’re rewriting most of the language anyhow, why do we need to rename and lower the requirements for a tag that Microsoft has never supported? And now the `quote` element has no special formatting by default. Why use it, then? It’s more work to use the tag and specify quoting in the stylesheet than to simply use quotemarks in the document. This is a big mistake.
- The W3C thinks that `a` is semantically equivalent to `span`, so they broke up the `a` monopoly on the `href` attribute. This makes things easier for us Web developers, but I think they’re giving `a` a bad rap. `a` does have semantics: the tag signifies the exact place in the webpage where it relates to – links to – another webpage. Fortunately, the W3C is keeping the `a` element around for now, mainly due to muscle memory, but it’ll probably disappear in future versions.
- The `di` tag has long been needed, and I’m glad to see it now. I’m also glad that they made it optional, so that it isn’t too much of a hassle to hand-code a small definition list.
- The addition of the `nl` tag is reminiscent of the `banner` tag that was once found in the ill-fated HTML 3.0 – a space-guzzling tag that can now be emulated using a bit of CSS2. `nl`, however, is the most sensible special-purpose tag introduced thus far, because navigation lists are so prevalent on the Web’s pages today.
- For the most part, the authors of XHTML 2.0 changed the names of tags when they changed the tags’ behaviors, to avoid any kind of naming conflict. But in the case of `label`, they’ve simply transferred the name from an already-existing element to a completely unrelated one! They should’ve called it `ll`, in keeping with the tradition of using initialisms for list tags. (This tag was named `name` in earlier drafts, by the way.)
- The W3C thinks that, like `a`, `img` and `object` are pointless and should be integrated into paragraphs of all things! Of course, since an image is worth a thousand words and an applet worth a million, you could use `src` on a paragraph, thereby replacing that paragraph with the image or applet. This use would eliminate the need for such hackish techniques as FIR and sIFR, making webpages look a whole lot cooler. But `img` and `object` still have a use – well, one of them does, at least. I figure that you might want to incorporate an illustration or figure (hah! get it?) somewhere in your webpage, and perhaps it can’t be represented as a paragraph smack-dab in the middle of the vignette you’re writing. Think of all those images that you currently use, where you’re forced to use a useless `alt`, just to fit the spec. Here’s `object`’s semantic value: an illustration that is simply used to enhance the text, but isn’t a part of it.
- XForms is a bit harder to learn than the old form elements, but it makes a lot more sense, and it’s a ton more powerful.
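For the curious, the `href` and `src` changes discussed above might look something like this in practice. This is only a sketch built from the descriptions in this post, not from the draft’s own examples. The file names are hypothetical, and the fallback behavior described in the comments is an assumption about how `src` is meant to work.

```xml
<!-- Hypothetical fragment: file names are invented for illustration. -->
<body>
  <!-- src on a heading: the image is shown in place of the text, which
       stays in the markup as the fallback (assumed), so no FIR or sIFR. -->
  <h src="masthead.png">Reformed markup</h>

  <!-- href on an ordinary inline element: no wrapping a tag is needed. -->
  <p>Read the <cite href="draft.html">XHTML 2.0 Working Draft</cite> for details.</p>
</body>
```

Whether the loss of a dedicated link element is worth the saved typing is exactly the trade-off argued over above.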
Sjoerd Visscher showed three years ago that it’s possible to code an XHTML 2.0 webpage now and style it to work in both IE and Mozilla-based browsers like Firefox. If you can’t wait to start coding with this stuff, first keep in mind that this is only a Working Draft, and that it’ll change very significantly before it reaches the Recommendation stage. It already has changed a ton since the fourth draft a couple years ago.