Minh’s Notes

Human-readable chicken scratch

Saturday, May 1st, 2021

The copyeditors’ consensus

From 2011 to 2013, English Wikipedia editors passionately debated whether to prominently spell Vietnamese names with Vietnamese diacritics. Should there be an article on “Vung Tau” or “Vũng Tàu”, “Dien Bien Phu” or “Điện Biên Phủ”, “Dang Huu Phuc” or “Đặng Hữu Phúc”, “pho” or “phở”?

In its early years, Wikipedia’s content management software, MediaWiki, adopted then-state-of-the-art Unicode support to accommodate a quickly growing roster of foreign language editions of the encyclopedia. Drunk on this capability, English Wikipedians systematically embedded inscrutable IPA pronunciations in lede paragraphs, umlauts in the names of heavy metal bands, and zalgo text when signing their own names in discussions. But somehow the privilege of advanced Latin typography didn’t extend to Vietnamese people and places without a great deal more controversy. Superficially, the disagreement was over article titles, but due to how Wikipedia is written, any decision would gradually affect links to those titles, other mentions in running text, and articles translated into other languages.

The debate spread across dozens of discussion pages as editors attempted to get individual articles renamed, to put facts on the ground supporting their positions. One of the most prolific editors on Vietnamese topics eventually got banned for using disingenuous sockpuppet accounts to manufacture consensus to their liking. Even Jimbo Wales, cofounder of Wikipedia, weighed in with exasperation at the “excessive”, “ridiculous”, and “appalling” sight of stacked diacritics in English.

Wikipedia prides itself on being descriptive, rather than prescriptive. It relies on other reliable sources instead of trying to ascertain the truth by itself. One common refrain was that English-language published works routinely strip diacritics from Vietnamese names as a matter of policy. By 2012, some niche book publishers had begun printing Vietnamese diacritics. But the Associated Press stubbornly stuck to the basic English alphabet, the New York Times admitted accent marks for only a few favored European languages, and National Geographic specifically singled out Vietnamese for second-class status. News organizations heavily influenced the debate because their daily articles accounted for so many Google search results. Somehow, some of the most hastily written documents in the entire publishing industry were to set the typographical standard for the most deliberatively written reference work in history.

Ultimately, all the spilled ink came to nothing: per project policy, a lack of consensus means preserving the status quo. In practice, many articles have remained titled with diacritics, because diacritics distinguish completely unrelated words in every case. But editors have had to tread carefully around latent controversy when titling new articles or trying to make existing titles more consistent. The encyclopedia that anyone can edit has some advice for you: don’t go there.

The issue of diacritics on names is inherently personal for me, but I didn’t take offense at the many melodramatic, misinformed comments against Vietnamese diacritics. Surely the excess consonants and syllables in Welsh names would’ve elicited the same calls for simplification. My vote was essentially a sigh of resignation. I knew English language purists could only delay the effects of globalization for so long.

I had already seen these forces cut the other way, pressuring the Vietnamese Wikipedia to eschew the traditional names for overseas people and places in favor of bewildering, often unpronounceable English spelling patterns. Otherwise, it might’ve had to mimic the Vietnamese government’s official encyclopedia, which tries so desperately to hold the line on traditional phonetic spelling that it effectively invents its own novel alphabet: “Anhxtanh” (Einstein), “Penziat” (Penzias), and “Uynxơn” (Wilson) all contribute to the discussion on “Bich Beng” (the Big Bang). Imagine a generation of schoolchildren recalling the role of “Rudơven” (Roosevelt), “Tơruman” (Truman), and “Sơcsin” (Churchill) in World War II. Why would an English encyclopedia stop at stripping diacritics? Why not make Vietnamese names truly intuitive through phonetic respelling?

Sure enough, the tide is slowly turning. In 2019, the AP – which not long ago insisted on ``eyesore quotation marks'' – began to incorporate diacritics into personal names. As a wire service, its style guide has outsized influence. Yesterday, I was surprised to see the Times Opinion section publish its first-ever byline with Vietnamese diacritics, right on the homepage. The article contains several references to Vietnamese people, places, and terms replete with diacritics. Rather fittingly, the op-ed by Nguyễn Phan Quế Mai calls attention to a far more serious double standard in the lack of compensation to Vietnamese Agent Orange victims. Even if the extra attention to typographic detail doesn’t make its way into the print edition, due to typesetting constraints, I hope it’s the start of a trend online.

Yes, it’s Minh, like with a G at the end.

Some Vietnamese diacritics also appeared a few weeks earlier in this interactive in the Arts section, and more recently in a Hawaiian name in the Sports section, both articles dealing with issues of identity.



The name’s Minh Nguyễn. I’m a San José–based software developer, free content and open data enthusiast, and ardent defender of diacritics everywhere. Since March 2002, Minh’s Notes has been home to my occasional insights and frequent attempts at humor.