Minh’s Notes

Human-readable chicken scratch

Minh Nguyễn
October 28th, 2013


Cue the newbies

While the Wikimedia Foundation’s flagship project, the English Wikipedia, resists attempts to modernize the editing experience, the Vietnamese-language projects are moving full steam ahead.

Back in August, I pushed the Foundation to install VisualEditor at the Vietnamese Wikipedia ahead of schedule, giving us extra preparation time. Since then, we’ve translated the tool, written help pages, documented templates for use in VisualEditor, and addressed incompatibilities with Vietnamese input method editors. Given the positive reaction so far, I’m confident that it’ll be much more welcome here than at the English Wikipedia when the Foundation is finally ready to roll it out by default.

Wikipedia is the easy part. By contrast, Wiktionary relies on a painfully obfuscated syntax based on Wikipedia’s wikitext, but with a heavier reliance on templates. This syntax evolved in response to a fundamental technical limitation: whereas most wikis host unstructured prose, a dictionary like Wiktionary needs to hold structured data. So a user who has conquered Simonite’s example Wikipedia sentence will find themselves once again confounded by the English Wiktionary entry on “technology”:


From {{etyl|grc|en}} {{term|τεχνολογία|lang=grc|tr=tekhnologia||systematic treatment (of grammar)}}, from {{term|τέχνη|tr=tekhne|lang=grc||art}} + {{term|-λογία|lang=grc}}.

* {{a|RP}} {{IPA|/tɛkˈnɒlədʒi/}}, {{X-SAMPA|/tEk"nQl@dZi/}}
* {{a|GenAm}} {{IPA|/tɛkˈnɑlədʒi/}}, {{X-SAMPA|/tEk"nAl@dZi/}}


# {{context|uncountable|lang=en}} The organization of knowledge for practical purposes.

At least language purists can take heart that English won’t be so easily perverted. And yet this syntax is the English Wiktionary’s concession to learnability, a stand against the even more obfuscated system that some of Wiktionary’s other language editions adopted years ago. Witness the Vietnamese Wiktionary’s corresponding entry:

* [[Wiktionary:IPA|IPA]]: {{IPA|/tɛk.ˈnɒː.lə.dʒi/}} {{term|Anh}}, {{IPA|/tɛk.ˈnɑː.lə.dʒi/}} {{term|Mỹ}}

| lang = grc | term = τεχνολογία | rom = tekhnologia | meaning = ngữ pháp đầy đủ | from = {{etym-from
 | term = τέχνη | rom = tekhne | meaning = nghệ thuật
 | 2 term = -λογία

# [[kỹ thuật|Kỹ thuật]]; kỹ thuật [[học]].

Clearly, this syntax was a mistake. We adopted it on promises that machine-readability would encourage developers to support our wiki, but no one ever did. For years, even experienced Vietnamese Wikipedia editors have shied away from contributing to Wiktionary because of it. On the other hand, it allows us to keep an up-to-the-minute breakdown of entries by language, which is pretty handy.

To its credit, the English Wiktionary community does recognize the need to reduce complexity, so users wishing to start a new entry are offered a choice between two guided entry creators. The simpler option (login required) starts with some boilerplate wikitext and provides long-winded instructions for modifying it. It’s a serviceable, if-you-say-so experience for beginners. The more powerful option (login required) expects you to input the word’s ISO 639 language code, which is a nonstarter for ordinary folks. The English Wiktionary also provides a nifty tool for adding translations to an existing entry, provided it already contains at least one translation.

But I’m not convinced that the English Wiktionary is doing enough to make the site accessible to those who speak English, not wikitext, as a first language. Like Urban Dictionary, Wiktionary relies much more on casual contributors than Wikipedia does. Consequently, the ideal form for creating a minimal entry would require no more than a single single-line textbox. Any more complexity and the casual contributor is much more likely to give up. Why spend ten minutes just to add a single sentence to the wiki?

The Vietnamese Wiktionary has a lot going against it, but here too we’ve made great improvements in the past year. Earlier this year, we simplified some of our most complex templates. Generating an IPA pronunciation guide for a Vietnamese word such as đơn giản (“simple”) went from {{IPA|/{{VieIPA|đ|ơ|n}} {{VieIPA|g|i|ả|n}}/}} to simply {{vie-pron}}. This weekend, we turned on a brand-new entry creation tool, one that assumes no wiki expertise.

Creating a new entry at the Vietnamese Wiktionary.

The new tool walks you through the process of writing an entry, presenting the following steps, one at a time:

  1. Choose a language from the dropdown menu.
  2. Choose a part of speech from the dropdown menu.
  3. Enter a definition into the single-line textbox. As you type, another single-line textbox appears in case you want to add a second definition. The same goes for synonyms and translations (for Vietnamese entries only).
  4. Click Continue. Each word in your definitions is automatically linked. (The tool checks for compound words that have Wiktionary entries.) The generated wikitext appears, in case you want to tweak anything.
  5. Click Save and be on your merry way.
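The steps above describe what the tool does rather than how it does it. As a rough illustration of steps 3 and 4, here is a minimal Python sketch, assuming a greedy longest-match strategy for spotting compound words and a hypothetical set of existing entry titles (`titles`). The function names and the heading layout in `build_entry` are hypothetical too, not the Vietnamese Wiktionary’s actual entry format or the tool’s actual code.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")


def link(phrase: str) -> str:
    """Link to the lowercase entry, keeping the typed capitalization as display text."""
    target = phrase.lower()
    return f"[[{target}]]" if target == phrase else f"[[{target}|{phrase}]]"


def auto_link(text: str, titles: set[str], max_words: int = 4) -> str:
    """Link every word, preferring the longest multi-word phrase found in `titles`.

    `titles` is a hypothetical stand-in for a lookup of existing entries.
    """
    text = unicodedata.normalize("NFC", text)  # precomposed letters, so \w covers whole syllables
    tokens = re.findall(r"\w+|\W+", text)      # words alternate with separators
    out = []
    i = 0
    while i < len(tokens):
        if not WORD.fullmatch(tokens[i]):
            out.append(tokens[i])  # spaces and punctuation pass through unchanged
            i += 1
            continue
        for k in range(max_words, 1, -1):  # try the longest compound first
            span = tokens[i:i + 2 * k - 1]
            if len(span) < 2 * k - 1:
                continue  # not enough tokens left for a phrase this long
            words, seps = span[0::2], span[1::2]
            phrase = " ".join(words)
            # Only words separated by single spaces can form a compound.
            if all(s == " " for s in seps) and phrase.lower() in titles:
                out.append(link(phrase))
                i += 2 * k - 1
                break
        else:
            out.append(link(tokens[i]))  # no compound found: link the single word
            i += 1
    return "".join(out)


def build_entry(language: str, pos: str, definitions: list[str], titles: set[str]) -> str:
    """Assemble a minimal entry; this heading layout is hypothetical."""
    lines = [f"== {language} ==", f"=== {pos} ==="]
    # Skip blank definitions, such as the empty extra textbox the form keeps offering.
    lines += [f"# {auto_link(d, titles)}" for d in definitions if d.strip()]
    return "\n".join(lines) + "\n"


print(auto_link("Kỹ thuật học", {"kỹ thuật", "học"}))  # [[kỹ thuật|Kỹ thuật]] [[học]]
```

Trying the longest phrase first matters for Vietnamese, where many words are compounds of space-separated syllables: linking kỹ and thuật separately would miss the entry for the compound kỹ thuật.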

Give it a try (login required). The link goes to your personal sandbox, so no knowledge of Vietnamese is required.

Notice how each step comes with few or no instructions. That’s by design: most people either don’t bother reading instructions or get so bogged down in instructions that they quit. We learned this lesson last December, when we stripped most of the instructions and scary admonitions from the Vietnamese Wikipedia’s editing page, shortening the page by 30%, and the sky didn’t fall:

Creating a new article at the Vietnamese Wikipedia, before (left) and after (right).

Unlike the English Wiktionary’s tools, the Vietnamese Wiktionary’s new entry creation tool appears automatically when you happen upon a nonexistent entry. You can’t miss it.

It’s too soon to tell whether the new tool will attract more contributors. I’m hopeful, because creating entries is finally something everyone can get right, quickly. The new entries will also require less cleanup, thanks to a lack of boilerplate and the tool’s automatic linking features.

Of course, creating entries is just the beginning. We still need better tools for editing entries, whose wikitext remains horribly complex. The first step, which I turned on by default just moments ago, is “ToT”, a dynamically updated table of contents beside the edit box:

Editing “lavar” at the Vietnamese Wiktionary, with ToT on the right. Clicking a heading in the sidebar selects the code that produces that heading. Try it out for yourself.

We can’t easily change the syntax, but we can give you instant feedback on your edits and help you navigate entries with less effort.
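ToT’s internals aren’t described here, but the core idea can be sketched: rescan the wikitext whenever it changes, collect each heading with its character offsets, and use those offsets to select the heading’s code when its sidebar entry is clicked. The Python below is a minimal sketch under those assumptions; `build_toc`, `TocEntry`, and `indented_outline` are hypothetical names, and the regular expression only handles plain one-line `==…==` headings.

```python
import re
from dataclasses import dataclass

# A one-line heading such as "===Danh từ===": the same run of 1-6 "=" on both sides.
HEADING = re.compile(r"^(={1,6})(.+?)\1[ \t]*$", re.MULTILINE)


@dataclass
class TocEntry:
    level: int  # number of "=" signs on each side
    title: str  # heading text, which may itself contain templates
    start: int  # offset of the heading line's first character
    end: int    # offset just past the heading line (before the newline)


def build_toc(wikitext: str) -> list[TocEntry]:
    """Rebuild the table of contents; call this again whenever the edit box changes."""
    return [
        TocEntry(len(m.group(1)), m.group(2).strip(), m.start(), m.end())
        for m in HEADING.finditer(wikitext)
    ]


def indented_outline(toc: list[TocEntry]) -> str:
    """Render the sidebar as text, indenting deeper headings."""
    return "\n".join("  " * (e.level - 2) + e.title for e in toc)


text = "==Tiếng Việt==\n===Danh từ===\n# [[kỹ thuật]]\n"
toc = build_toc(text)
# Clicking "Danh từ" in the sidebar would select text[toc[1].start:toc[1].end].
```

Because the offsets are recomputed from scratch on every change, the sidebar cannot drift out of sync with the text; for an entry the size of a typical Wiktionary page, a full rescan per keystroke should be cheap.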

ToT is based on a table of contents feature that the Foundation had originally intended to turn on for all wikis. They later backed off, no doubt due to pressure from the community.

I hope to bring some of these improvements over to the English Wiktionary once we get good data on their effectiveness at the Vietnamese Wiktionary. In the meantime, I can’t wait to see what the newbies come up with.

