Hello, I’m Minh Nguyen (though I style myself Minh Nguyễn, with all the wonderful diacritics), a graduate of St. Columban School and St. Xavier High School and currently a sophomore at Stanford University. Passing by my dorm room, you might’ve seen me staring at the monitor, the monitor mutually staring back, as I type… click… type… click— blog

March 3, 2015

Last month, I left a job tending orchards in the idyllic foothills of Cupertino, eager to finally paint. I landed at Mapbox, a startup focused on advancing open data and GIS technology. Everyone thinks they know maps: either an unwieldy relic of another era or (heh) a solved problem. But geospatial data is essential to a variety of industries, and smart tools to present and analyze it are going to be a big deal. Mapbox is leading the way, and we’re doing it the right way.

February 14, 2015

Years ago, I started to collect government-issued road maps and atlases, procuring them for free mostly by stopping by roadside welcome centers and signing their guestbooks. Cartotourism requires a bit of tact: you don’t just waltz in and demand a government handout; you have to feign profound interest in the captivating state you’ve just entered. (Under the “Purpose of Visit” column: “Just passing through.”)

Pile of maps
Most states publish elegant highway maps at a cost of a few cents per copy. The more enterprising states turn them into splashy advertisements for theme parks and other attractions, easily recouping printing expenses.

Admittedly it’s a bit perverse that I would care so much about the free map amid all the signs proudly advertising free coffee, Coke, or orange juice. But evidently I’m not alone. For me, the maps proved useful during road trips, even after a GPS device displaced the family radar detector. After hours of counting cows and spotting barn ads along the most remote stretches of I-65, even the highway department could somehow keep me entertained. Something about the way they managed to cram so many names and symbols onto one large sheet of paper.

Kentucky Unfurled
For the most part, these are high-quality maps with clear cartography.

This summer, the collection grew to 58 specimens issued by 27 states, four national parks, four counties, plus Ontario and the former Metro Toronto. Some are nearly 30 years old and have the tears to prove it. The collection sports two official bike maps, a beautiful “agritourism” map, and a completely bilingual map. (Ontario’s is half in French; Louisiana’s wishes it were.) Naturally, the two maps of Texas are by far the largest in my possession. Over the years, I’ve also lost a few maps, including one that proclaimed, “There’s More Than Corn in Indiana!” Indeed: I picked it up at a rest stop nestled amid soybean fields.

Occasionally, I try to do something more interesting with the collection than keep it in a burgeoning shoebox. This time, I made it into a single U.S. map, fashioning states out of the maps they issued. It’s a map made of maps:

Map of Maps
This U.S. map is deformed to the point of resembling a cartogram.

You’ll notice that the arrangement is rather uneven. My collection is heavily skewed towards the Southeast, mostly because I traversed it almost annually during my childhood, but also because the West and New England are quite stingy when it comes to maps. I must’ve discarded California’s map; it was just a page in a travel guidebook. And the only “welcome center” I could find in Rhode Island was a Mobil station selling Mobil maps.

Map tiles
Come see the South! Play golf in North Carolina, race horses in Kentucky, smell the wildflowers in Arkansas, and admire Tennessee’s custom chrome.

Perhaps a more interesting project would be to spread out all these maps and stitch together a mosaic of the U.S. It’ll have to wait until I can find enough floor space to unfurl Texas.

February 11, 2015


Since my earliest days in high school, I have kept Minh’s Notes readers apprised of many things, teaching you how to manually create a snow day, dupe me and sound important in the process, and triple your learning rate. Reader, you are worth every minute I spend writing to you (or using writer’s block as an excuse for not writing to you). But if you only know me from this blog, I have been pretty mum about that nine-to-five part of me.

Today is my last day at Apple. (It’s a fruit company – heard of it?) That mostly means no more product giveaways to this blog’s most insightful commenters. In a little over three years, no one ever qualified, sorry. It also means the Xcode team has one fewer engineer to help sort through fan mail. Apparently they’re called “bug reports” outside Cupertino, which explains the… expressivity I’d see sometimes. I have a lot to get used to.

As for where I’m going, that’ll be the topic of a later note, following the same protocol whereby your bank sends you your PIN in one envelope followed by an explanation of that PIN in another envelope after you’ve misplaced the first. All I can say is it has little to do with the startup idea I had back in 2009.

But man, how cool would that’ve been!

October 13, 2014

Among my many roles in the Wikipedia project, I play the part of historian. Not the kind who obsesses over Civil War battles and World War I artillery, building up infoboxes the size of the USS Enterprise. That’s History, uppercase. No, I add historical content to non-history articles – lowercase history. Most articles need lowercase history to provide essential context and flavor. It’s not enough to know how things are; we need to know how things got that way and how we found out about it.

Over the past three months, I more than doubled the prose in “Flag of Ohio”, mostly by elaborating on the circumstances around the flag’s adoption. The resulting text demonstrates the power of lowercase history to link diverse topics together, in this case, the state seal, the flags of Cincinnati and Cuba, and President Garfield. I even drew up a big GIF of the proper way to fold an Ohio flag, because GIF.

Folding the flag of Ohio
The flag of Ohio is officially folded in 17 steps, easier said than done.

Once in a while, there’s even a chance to advance scholarship on a topic. Scouring Google Books led me to long forgotten accounts of an earlier Ohio flag. (It’s actually pretty boring, just a white rectangle with some details on it. I’m glad it never took off.) My sudden activity on that article attracted the attention of another editor, who gradually ate away at a factoid all my social studies teachers in school had repeated as fact: that Nepal and Ohio were the only country and state, respectively, with non-rectangular flags. In fact, there are plenty of counterexamples, from European naval ensigns to the Qing dynasty’s triangular Yellow Dragon Flag.

In another case from earlier this year, I finally quashed the silly misconception that phở is based on a French soup and even named after it. Apparently no one in the English-speaking world, not even the OED, had bothered to check with scholars fluent in Vietnamese to see whether the historical literature backed up that myth. (For the record, Cantonese speakers had much to do with the name, while the dish evolved from a Vietnamese water buffalo soup called xáo trâu. Eww?)

It’s more difficult for a Wikipedia editor to write about lowercase history than to write about the present, because Wikipedia has a stringent policy requiring “verifiable” sources. It’s easy to find websites, books, and reviews raving about phở and easy for another editor to double-check those sources. But as soon as you start writing about lowercase history, you run up against all sorts of barriers: paywalls for year-old news articles, paywalls for decade-old news articles, ditto for century-old news articles that should’ve been out of copyright for generations.

Thankfully, Google (Books, Scholar, News Archive Search), HathiTrust, the Internet Archive, and various national library websites do provide access to a huge number of sources for free, if you happen to be looking for something in the right time period. If you’re looking into local or regional history, subscription databases offer even more. Depending on the state of their budget, your local library may provide access to one or two good subscription databases. If not, there’s The Wikipedia Library, but you have to apply for access.

Still, searching this wealth of sources can be difficult because OCR is nowhere near as good as you’d expect in 2014, and it’s virtually absent from older or foreign-language documents. So sometimes the best sources can only be found with some guesswork: what kind of publication would cover the topic and in what years? What appears to be an original source might turn out to be regurgitated from a decade earlier, in which case the investigation starts anew.

First State Flag
I came across this Enquirer blurb (subscription required) while searching for details on Ohio’s first flag. It nearly had me going, until I saw the date: April 1, 1905, three years after the familiar double-tailed flag was adopted. Does it qualify as an April Fool’s joke if the humor is a bit stale?

There’s also the problem of bias in historical sources. I came across a great deal of vitriol directed at the flags of Ohio and Cincinnati when they were introduced and came away thinking that they were poorly received at first. In fact, it wasn’t so lopsided, but of the subscription databases I had access to, the only one covering that time period was for a highly partisan Democratic newspaper. Both flags were introduced by Republicans. (These days, that paper, The Cincinnati Enquirer, has about as much edge as that former Ohio flag.) For the phở article, too, I had to remain mindful that some French- and Vietnamese-language sources were more interested in claiming the soup for their country than establishing the truth.

Lowercase history is the most inefficient, time-consuming way to expand an article but the most effective way to increase its quality. Very often, it forces you to square competing narratives and question the assumptions that underlie the contemporary description of a topic. It also builds the reader’s trust by increasing the number and variety of sources beyond the low-hanging fruit that anyone could find via Google search.

These days, at the English Wikipedia particularly, it’s easy to feel that all the good topics have been written about. But the truth is that most of those articles still have plenty of room to grow. If you toss out labels like amateur historian, I think you’d find that writing a coherent encyclopedia depends in large part on how many fields of study you can lowercase.

November 26, 2013

Wikipedia in 2013
Wikipedia’s front door has changed little in nearly a decade.

The wall of languages at www.wikipedia.org happens to be one of the most frequently accessed series of bits on the Internet. It’s also a monument to multilingualism: a degree in modern languages may help you decipher a tenth of the page, but only after installing an assortment of obscure fonts you’ll never need for any other purpose.

Despite the page’s cognitive complexity, the whole setup is far simpler than any other portal you’ll ever visit, every bit as primitive as the design suggests. The front page of the world’s #6 website is nothing more than a hand-written, static HTML5 document that references one hand-written, dynamically minified stylesheet and one hand-written, dynamically minified JavaScript file, plus AJAX search suggestions. That’s it – no dynamic content, no analytics, no A/B testing, no special logged-in version. Everyone sees exactly the same content. When a language edition gets its thousandth article, it falls to a thankless volunteer administrator at the Wikimedia Meta-Wiki to notice the change and edit the portal manually. (Oh, and the minification was done by hand too until earlier this year.)

www.wikipedia.org is written like any Wikipedia article – almost.
Anyone can edit the portal’s temporary staging area. It’s up to administrators like me to deploy the edits.

The Wikimedia Foundation loves this old-school approach, because it saves a tremendous amount of bandwidth and gives the site a nice homegrown, organic feel, like that other minimalist product of San Francisco, craigslist. But doing the portal this way also has a high maintenance cost, so historically no one maintained it. I got so fed up with nagging administrators that I became one myself in 2006. Over the years, the portal has remained true to its Web 1.0 self. Aside from updates to the language lists and periodic code refactoring, the design has changed little in nearly a decade. On the technical side, support for Internet Explorer 5.5 for Windows was dropped only a few years ago, and IE 6 is still the baseline. Major design changes – say, sorting the top ten languages differently, or creating a new list for million-article-plus wikis – have required endless discussion or that dreaded Wikipedia tradition known as a poll.

Many of us have long wanted a more sophisticated way of allowing the user to select a language, or at least a more attractive one. But fear of the community at large has scuttled every radical departure from the current method of selecting a language edition, ideas like choosing from a map. It didn’t help that the portal’s purpose was misunderstood among the very people who could help, designers. A Lithuanian design agency made a splash last year with a redesign that, among other things, collapsed the sea of language links into a 16-pixel-tall, rainbow-colored strip along the top for access to just 15% of Wikipedia’s language editions (including Lithuanian, thankfully). The point was to maximize the space dedicated to search, supposedly the portal’s main function. I guess the colors were a concession to German Wikipedians who still wanted to know how close they were to beating the English Wikipedia in size.

German redesigned
Wikipedia Redefined: The design firm New proposed emphasizing search by making it harder for roughly two-thirds of Wikipedia’s users to find the wikis in their native languages.

In a perfect world, Wikipedia would know what language everyone prefers to read in and would immediately direct them to a portal in their language, with search right up front. But in a perfect world, we would just direct everyone to the Esperanto Wikipedia. Unfortunately, language selection, not search, must be the portal’s main function. The article counts are just the most obvious and transparent way to arrange the wikis, based on an ancient compromise. Don’t get me wrong: more emphasis on search would be a great idea – on each wiki’s front page. I did just that in a radical redesign of the Vietnamese Wiktionary’s front page a couple years ago.

The Vietnamese Wiktionary places search front and center.
The design of the Vietnamese Wiktionary’s front page emphasizes search. Dynamically rotating examples show the project’s breadth and encourage you to search for words in any language from the same search box. You just have to get to the Vietnamese Wiktionary first, which is why the multilingual portals must be multilingual.

I’ve updated the Wikipedia portal far more than anyone else in the seven years I’ve been an administrator. This fact gives me mixed feelings. On the one hand, it’s a unique role for a Web developer, but on the other, it’s time-consuming and extremely constrained. That role can be described as nothing more than “link herder”. In the past couple years, distractions from other projects and real life (and, I admit, sheer boredom) caused me to ignore the portal entirely. Others in the community continued to keep it updated and make improvements to the code, but deployments did slow a bit.

Recently, though, I was moved to pity for the portal. The famous “top ten” ring of languages around the puzzle ball had gotten a bit warped, probably the result of blind copy-pasting over the years. The grid of sister projects at the bottom had gotten misaligned, too. And the logos were all blurry on high-resolution screens.

The ten largest wikis formed quite an imperfect circle around Wikipedia’s puzzle ball logo.
The portal had some issues while I was gone.

After a little CSS-fu and a lot of patience with the image uploader, the same 2005 layout is now cleaner and a little more responsive. Also, in modern browsers, the search bar now supports 277 languages, up from the original 47, provided you use a localized browser or set your language preferences.

Of course, one thing led to another, and soon I was trying to tackle the very tedium that caused me to drop out of sight for two years. Updating a portal was always a laborious process that included visiting each of the top ten wikis and all the wikis on the cusp of reaching an article count milestone. There was a page that listed all the article counts, but it too was updated only sporadically, the result of yet another manual process.

Earlier this year, the Foundation enabled Lua scripting on all its wikis, including Meta-Wiki. Advanced Wikipedia editors no longer had to write template code, the Turing-incomplete programming language to article writers’ wikitext. At around the same time, a community member developed a bot that automatically compiles up-to-date article counts every night. Changes like these are huge steps away from the static publishing world Wikipedia has always lived in.

This weekend, I wrote a Lua script that connects the dots, parsing the table of article counts and the portal HTML and identifying things that need to be updated. When there are major issues, like a language that needs to be promoted up to the next “bookshelf”, it displays these issues in a basic dashboard and adds the portal to a category that tracks urgent tasks for administrators. It’s essentially an automated test of the portal.

The Lua module’s dashboard currently lists several issues that need to be addressed in the portal code.
Looks like I have some work to do.
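
For the curious, the heart of that check can be sketched in a few lines. What follows is not the actual Lua module, just a rough Python illustration of the idea: take fresh article counts, compare them against the shelf each wiki currently sits on in the portal, and list the mismatches. The thresholds, the function names (tier_for, find_issues), and the language codes and counts in the example are all made up for illustration.

```python
# Hypothetical "bookshelf" thresholds, largest first.
# The real portal's cut-offs may differ; these are assumptions.
TIERS = [1_000_000, 100_000, 10_000, 1_000, 100]


def tier_for(count):
    """Return the largest threshold this article count has reached, or None."""
    for threshold in TIERS:
        if count >= threshold:
            return threshold
    return None


def find_issues(article_counts, portal_tiers):
    """Compare fresh counts with the shelf each wiki is listed under.

    article_counts: {language code: current article count}
    portal_tiers:   {language code: threshold it is listed under in the portal}
    Returns a list of human-readable issues, sorted by language code.
    """
    issues = []
    for lang, count in sorted(article_counts.items()):
        expected = tier_for(count)
        listed = portal_tiers.get(lang)
        if expected is None or expected == listed:
            # Too small to list, or already on the right shelf.
            # (A listed wiki that fell below every tier is ignored in this sketch.)
            continue
        if listed is None:
            issues.append(f"{lang}: missing from portal, belongs under {expected:,}+")
        elif expected > listed:
            issues.append(f"{lang}: promote from {listed:,}+ to {expected:,}+")
        else:
            issues.append(f"{lang}: demote from {listed:,}+ to {expected:,}+")
    return issues


if __name__ == "__main__":
    # Invented counts for invented language codes.
    counts = {"aa": 1_050_000, "bb": 195_000, "cc": 1_200}
    portal = {"aa": 100_000, "bb": 100_000}
    for issue in find_issues(counts, portal):
        print(issue)
```

Keeping the comparison separate from the parsing means the same function could feed a dashboard, a tracking category, or eventually the generated HTML itself, which is roughly the direction the real module is headed.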

The next step is to generate the HTML entirely in Lua, but administrators will still be needed to manually deploy each automatically calculated change. I hope these changes will help the other administrators take a more active role in keeping the portal up-to-date and do so without introducing regressions. Someday, though, it’d be great if Wikipedia would be smarter about the first foot it puts forward.



This weblog is licensed under a Creative Commons License.
