Wednesday, July 16, 2008
ReCaptcha: Reusing your 'wasted' time online
Chances are that if you've solved one of those distorted-word tests to secure an account with Facebook, Craigslist, or Ticketmaster, you've helped The New York Times inch a little closer to digitizing its entire print newspaper archive from 1851 to 1980.
How have you unwittingly helped the Gray Lady by wasting 10 seconds on a computer-generated word challenge? It's thanks to a year-old initiative called ReCaptcha, a play on the antispam tests known as Captchas (Completely Automated Public Turing Test To Tell Computers and Humans Apart), which are tests that people can pass but machines cannot.
People typically fill out Captchas so Web sites can verify that a human, rather than a spam bot, is behind the request for a new e-mail address, log-in, or membership. But with ReCaptchas, which are double-word tests, humans are also helping machines better recognize faded-ink or blurry words that have been digitally scanned from old newspapers or books--text that's difficult for a computer to recognize optically. That way, people will eventually be able to sift through print archives with a more intelligent search engine.
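The double-word idea can be sketched in a few lines of Python. This is a minimal illustration, not ReCaptcha's actual code: it assumes, as the scheme has been publicly described, that one of the two words is already known to the system and the other is a scanned word that software could not read. The function names and the three-vote threshold are hypothetical.

```python
from collections import Counter


def normalize(word: str) -> str:
    """Compare answers case-insensitively and ignore stray whitespace."""
    return word.strip().lower()


def grade_response(known_answer: str, typed_known: str,
                   typed_unknown: str, votes: Counter) -> bool:
    """Pass the user only if the known (control) word is typed correctly.

    When the user passes, their reading of the unknown word is recorded
    as one vote toward its transcription.
    """
    if normalize(typed_known) != normalize(known_answer):
        return False
    votes[normalize(typed_unknown)] += 1
    return True


def accepted_transcription(votes: Counter, min_votes: int = 3):
    """Return the leading reading once enough people agree, else None.

    min_votes=3 is an illustrative threshold, not a documented value.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


votes = Counter()
for typed in ["upon", "Upon", "upon "]:
    grade_response("morning", "morning", typed, votes)
print(accepted_transcription(votes))  # upon
```

The design point is that the site still gets a human check from the control word, while the second word turns each passing answer into a small piece of transcription work.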
In the last year, as many as 600 million people have completed at least one ReCaptcha on sites such as Twitter, LastFM, and Ticketmaster, which use the technology for free, according to ReCaptcha creator and Carnegie Mellon University assistant professor Luis von Ahn.
With all those helping hands, von Ahn expects that The New York Times digitization project will be finished by the end of 2009, at the latest. (About five months ago, The New York Times paid an undisclosed sum to von Ahn's CMU team to complete its project.)
"We're reusing wasted human cycles," von Ahn, 28, said while speaking at a robotics conference here recently.
The venture involves putting millions of eyes on words printed in roughly 47,000 newspapers with varying page counts. For example, before the turn of the century, The New York Times was about one-fourth the breadth it is today. It's doubled in size about every 50 years or so since its beginning in the 1850s, when it was published every day except Sunday. (The New York Times did not immediately respond to a request for comment for this story.)
Von Ahn's team is also helping the Internet Archive with the digitization of books through ReCaptcha, but it's doing that project gratis.
In fact, von Ahn, a recipient of the MacArthur Fellowship (or "genius award") in 2006 for his work as a computer scientist, only wants to aid projects that work for the good of humanity. His main work-related guilt, it seems, is that he helped invent Captchas in the first place (in 2000, so that Yahoo could fend off spammers). And that's only because he's calculated how much time people have wasted on the four- to six-character tests. He's estimated that people around the world type 200 million Captchas every day, which at 10 seconds per puzzle adds up to roughly 500,000 man hours daily.
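The daily figure follows from simple arithmetic, sketched here in Python using only the two numbers quoted above (the variable names are just for illustration). The exact product comes to a little over 555,000 hours, so the 500,000 figure reads as a round-number estimate.

```python
CAPTCHAS_PER_DAY = 200_000_000   # von Ahn's estimate
SECONDS_PER_CAPTCHA = 10         # time assumed per puzzle
SECONDS_PER_HOUR = 3600

# Total human time spent on Captchas in one day, converted to hours.
total_seconds = CAPTCHAS_PER_DAY * SECONDS_PER_CAPTCHA
hours_per_day = total_seconds / SECONDS_PER_HOUR

print(round(hours_per_day))  # 555556
```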
But that lost time is nothing compared with the amount spent on games--another key focus for von Ahn. By the time the average American has turned 21, researchers estimate that he or she has spent about 10,000 hours playing video games--that's the equivalent of holding down a full-time job for five years. In 2003, players collectively spent 9 billion human hours on the game Solitaire. In contrast, building the Empire State Building took only 7 million human hours, or the equivalent of about 6.8 hours of collective Solitaire play.
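Both comparisons can be checked with a short Python calculation. This is a rough sketch: the 40-hour week, the 50 working weeks per year, and the assumption that Solitaire play was spread evenly across all of 2003 are not stated in the article and are introduced here only to reproduce the figures.

```python
# Video games versus a full-time job.
VIDEO_GAME_HOURS_BY_21 = 10_000
FULL_TIME_HOURS_PER_YEAR = 40 * 50   # assumption: 40 h/week, 50 weeks/year
years_of_full_time_work = VIDEO_GAME_HOURS_BY_21 / FULL_TIME_HOURS_PER_YEAR
print(years_of_full_time_work)  # 5.0

# Solitaire versus the Empire State Building.
SOLITAIRE_HOURS_2003 = 9_000_000_000
EMPIRE_STATE_HOURS = 7_000_000
CLOCK_HOURS_IN_YEAR = 365 * 24       # assumption: play spread evenly over 2003

# Human hours of Solitaire played, on average, during each clock hour of 2003.
solitaire_hours_per_clock_hour = SOLITAIRE_HOURS_2003 / CLOCK_HOURS_IN_YEAR

# Clock hours of worldwide Solitaire play matching the building effort.
equivalent_clock_hours = EMPIRE_STATE_HOURS / solitaire_hours_per_clock_hour
print(round(equivalent_clock_hours, 1))  # 6.8
```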