Carnegie Mellon University

Stop Spam. Read Books.


reCAPTCHAs Solve Scanning Hitch


Ever had trouble with your computer "reading" text from a scanned document? What about deciphering that oddly jumbled text you have to type in before proceeding with some online forms? Combine these two sometimes frustrating experiences and you've got a new Carnegie Mellon solution: reCAPTCHA.

Carnegie Mellon's Luis von Ahn, an assistant professor of computer science and recipient of the MacArthur Foundation "genius grant," is enlisting the help of thousands, if not millions, of web users every day to eliminate a technical bottleneck slowing efforts to digitize text.

Key to the new project are CAPTCHAs, the distorted-letter tests found at the bottom of registration forms on Yahoo!, Hotmail, PayPal, Wikipedia and hundreds of other sites worldwide.

Performing these simple visual tests tells the program you're human, not a computer program designed by spammers to harvest free email accounts. Working with a Carnegie Mellon team that includes computer science professor Manuel Blum, undergraduate student Ben Maurer and research programmer Mike Crawford, von Ahn invented a new version of the tests, called reCAPTCHAs.

reCAPTCHAs will help convert printed text into computer-readable letters on behalf of the Internet Archive, a San Francisco-based non-profit group that administers the Open Content Alliance. It's one of several large initiatives working to digitize books, making the text searchable.

The systems that scan printed text and convert it into digital text are often stumped by underlined text, scribbles and fuzzy or otherwise poorly printed letters. reCAPTCHAs will use words from these troublesome passages to replace the artificially distorted letters and numbers typically used in CAPTCHAs, letting humans do the work that the systems can't.

Von Ahn hopes to substitute his reCAPTCHAs for as many conventional CAPTCHAs as possible.

"It is estimated that 60 million or more CAPTCHAs are solved each day, with each test taking about 10 seconds," he said. "That's more than 150,000 precious hours of human work that are lost each day, but that we can put to good use with reCAPTCHAs."

With support from Intel Corp., von Ahn's team has devised a free, Web-based service that allows individual webmasters to install reCAPTCHAs to protect their sites. 
