Google plans to accelerate its massive efforts to scan tens of millions of books and periodicals with the acquisition on Wednesday of a company called reCAPTCHA.
ReCAPTCHA is a well-known provider of CAPTCHA technology, which is used to prevent spammers from using computers to automatically create accounts on online services, such as webmail and other Web sites that require registration.
CAPTCHA, which stands for "Completely Automated Public Turing test to tell Computers and Humans Apart," requires users to type randomly chosen words that appear as images, a process that is easy for humans but hard for computers to do correctly.
What attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitise print books and periodicals. The search giant has a massive effort underway in that area for its Google Books and Google News Archive services.
ReCAPTCHA takes its word images from scanned print materials. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines.
"So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process," reads a post on Google's official blog authored by Luis von Ahn, cofounder of reCAPTCHA, and Will Cathcart, a Google product manager.
The reCAPTCHA service is used by about 100,000 Web sites, and it is helping to digitise old editions of The New York Times.