I get it now
The other night I was having another one of those = Can’t get to sleep nights, so, as usual, I turned to TED Talks to educate me to sleep.
= I found Luis von Ahn’s talk on his ReCaptcha project and I was WOWed.
We’ve all seen those funny-looking, scrambled letters and numbers for website security, and we’ve all wasted a good amount of time trying to at least read those darn things. That is what we call a "CAPTCHA".
A CAPTCHA is a program that protects websites against spambots by generating and grading tests that humans can pass but current computer programs cannot.
I Become Enlightened
Those who know me and how I work know that I’m a huge fan of collaborating = A Collaborator
= 1 + 1 = 3. Or, what some folks have become accustomed to = I give it away = Give-to-Get.
Luis von Ahn, an associate professor of Computer Science at Carnegie Mellon University, makes good use of the Web’s connectedness to collaborate in unprecedented numbers. His projects leverage the crowd for human good = BRAVO!
His company reCAPTCHA, sold to Google in 2009, digitizes human knowledge (books), one word at a time. His new project is Duolingo, which aims to get 100 million people translating the Web in every major language.
In von Ahn’s own words from the talk: "Before the Internet, coordinating more than 100,000 people, let alone paying them, was essentially impossible. But now with the Internet, I’ve just shown you a project where we’ve gotten 750 million people to help us digitize human knowledge."
Each Human-Typed Response Helps Digitize Books
With 200 million CAPTCHAs being typed every day, at about 10 seconds of human time per CAPTCHA, von Ahn was concerned about the roughly 500,000 hours he was causing humanity to waste every day.
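Here is a quick back-of-the-envelope check of that number, using only the two figures quoted above. The variable names are mine, and the result lands a little above the rounded 500,000:

```python
# Figures quoted in the post
captchas_per_day = 200_000_000
seconds_per_captcha = 10

# Total human time spent per day, converted from seconds to hours
total_seconds = captchas_per_day * seconds_per_captcha
hours_per_day = total_seconds / 3600

print(f"{hours_per_day:,.0f} hours per day")  # 555,556 hours per day
```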
Enter reCAPTCHA = von Ahn found a way to repurpose CAPTCHAs for book digitization. How?
Optical Character Recognition (OCR) was developed to convert scanned images of handwritten, typed, or printed text into machine-encoded text. But OCR isn’t perfect, especially for older, worn-out books (about 30% of the words in older books are unrecognizable by the system). reCAPTCHA, which was acquired by Google in 2009, improves the process of digitizing books by sending the words that cannot be read by OCR to the web, in the form of an image within a CAPTCHA, for humans to decipher.
Each new, unknown word is given to the user along with another word to which the answer is already known. The user is then asked to type both words. If the user solves the one for which the answer is known, the system assumes the answer is correct for the unknown word. The system then crosschecks the answer against a number of other users to determine, with higher confidence, whether the original answer was correct = SWEET!
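For the curious, the pairing trick described above can be sketched in a few lines of Python. This is only my own illustration of the idea, not reCAPTCHA's actual code: the function names, the `scan-42` word ID, the sample words, and the threshold of 3 agreeing answers are all assumptions made up for the example (the talk summary gives no such number).

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers we want
# before accepting a transcription. The post gives no number.
MIN_AGREEING_VOTES = 3

# unknown word id -> tally of answers from users who passed the known word
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


def submit(control_answer, typed_control, unknown_id, typed_unknown):
    """Grade one response.

    Returns True if the user typed the known word correctly. Only then
    is their answer for the unknown word recorded as a vote.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    votes[unknown_id][normalize(typed_unknown)] += 1
    return True


def accepted_transcription(unknown_id):
    """Return the agreed word once enough users concur, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= MIN_AGREEING_VOTES else None


# Four users pass the known word "overlooks"; three agree on the unknown one.
for guess in ["morning", "morning", "rnorning", "Morning"]:
    submit("overlooks", "overlooks", "scan-42", guess)

# This user gets the known word wrong, so their vote is ignored.
submit("overlooks", "overlocks", "scan-42", "evening")

print(accepted_transcription("scan-42"))  # morning
```

The known word does double duty here: it is the actual human test, and it decides whose answer for the unknown word gets counted at all.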
reCAPTCHAs are currently being used on 350,000 sites worldwide = 100 million unknown words are being transcribed and archived every day = that’s approximately 2.5 million books a year!
and Let Me Know if You’re As WOWed
He’s a funny Guy too