Doing Well By Doing Good: Re-CAPTCHA at the Cross-Section of Technology, Economics, and Psychology

CAPTCHAs can be found everywhere today, even in unusual places. Image at http://www.rizzotees.com

A colleague of mine at Infosys, Kapil Ashok Jaiswal, posted an article on our company's internal blog. I am thankful that he gave me permission to repost his contribution here.

He had titled his post “Killing Two Birds With One Stone”; being a biologist by training (and vocation), I had to choose a different title.

He observes how CAPTCHAs have been succeeded by reCAPTCHAs, and how the simple act of filling in a CAPTCHA is being used to crowd-source useful work.

The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University (source: http://www.captcha.org).

Here is Kapil’s post:
(Begin Quote)

A few days ago, I came across an innovative model which made quite an impression on me. Since then, I have been mulling over it, again and again, to find more applications of this wonderful model. As crowd-sourcing is in fashion today, I thought “why not source some ideas from my fellow colleagues?”

Let me first explain this innovative business model. I am sure most of you have come across a CAPTCHA: a distorted sequence of characters that you had to type into some sort of web form. The idea behind it is to ensure that it is a human filling in the form and not some sort of computer program. It turns out that approximately 200 million CAPTCHAs are typed every day by people around the world. Each time a person types a CAPTCHA, he or she ends up spending at least 10 seconds doing so.

If you multiply that by 200 million, you see that humanity as a whole is spending about 500,000 hours every day typing CAPTCHAs. Well, I should point out here that a lot of innovation comes from how you see things. The inventor of CAPTCHA had a socialist mindset with, of course, an innovative flavor to it. To capitalize on this huge effort which he saw as wastage, he came up with reCAPTCHA, a revision of the CAPTCHA concept.

Example for a simple CAPTCHA (from googlemail.com)

reCAPTCHA is nothing but two words displayed next to each other. While the concept remains the same as in CAPTCHA, something unique is happening behind the scenes. Let me take a step back and explain a few missing links. There are a lot of projects out there trying to digitize books. The digitization process uses OCR (optical character recognition), a technique for working out what text an image of a scanned page contains. The problem is that OCR is not perfect, especially for older books.

For example, for books that were written more than 50 years ago, the OCR technique cannot recognize about 30% of the words. So here is one solution: what if we could have a human being recognize those words? But then the next question is: at what cost?

If we go by behavioral economics, people are more drawn to a task when they are intrinsically motivated than when they are extrinsically motivated. For example, when you are writing a complex piece of code, the desire to see it work is a far better push than any tangible reward. So, if we need to pay people to recognize all these words, it is not only going to be expensive due to the sheer scale of the project, but it will also be difficult to motivate people to carry out this mundane task.

Example of a reCAPTCHA. One of the words could not be digitally recognized by OCR. But you won’t know which one. Source: karouselmag.com

So how do we solve this problem? Some brilliant minds got to work and “EUREKA!” – they found the solution. Remember reCAPTCHA? It’s like killing two birds with one stone. Here’s how it works. One of the words presented in a reCAPTCHA is a word from the scanned book that the OCR could not recognize. The second word is already known to the system. The system doesn’t tell you which one is which, so the user needs to type both words. If the user types the known word correctly, the system assumes that the user is a human being and gains confidence that the user has also typed the other word correctly. The same process is repeated with 10 different people for the same unrecognized word, and if all of them agree on what the word is, then one more word has been digitized accurately.

Have you heard about Duolingo? Guess what, it works on the same concept. Duolingo helps people learn different languages like Spanish, French, etc. Behind the scenes, it helps translate tons of Wikipedia pages into different languages. If you are interested in learning a foreign language, it is highly recommended.

The strength of the model lies in motivating people towards doing something interesting and, in turn, employing their collective power to solve some real problems. If you analyze this carefully, this model lies at the sweet spot of the intersection of technology, economics and psychology.
Reference: Wikipedia and TED

(End Quote)
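
The verification scheme described in the quoted post can be sketched in a few lines of Python. This is only an illustration of the description above (one known control word, one unknown word, and agreement among 10 people); it is not how the real reCAPTCHA service is implemented, and every name in it (`UnknownWord`, `submit`, `REQUIRED_AGREEMENT`) is hypothetical.

```python
from typing import List, Optional

# Figure taken from the post: 10 different people must agree on the word.
REQUIRED_AGREEMENT = 10


def normalize(word: str) -> str:
    """Compare answers without regard to case or surrounding spaces."""
    return word.strip().lower()


class UnknownWord:
    """Collects human guesses for one word the OCR could not read."""

    def __init__(self) -> None:
        self.guesses: List[str] = []

    def resolved(self) -> Optional[str]:
        """Return the transcription once the first 10 guesses all agree."""
        if len(self.guesses) < REQUIRED_AGREEMENT:
            return None
        first_ten = self.guesses[:REQUIRED_AGREEMENT]
        return first_ten[0] if len(set(first_ten)) == 1 else None


def submit(control_word: str, unknown: UnknownWord,
           typed_control: str, typed_unknown: str) -> bool:
    """Accept the user as human only if the known control word matches.

    Only then is the answer for the unknown word recorded as a guess.
    """
    if normalize(typed_control) != normalize(control_word):
        return False  # failed the known word: treat as a bot, discard the guess
    unknown.guesses.append(normalize(typed_unknown))
    return True


# Usage: ten people pass the control word and type the same unknown word.
word = UnknownWord()
for _ in range(REQUIRED_AGREEMENT):
    submit("morning", word, "Morning", "overlooks")
print(word.resolved())  # overlooks
```

The key design point is that the user never learns which of the two words is the control, so the only reliable way to pass is to type both honestly.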

Do you know of other areas where something you need to do anyway also ends up helping the community?

Kapil wonders: “In a 100,000-person company, if everyone gives just 1 minute of his/her time in a day, we have around 200 person-days of effort. So we have the scale to solve big problems. Let me tell you something about this magic figure – 100,000. It’s said that all these missions – building the pyramids of Egypt or the Panama Canal and putting a man on the Moon – employed this magic number. Now the challenge is to find something meaningful which drives each one of us to give our 1 minute and in turn ‘Make the Mountain Move’.”
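
Both back-of-the-envelope figures in this post (the hours spent on CAPTCHAs per day and the person-days in a 100,000-person company) can be checked with a short Python sketch. The function names are made up for illustration, and the 8-hour working day used to convert hours into person-days is an assumption; the post does not state one.

```python
SECONDS_PER_HOUR = 3600
MINUTES_PER_HOUR = 60


def captcha_hours_per_day(captchas_per_day: int, seconds_each: float) -> float:
    """Total human time spent typing CAPTCHAs in one day, in hours."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


def person_days(people: int, minutes_each: float,
                hours_per_workday: float = 8.0) -> float:
    """Pooled effort in person-days (assumes an 8-hour working day)."""
    return people * minutes_each / MINUTES_PER_HOUR / hours_per_workday


# 200 million CAPTCHAs a day at 10 seconds each
print(round(captcha_hours_per_day(200_000_000, 10)))  # 555556

# 100,000 people giving 1 minute each
print(round(person_days(100_000, 1)))  # 208
```

At 10 seconds per CAPTCHA the total comes out slightly above the quoted 500,000 hours, and under the 8-hour assumption the company figure lands close to the quoted 200 person-days.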

Would you share any ideas about new areas where this concept could be taken further? Just enter a comment below.
