2011-12-19

Duolingo: using free online language lessons to translate texts

Duolingo achieves an impressive win-win: Customers get to learn a foreign language for free while helping the company translate texts.

History: putting crowds to work

Duolingo’s founder, Luis von Ahn, describes the project’s history in the TED talk “Massive-scale online collaboration”. It started with von Ahn co-inventing CAPTCHA – a way of letting humans prove that they are humans and not an automated program exploiting a website. It does so by presenting them with a task that only humans can reliably solve: transcribing the distorted text shown in an image. After a while, he came up with an idea for making better use of the time that people spend typing in captchas: reCAPTCHA. It uses humans to help transcribe old scanned books, where automatic recognition fails on about 30% of the content.

How it works. reCAPTCHA shows two words and lets people decipher them. The results are checked for accuracy in two ways: First, of the two words, one is already known to the system, while the other one is new (they are shown in random order, so that people cannot tell which is which). Second, the same word is always transcribed by several people, which lets reCAPTCHA use the most frequent transcription.
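
The two checks can be sketched in a few lines of Python. This is only an illustration of the idea as described above, not reCAPTCHA's actual implementation: the function names, the case-insensitive comparison and the `min_votes` threshold are all assumptions.

```python
from collections import Counter


def normalize(word):
    """Compare words case-insensitively, ignoring surrounding whitespace (assumption)."""
    return word.strip().lower()


def record_answer(votes, known_word, typed_known, typed_unknown):
    """Check 1: only trust the answer for the unknown word if the
    already-known control word was typed correctly."""
    if normalize(typed_known) != normalize(known_word):
        return False  # control word wrong: discard the whole answer
    votes.append(normalize(typed_unknown))
    return True


def consensus(votes, min_votes=3):
    """Check 2: once enough people have answered, use the most
    frequent transcription. min_votes is a hypothetical threshold."""
    if len(votes) < min_votes:
        return None
    word, _count = Counter(votes).most_common(1)[0]
    return word


votes = []
record_answer(votes, "morning", "Morning", "overlooks")
record_answer(votes, "morning", "mourning", "overbooks")  # rejected
record_answer(votes, "morning", "morning", "overlooks")
record_answer(votes, "morning", "morning", "overlocks")
print(consensus(votes))
```

The known word acts as a gate: an answer for the new word only counts as a vote if the control word was typed correctly.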

Organizing crowds. reCAPTCHA is used by many sites and processes 100 million words a day, which amounts to 2.5 million books being transcribed per year. 750 million distinct people have solved reCAPTCHAs – 10% of humanity. Compare that to previous “big projects” where crowds had to be organized, for example building the pyramids or flying to the moon. Those never involved more than about 100,000 people, because more couldn’t be coordinated. Duolingo was inspired by the idea of what could be achieved if one could organize 100 million people.

Duolingo

The task tackled by the Duolingo project is to translate the web into every major language. According to von Ahn, machine translation won’t be good enough for at least the next 15-20 years. If you want to “crowdsource” the translation in the same manner as reCAPTCHA crowdsourced the transcription of scanned books, then you face two challenges:
  • Lack of bilinguals: participants have to speak two languages well in order to be able to help.
  • Lack of motivation: translating is a lot of work. Why should you want to help?
The solution to both is to offer online language lessons. First, people will be motivated to participate. Second, they only need to know the target language well. When they begin, they are only shown simple sentences in the source language. As their knowledge increases, so does the difficulty of the sentences that they are asked to translate. While doing so, they get help from an online textbook and an online dictionary. After a translation, they deepen their understanding of difficult words via educational examples. They can also see other participants’ translations and rank them. This is how Duolingo controls quality – it lets several people translate the same sentence and uses the best-ranked result.
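
The two mechanisms described above (matching sentence difficulty to the learner, and picking the best-ranked translation) can be sketched as follows. This is a minimal illustration, not Duolingo's actual algorithm: the `Candidate` class, the numeric difficulty levels and the use of the average rating as the ranking are assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Candidate:
    """One participant's translation plus the ratings others gave it."""
    text: str
    ratings: list = field(default_factory=list)


def sentences_for_level(sentences, learner_level):
    """Only show sentences whose (hypothetical) difficulty score
    does not exceed the learner's current level."""
    return [text for text, difficulty in sentences if difficulty <= learner_level]


def best_translation(candidates):
    """Pick the candidate with the highest average rating.
    Candidates that nobody has rated yet are ignored."""
    rated = [c for c in candidates if c.ratings]
    if not rated:
        return None
    return max(rated, key=lambda c: mean(c.ratings)).text


candidates = [
    Candidate("The dog is sleeping.", ratings=[5, 4]),
    Candidate("The dog sleep.", ratings=[2, 1]),
    Candidate("Dog sleeps the.", ratings=[]),
]
print(best_translation(candidates))
```

Averaging is only one possible way to rank. A real system would probably also weight ratings by how reliable each rater has been in the past.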

Does it work? Surprisingly, Duolingo is good at two tasks. On the one hand, it teaches languages as well as leading educational software. It provides the added benefit of letting students work with real content. On the other hand, translations are as accurate as those produced by professional translators. And it’s fast: according to the estimates given in the talk, Duolingo could translate the English Wikipedia into Spanish in 5 weeks with 100,000 users, or in 80 hours with 1 million users. Currently, the Spanish Wikipedia is 20% the size of the English Wikipedia. Letting professional translators translate the remaining 80% would cost at the very least 50 million dollars.
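
The two speed figures are consistent with each other if one assumes that translation time shrinks in proportion to the number of users. That linear scaling is an assumption made here for illustration, and the function name is hypothetical:

```python
HOURS_PER_WEEK = 7 * 24


def scaled_hours(base_hours, base_users, users):
    """Time needed with `users` participants, assuming throughput
    grows linearly with the number of participants."""
    return base_hours * base_users / users


base_hours = 5 * HOURS_PER_WEEK  # 5 weeks with 100,000 users = 840 hours
hours_with_million = scaled_hours(base_hours, 100_000, 1_000_000)
print(hours_with_million)  # 84.0
```

Ten times as many users cut 840 hours down to 84 hours, which matches the quoted figure of roughly 80 hours.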

A fair model for language education. Currently, learning a language is an expensive proposition. Duolingo enables even poor people to afford a good course. They pay with their time, not with money.

Related reading

  1. Foreign languages: four ways to avoid learning vocabulary
  2. Crowdsourcing language translations
