Facebook today announced that it has released the data it used to train its artificial intelligence software to understand children’s stories and predict the word missing from a given sentence in a story.
The data set (.tgz) is more than 1.6GB in size, and it accompanies a recently published academic paper called “The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations.” Facebook chief executive Mark Zuckerberg provides a good overview of the research today in a Facebook post:
“Language is one of the most complex things for computers to understand. Guessing how to complete a sentence is pretty easy for people but much more difficult for machines. Historically, computers have been able to predict simple words like ‘on’ or ‘at’ and verbs like ‘run’ or ‘eat’, but they don’t do as well at predicting nouns like ‘ball’, ‘table’ or people’s names.
“For this research, our team taught the computer to look at the context of a sentence and much more accurately predict those more difficult words (nouns and names), which are often the most important parts of sentences. The computer’s predictions were most accurate when it looked at just the right amount of context around relevant words: not too much and not too little. We call this ‘The Goldilocks Principle’.”
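To make the idea of "just the right amount of context" concrete, here is a minimal toy sketch in Python. It is not Facebook's model, which per the paper's title uses explicit memory representations; it swaps in a simple word-overlap heuristic for illustration only. The function names, the `XXXXX` blank marker, and the example sentences are all assumptions made up for this sketch.

```python
def make_cloze(sentence_tokens, target_index, blank="XXXXX"):
    """Hide one token of a sentence; return (question_tokens, answer)."""
    answer = sentence_tokens[target_index]
    question = list(sentence_tokens)
    question[target_index] = blank
    return question, answer


def context_windows(context_tokens, candidate, half_width):
    """Collect the tokens within half_width positions of each candidate mention."""
    windows = []
    for i, token in enumerate(context_tokens):
        if token == candidate:
            start = max(0, i - half_width)
            windows.append(context_tokens[start:i + half_width + 1])
    return windows


def score_candidates(context_tokens, question_tokens, candidates,
                     half_width, blank="XXXXX"):
    """Toy heuristic (an assumption, not the paper's method): score each
    candidate by the best word overlap between the question and any
    window around one of the candidate's mentions in the context."""
    query = set(question_tokens) - {blank}
    scores = {}
    for candidate in candidates:
        best = 0
        for window in context_windows(context_tokens, candidate, half_width):
            overlap = len(query & (set(window) - {candidate}))
            best = max(best, overlap)
        scores[candidate] = best
    return scores


# Hypothetical story context and a sentence with its last noun hidden.
context = ("the girl threw the ball to the dog and "
           "the dog chased the ball across the garden").split()
question, answer = make_cloze("then the dog dropped the ball".split(), 5)
candidates = ["ball", "girl", "garden"]

for half_width in (2, 3, 20):
    print(half_width, score_candidates(context, question, candidates, half_width))
```

In this made-up example, only the middle window size separates "ball" from the other candidates: the narrow window and the whole-story window both give every candidate the same score.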