Teaching machines to read and understand natural language has been one of the biggest challenges facing computer scientists working to advance artificial intelligence. Although systems have been put to the task of reading documents and answering questions about their contents, the lack of large-scale training and test datasets has limited progress.
DeepMind Technologies, Google’s AI lab known for repeatedly shaking up the field, figured out they didn’t need to create their own dataset because the perfect one was right in front of them: the deep vault of articles published online by CNN and The Daily Mail. In a recent study, their researchers used hundreds of thousands of articles from the two media companies to teach their AI systems to read.
So why articles from The Daily Mail and CNN? Both outlets accompany their articles with concise bullet points that summarize the story, and it turns out those summaries are key to helping machine reading systems learn to understand natural language content.