Deep Learning: Machines can now recognize objects and translate speech in real time

527

When Ray Kurzweil met with Google CEO Larry Page last July, he wasn’t looking for a job. A respected inventor who’s become a machine-intelligence futurist, Kurzweil wanted to discuss his upcoming book How to Create a Mind. He told Page, who had read an early draft, that he wanted to start a company to develop his ideas about how to build a truly intelligent computer: one that could understand language and then make inferences and decisions on its own.

It quickly became obvious that such an effort would require nothing less than Google-scale data and computing power. “I could try to give you some access to it,” Page told Kurzweil. “But it’s going to be very difficult to do that for an independent company.” So Page suggested that Kurzweil, who had never held a job anywhere but his own companies, join Google instead. It didn’t take Kurzweil long to make up his mind: in January he started working for Google as a director of engineering. “This is the culmination of literally 50 years of my focus on artificial intelligence,” he says.

Kurzweil was attracted not just by Google’s computing resources but also by the startling progress the company has made in a branch of AI called deep learning. Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.

The basic idea—that software can simulate the neocortex’s large array of neurons in an artificial “neural network”—is decades old, and it has led to as many disappointments as breakthroughs. But because of improvements in mathematical formulas and increasingly powerful computers, computer scientists can now model many more layers of virtual neurons than ever before.

With this greater depth, they are producing remarkable advances in speech and image recognition. Last June, a Google deep-learning system that had been shown 10 million images from YouTube videos proved almost twice as good as any previous image recognition effort at identifying objects such as cats. Google also used the technology to cut the error rate on speech recognition in its latest Android mobile software. In October, Microsoft chief research officer Rick Rashid wowed attendees at a lecture in China with a demonstration of speech software that transcribed his spoken words into English text with an error rate of 7 percent, translated them into Chinese-language text, and then simulated his own voice uttering them in Mandarin. That same month, a team of three graduate students and two professors won a contest held by Merck to identify molecules that could lead to new drugs. The group used deep learning to zero in on the molecules most likely to bind to their targets.

Read the source article at MIT Technology Review.