IBM has announced that it broke the industry record for speech recognition, creating a technology that recognizes conversational speech with accuracy ever closer to human parity.
Last year, IBM announced a major improvement in conversational speech recognition: a system that achieved a 6.9 percent word error rate. Since then, IBM researchers have continued to push the boundaries of accuracy, achieving this historic milestone and setting an industry record of 5.5 percent, a 20 percent improvement on the rate reported six months prior.
“These speech developments build on decades of research, and achieving speech recognition comparable to that of humans is a complex task. At IBM, we are dedicated to creating the technology that will one day match the complexity of how the human ear, voice and brain interact,” said Michael Karasick, IBM Vice President, Cognitive Computing. “This progress will have important implications for how man and machine collaborate in the future, making the interactions more natural and productive. We believe it is only a matter of time before we achieve parity on speech recognition with humans.”
The success of speech recognition technology is measured against human parity, an error rate on par with that of two humans speaking. Previously, human parity was considered a 5.9 percent word error rate; IBM partnered with Appen, a speech and technology service provider, to reassess the industry benchmark and determined that human parity is lower than what anyone has yet achieved: 5.1 percent.
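Word error rate is the standard metric behind these figures: the number of word-level substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the length of the reference. The sketch below is a minimal, generic illustration of that computation, not IBM's evaluation code:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference gives a 25 percent WER.
print(word_error_rate("the cat sat down", "the hat sat down"))  # 0.25
```

Under this metric, a 5.5 percent rate means roughly one word in eighteen is transcribed incorrectly; the 5.1 percent human-parity figure is measured the same way against human transcribers.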
In the face of other industry claims, this research, in partnership with Appen, shows that finding a standard measurement for human parity across the industry is more complex than it seems. As IBM continues to develop and improve this technology, its researchers will hold themselves to the highest standards of accuracy in measuring it, so that the findings are truly valuable.
“In spite of impressive advances in recent years, reaching human-level performance in AI tasks such as speech recognition or object recognition remains a scientific challenge. Indeed, standard benchmarks do not always reveal the variations and complexities of real data,” says Yoshua Bengio, leader of University of Montreal’s Institute for Learning Algorithms. “IBM continues to make significant strides in advancing speech recognition by applying neural networks and deep learning into acoustic and language models.”
“The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex,” said Julia Hirschberg, professor and Chair of the Department of Computer Science at Columbia University. “IBM’s recent achievements in speech recognition are quite impressive, as is IBM’s dedication to better understanding how we measure the success of speech technology and industry benchmarks.”
Today’s achievement builds upon IBM’s recent advancements in language and speech technology, gained from decades of researching, developing and investing in AI. These research developments are critical to advancing the development and adoption of cognitive computing around the globe; as we continue to strengthen and improve our speech and language technology, these updates will be embedded in the cognitive capabilities we offer via the Watson Developer Cloud.