By AI Trends Staff
A lip-reading app from Irish startup Liopa is said to represent a breakthrough in the field of visual speech recognition (VSR), which trains AI to read lips without any audio input.
Liopa’s product, SRAVI (Speech Recognition App for the Voice Impaired), is a communication aid for speech-impaired patients. It is likely to be the first lip-reading AI app available for public purchase, according to an account from Vice/Motherboard.
Researchers, driven by a range of potential commercial applications including surveillance tools, have been working for years to teach computers to lip-read, and it has proven a challenging task. Liopa is working to certify SRAVI as a Class I medical device in Europe, hoping to complete the certification by August. That would allow it to begin selling to healthcare providers.
Many tech giants are also working on lip-reading AI. Scientists affiliated with or working directly for Google, Huawei, Samsung, and Sony are all researching VSR systems and appear to be making rapid advances, according to the Motherboard account.
Liopa Wins Second Contract for UK Defense and Security Research
How lip-reading AI is being developed and how it might be deployed are becoming causes for concern. Liopa recently announced that it has been selected to take part in Phase 2 of the DASA Behavioural Analytics initiative, aimed at helping the UK’s Defence and Security Accelerator (DASA) develop capability in behavioral analytics. These are defined as “context-specific insights” derived from data on individuals and groups, which could enable “reliable predictions about how they are likely to act in the future.”
The hoped-for tool would allow law enforcement agencies to search through silent CCTV footage and identify when people say certain keywords.
The Liopa VSR engine takes video of one or more subjects speaking as input and uses AI to predict each subject’s most likely utterances, according to a press release from Liopa, which is based in Belfast, Northern Ireland. The engine can be used to identify keywords spoken in surveillance video content (CCTV) where audio is either not present or of poor quality.
DASA Delivery Manager, Eleanor Humphrey, stated, “Behavioural Analytics is a fascinating and emerging capability that is finding innovative ways to keep our people safe from major threats. We are delighted to be working with Liopa to accelerate their technology and look forward to seeing the results.”
Liam McQuillan, Founder and CEO, Liopa, stated in the release, “This contract allows us to build on the progress made in the Phase 1 project. It’s great validation of our VSR technology in a practical use case that will provide invaluable information for Defence & Security personnel.”
Liopa is not alone in its quest to tap AI for lip-reading. Surveillance company Motorola Solutions has a patent for a lip-reading system designed to aid police. Skylark Labs, a startup whose founder has ties to the US Defense Advanced Research Projects Agency (DARPA), told Motherboard that its lip-reading system is currently deployed in private homes and at a state-controlled power company in India to detect foul and abusive language.
VSR Tech Could Be Ensnared in Ethical Issues Akin to Facial Recognition
Some observers see VSR heading toward the same difficulties that have befallen the facial recognition market, which has been ensnared in ethical issues.
“This is one of those areas, from my perspective, which is a good example of ‘just because we can do it, doesn’t mean we should,’” stated Fraser Sampson, the UK’s biometrics and surveillance camera commissioner, to Motherboard. “My principal concern in this area wouldn’t necessarily be what the technology could do and what it couldn’t do, it would be the chilling effect of people believing it could do what it says. If that then deterred them from speaking in public, then we’re in a much bigger area than simply privacy, and privacy is big enough.”
AI researchers are now more cognizant of the ethical implications of how AI is applied. For example, the NeurIPS conference now requires AI scientists to submit, along with their proposed papers, impact statements about how their findings might affect society.
Stavros Petridis, who has conducted related research at Imperial College London and is now working for Facebook, spoke to Motherboard about the dilemma. “In the last year there have been several discussions in the published literature around ethical considerations for VSR technology,” he stated. “Given that there are no commercial applications available yet, there are pretty good chances that this time, ethical considerations will be taken into account before this technology is fully commercialized.”
Liopa CEO Liam McQuillan also spoke to Motherboard about the issue, saying the company is at least a year away from having a system that can lip-read keywords from silent CCTV footage at the required level of accuracy. He said the company has considered the possibility of a privacy backlash. “There may be concerns here that actually forbid the ultimate use of this technology,” McQuillan stated.
At the Consumer Electronics Show in January, Sony provided an overview of its Visual Speech Enablement product, now in development, which uses a camera sensor and AI for augmented lip reading. Mark Hanson, Sony’s VP of Product Technology and Innovation, said the product isolates a user’s lips and translates their movements into words, independent of background or foreground noise, according to an account in PCMag.
The new product’s technology only captures lips, not faces, so no user-identifiable data is retained, Hanson indicated.