Amidst narratives of machine learning complacency, Apple is coming to terms with the fact that not talking about innovation means innovation never happened.
A detailed blog post in the company’s machine learning journal makes public the technical effort that went into its “Hey Siri” feature, a capability so banal that I’d almost believe Apple was trying to make a point with highbrow mockery.
Even so, it’s worth taking the opportunity to explore exactly how much effort goes into features that, for one reason or another, go unnoticed. Here are five things that make the “Hey Siri” functionality (and competing offerings from other companies) harder to implement than you’d imagine, along with commentary on how Apple overcame the obstacles.
It had to avoid draining your battery and processor all day
At its core, the “Hey Siri” functionality is really just a detector. The detector listens for the phrase, ideally using far fewer resources than the full, server-based Siri. Even so, it wouldn’t make much sense for this detector to tax a device’s main processor all day.
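To make “just a detector” a little more concrete, here is a minimal sketch, assuming some acoustic model (not shown, and hypothetical here) already emits a per-frame probability that the trigger phrase is being spoken. The function `detect_trigger` and its `window` and `threshold` parameters are illustrative names, not Apple’s: it averages the most recent scores and fires once each time that average crosses the threshold.

```python
from collections import deque
from typing import Iterable, List


def detect_trigger(frame_scores: Iterable[float], window: int = 4,
                   threshold: float = 0.8) -> List[int]:
    """Return the frame indices at which the smoothed score crosses threshold.

    frame_scores: hypothetical per-frame probabilities (0..1) that the
    trigger phrase is being spoken, as produced by some acoustic model.
    """
    recent = deque(maxlen=window)  # holds only the last `window` scores
    fired_at = []
    armed = True  # re-armed only after the average drops below threshold
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if len(recent) < window:
            continue  # not enough context yet to make a decision
        avg = sum(recent) / window
        if avg >= threshold and armed:
            fired_at.append(i)  # one detection per sustained high stretch
            armed = False
        elif avg < threshold:
            armed = True
    return fired_at
```

Averaging over a short window trades a few frames of latency for robustness: a single noisy frame that happens to score high will not wake anything on its own.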
Fortunately, the iPhone has a smaller “Always On Processor” that can be used to run detectors. At this point in time, it wouldn’t be feasible to smash an entire deep neural network (DNN) onto such a small processor. So instead, Apple runs a tiny version of its DNN for recognizing “Hey Siri.”
When that model is confident it has heard something resembling the phrase, it calls in backup and has the captured signal analyzed by a full-size neural network. All of this happens so quickly that you wouldn’t even notice it.
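The two-pass arrangement described above is a classic detector cascade. The sketch below assumes two hypothetical scoring callables: a cheap one standing in for the tiny DNN on the Always On Processor, and an expensive one standing in for the full-size network. The class name and the threshold values are illustrative assumptions, not Apple’s figures.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Any function mapping an audio clip to a confidence in [0, 1].
Score = Callable[[Sequence[float]], float]


@dataclass
class CascadeDetector:
    """Two-stage trigger detector: a cheap model gates an expensive one.

    Both models are hypothetical stand-ins supplied by the caller.
    """
    small_model: Score             # runs on every clip (think: always-on chip)
    large_model: Score             # runs only when the small model is confident
    small_threshold: float = 0.5   # permissive: better to over-wake stage two
    large_threshold: float = 0.9   # strict: the final accept/reject decision
    large_calls: int = 0           # how many times the expensive model ran

    def __call__(self, clip: Sequence[float]) -> bool:
        if self.small_model(clip) < self.small_threshold:
            return False  # cheap rejection; the large model never runs
        self.large_calls += 1
        return self.large_model(clip) >= self.large_threshold
```

For example, `CascadeDetector(small_model=max, large_model=lambda c: sum(c) / len(c))` rejects a quiet clip without ever calling the second model. The first threshold is deliberately lower than the second: the cheap stage only needs to avoid missing the phrase, since the expensive stage gets the final say, and `large_calls` shows how rarely the costly model actually runs.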
All languages and ways of pronouncing “Hey Siri” had to be accommodated
Deep learning models are data-hungry and suffer from what’s called the cold start problem: the period when a model simply hasn’t been trained on enough edge cases to be effective. To overcome this, Apple got crafty and pulled audio of users saying “Hey Siri” naturally and without prompting, before the Siri wake feature even existed. Yeah, I’m with you: it’s weird that people would attempt to have real conversations with Siri, but it’s crafty nonetheless.
These utterances were transcribed, spot-checked by Apple employees and combined with general speech data. The aim was to create a model robust enough to handle the wide range of ways in which people say “Hey Siri” around the world.
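The data preparation step could look roughly like the following sketch, which assumes log entries arrive as hypothetical `(audio_id, transcript)` pairs. Labelling trigger utterances as positives and general speech as negatives is one plausible way to “combine” the two sources, not necessarily what Apple did, and `spot_check_rate` is an invented parameter for how much of the data a human reviews.

```python
import random
from typing import List, Tuple

Clip = Tuple[str, str]  # (hypothetical audio id, transcript)


def build_training_set(siri_logs: List[Clip], general_speech: List[Clip],
                       spot_check_rate: float = 0.05, seed: int = 0
                       ) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Label trigger utterances 1 and general speech 0.

    Returns (shuffled labelled examples, ids selected for human spot-checking).
    """
    rng = random.Random(seed)  # fixed seed keeps the spot-check sample reproducible
    # Keep only logged requests whose transcript begins with the trigger phrase.
    positives = [audio_id for audio_id, text in siri_logs
                 if text.lower().startswith("hey siri")]
    # Send at least one positive to a human reviewer whenever any exist.
    k = min(len(positives), max(1, round(len(positives) * spot_check_rate)))
    spot_check = rng.sample(positives, k)
    examples = [(a, 1) for a in positives] + [(a, 0) for a, _ in general_speech]
    rng.shuffle(examples)  # avoid long runs of a single label during training
    return examples, spot_check
```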
Read the source article at TechCrunch.