A few months back, I found myself in one of those big electronics stores that are rapidly becoming extinct. I don’t remember exactly why I was there, but I remember a very specific moment where a woman with a Chinese accent paced up to the cashier, plopped her home voice assistant down on the counter and proclaimed: “I need to return this…thing.” The clerk nodded and asked why. “Because this damn thing doesn’t understand anything I say!”
You’d be surprised how often voice recognition systems have this problem. And why is that, exactly? It’s because they’re trained on a narrow range of voices, typically those of engineers in Silicon Valley. That means that when an assistant hears a request in a thick Southern drawl, a wicked hard Boston accent, or a Cajun N’awlins dialect, you name it, it simply won’t know what to make of the command.
Now, while a voice assistant misunderstanding your every request isn’t exactly a life-or-death problem, it’s evidence of something called algorithmic bias. And whether algorithmic bias is real is no longer up for debate. There are myriad examples: ad networks showing high-paying jobs to men far more often than to women, models that parrot 1950s gender stereotypes, bogus sentencing recommendations based on classifiers, and AI-judged beauty pageants that favor white contestants. Hardly a month goes by without another high-profile instance plastered across tech news.
Again, the question isn’t whether the problem exists; it’s how we solve it. Because it’s one we absolutely have to solve. Algorithms already control much more of our lives than most people realize. They’re responsible for whether you can get a mortgage, how much your insurance costs, what news you see; essentially everything you do and see online, and increasingly offline as well. None of us should want to live in a society where these biases are codified and amplified.
So how do we solve this problem? First, the good news: almost none of the prominent examples of algorithmic bias are due to malicious intent. In other words, it isn’t as if there’s a room full of sexist, racist programmers foisting these models on the public. It’s accidental, not purposeful. The bias, in fact, comes from the data itself. And that means we can often solve it with different, or more, data.
Take the example of a facial classifier that didn’t recognize black people as people. This is a glaring example of bias, but it stems primarily from the original dataset. Namely, if an algorithm is trained only on a set of white college students, it may have significant problems recognizing people with darker complexions, older people, or babies; they simply go unrecognized. Fixing that means retraining the algorithm on an additional corpus of facial data, like the project the folks at Kiva are undertaking.
Kiva is a microlending platform focused predominantly on the developing world. As part of their application process, they ask prospective borrowers to include a photo of themselves, along with the other pertinent details, to share with the community of lenders. In doing so, Kiva has accrued a dataset of hundreds of thousands of highly diverse images, importantly captured in real-world rather than laboratory settings. If you take that original, biased facial classifier and retrain it with additional labeled images from a dataset that better represents the full spectrum of human faces, suddenly you have a model that recognizes a much wider population.
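To make the retraining idea concrete, here is a toy sketch. This is not Kiva’s actual pipeline or a real face model; the “embeddings,” cluster positions, and group names are all invented to show one thing: a classifier trained on a narrow dataset fails on an underrepresented group, and retraining on a combined, more representative dataset fixes it.

```python
# Toy illustration of de-biasing by retraining on more representative data.
# All data here is synthetic; the "embeddings" are made-up 2-D points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical face embeddings: group A clusters near (0, 0),
# underrepresented group B clusters near (4, 4); non-faces near (0, 8).
group_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(200, 2))
group_b = rng.normal(loc=(4.0, 4.0), scale=0.5, size=(200, 2))
negatives = rng.normal(loc=(0.0, 8.0), scale=0.5, size=(200, 2))

# Original, biased training set: group A faces only.
X_biased = np.vstack([group_a, negatives])
y_biased = np.array([1] * 200 + [0] * 200)
model = LogisticRegression(max_iter=1000).fit(X_biased, y_biased)
acc_b_before = model.score(group_b, np.ones(200))  # poor on group B

# Retrain with labeled group B examples added to the corpus.
X_full = np.vstack([group_a, group_b, negatives])
y_full = np.array([1] * 400 + [0] * 200)
retrained = LogisticRegression(max_iter=1000).fit(X_full, y_full)
acc_b_after = retrained.score(group_b, np.ones(200))  # near-perfect
```

The mechanics are the same at production scale: the model architecture doesn’t change, only the data it learns from.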
Most instances of algorithmic bias can be solved the same way: retraining a classifier with tailored data. Those voice assistants that don’t understand accents? Once they hear enough of those accents, they will. The same is true of essentially every example cited above. But this raises a different question: if we know how to fix algorithmic bias, why are there so many instances of it?
This is where companies need to step up, because the instances of bias mentioned above really should have been caught. Think about it: why didn’t anyone consider making sure their facial recognizer could understand non-white faces? Odds are, they didn’t consider it at all. Or if they did, maybe they checked only under sterile laboratory conditions that don’t mirror the real world.
Here, companies need to consider two things. First, hiring. Diverse engineering teams ask the right questions. And by most measures, diverse teams perform better because they bring different experiences to their work. Second, companies aren’t thinking enough about their users. Or, to put it more directly, they aren’t thinking enough about their universe of potential users. Diverse teams will inherently help with this problem, but even then you can run into trouble. Take a moment before you release a machine learning project and stress-test it in ways your team didn’t think of straight off the bat. Use empathy. Realize that different users will act in different ways and that, although you can’t reasonably hope to foresee them all, a concerted effort will catch many of these problems before your project goes live.
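One simple, mechanical version of that stress test is to break evaluation accuracy out by user subgroup instead of reporting a single aggregate number. The sketch below is illustrative: the group names, the records, and the 90% threshold are invented, not drawn from any real product.

```python
# Per-subgroup evaluation: an aggregate accuracy number can hide a group
# the model fails on. Group names and threshold here are hypothetical.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {g: hits[g] / totals[g] for g in totals}

def flag_disparities(records, min_accuracy=0.9):
    """Return subgroups whose accuracy falls below the threshold."""
    scores = accuracy_by_group(records)
    return sorted(g for g, acc in scores.items() if acc < min_accuracy)

# Example: a voice assistant evaluated across two (made-up) accent groups.
results = [
    ("general_american", "play music", "play music"),
    ("general_american", "set alarm", "set alarm"),
    ("southern_drawl", "play music", "play music"),
    ("southern_drawl", "pay music", "play music"),  # misrecognized
]
```

Here `flag_disparities(results)` would surface `southern_drawl`: the aggregate accuracy is 75%, but one group sits at 50% while the other is at 100%, which is exactly the kind of gap a single headline metric hides.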
At this point, most of us are aware that artificial intelligence is going to transform business and society; that much is certain, though experts can quibble over the extent. We also know that AI can both amplify existing bias and even exhibit bias where none was intended. But it’s solvable. It is. It’s just a matter of being conscientious. It means hiring smartly. It means testing smartly. And it means, above all, using the same data that makes AI work to make AI work more fairly. Algorithmic bias is pervasive, but it’s not intractable. We just need to admit it exists and take the smart steps to fix it.
For more information, go to Appen.