By John P. Desmond, AI Trends Editor
With AI systems today determining whether someone can get a job or a loan, it’s in the interest of the company running the AI system to make sure the underlying dataset is not so biased that it leads to errors in its conclusions.
Cases of biased data leading to biased results have been documented, such as in the research of Joy Buolamwini and Timnit Gebru, authors of a 2018 study that showed facial-recognition algorithms were very good at identifying white males, but recognized Black females only two-thirds of the time. If law enforcement uses such a system to identify suspects, that can lead to serious problems.
The stage is set for serious effort to go into reducing bias in the datasets on which AI systems rely. “It’s an opportunity,” stated Alexandra Ebert, chief trust officer at Mostly AI, a startup focused on synthetic data based in Vienna, quoted in a recent account in IEEE Spectrum. Businesses, data scientists, and engineers are beginning to focus on how to remove bias from AI datasets and algorithms, for the betterment of society.
Training datasets may come up short in data from minority groups and may reflect historical inequities, such as lower salaries for women, or racial bias, such as when Asian-Americans are labelled foreigners. Models that learn from biased training data will exhibit the same biases. Collecting high-quality data that is balanced and inclusive can be costly.
That’s where suppliers of synthetic data such as Mostly AI see an opportunity. They can, for example, create a person who may never have existed but who fits the pattern of existing data on attributes such as race, income, and educational background. The new individual would “behave like a female with higher income would behave, so that all the data points from the person match up and make sense,” Ebert stated. The synthetic data may sacrifice some accuracy, but it is still statistically highly representative.
Another synthetic data startup is Synthesized, based in London, whose founders were machine learning researchers at the University of Cambridge. The company is focused on serving data scientists. Mostly AI and several other firms are working toward the launch of an IEEE standards group on synthetic data, Ebert stated.
Toolkits, Frameworks Emerging to Help Reduce Bias in Datasets
Developers are creating tools to help reduce bias in AI. These include tools from Aequitas that measure bias in uploaded datasets, and from Themis-ml that put datasets through bias-mitigation algorithms.
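As a rough illustration of what measuring bias in a dataset can mean, the sketch below computes two widely used group-fairness measures: the statistical parity difference and the disparate impact ratio. It is a minimal example in plain Python and does not reproduce the interface of Aequitas or Themis-ml. The function names, the group labels "A" and "B", and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def positive_rates(records):
    """Return the share of favorable outcomes for each group.

    `records` is an iterable of (group, outcome) pairs, where outcome
    is 1 for a favorable result (e.g. loan approved) and 0 otherwise.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}


def disparity(records, unprivileged, privileged):
    """Compare favorable-outcome rates between two groups.

    Returns (difference, ratio). A difference near 0 and a ratio near 1
    suggest similar treatment. The ratio is undefined when the
    privileged group has no favorable outcomes at all.
    """
    rates = positive_rates(records)
    diff = rates[unprivileged] - rates[privileged]
    ratio = rates[unprivileged] / rates[privileged]
    return diff, ratio


# Hypothetical toy data: group A is approved 6 times out of 10,
# group B only 3 times out of 10.
records = (
    [("A", 1)] * 6 + [("A", 0)] * 4 +
    [("B", 1)] * 3 + [("B", 0)] * 7
)

diff, ratio = disparity(records, unprivileged="B", privileged="A")
print(f"statistical parity difference: {diff:.2f}")
print(f"disparate impact ratio: {ratio:.2f}")
```

Which threshold counts as unacceptable disparity is a policy decision rather than something the code can settle; the numbers only make the gap visible.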
A team at IBM has built a comprehensive open-source toolkit called AI Fairness 360, which helps detect and reduce unwanted bias in datasets and machine-learning models. It brings together 14 different bias-mitigation algorithms developed by computer scientists over the past decade, and is intended to be intuitive to use. “The idea is to have a common interface to make these tools available to working professionals,” Kush Varshney, a research manager at IBM Research AI in Yorktown Heights, New York, and leader of the project, told IEEE Spectrum.
The tools implement different techniques to adjust the data. Reweighing, for example, gives higher weight to input/output pairs in which the underprivileged group receives a positive outcome. Other techniques tweak the machine learning algorithm itself, for instance by optimizing for whichever group, A or B, has less data, to prod the model toward a fairer outcome across groups.
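The reweighing idea described above can be sketched in a few lines. The version below follows the commonly cited formulation in which each (group, label) combination is weighted by the count expected if group and label were independent, divided by the count actually observed. It is a plain-Python sketch under that assumption, not the AI Fairness 360 implementation, and the function names and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per example.

    weight(g, y) = P(g) * P(y) / P(g, y), estimated from counts.
    Combinations that are rarer than independence would predict
    (e.g. favorable outcomes in the underprivileged group) get
    weights above 1; over-represented combinations get weights below 1.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of favorable outcomes within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights)
        if g == group and y == 1
    )
    return positive / total


# Hypothetical toy data: group A has 6 favorable outcomes out of 10,
# group B has 3 out of 10.
groups = ["A"] * 10 + ["B"] * 10
labels = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

weights = reweighing_weights(groups, labels)

for g in ("A", "B"):
    rate = weighted_positive_rate(groups, labels, weights, g)
    print(f"group {g}: weighted favorable rate = {rate:.2f}")
```

After reweighing, both groups show the same weighted rate of favorable outcomes, so a model trained with these sample weights no longer sees group membership as predictive of the label in the training data. Whether that carries over to fair predictions still has to be checked on the trained model.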
At the root of fairness in AI is the dataset. “We can’t say a priori that this algorithm will work best for your fairness problem or dataset,” stated Varshney. “You have to figure out which algorithm is best for your data.” He has seen developers learn to use the bias-reducing toolkit. “There’s some nuance to it, but once you make up your mind to mitigate bias, yes you can do it,” he stated.
Checking on Whether Developer Worldviews Are Influencing Datasets
AI engineering managers need to be aware of whether their AI engineers are passing their own biases onto the systems they develop. “The success of any AI application is intrinsically tied to its training data,” stated Shomron Jacob, engineering manager for application machine learning and platform at Iterate.ai, in a recent account in VentureBeat. Iterate.ai is a startup based in San Jose building an AI platform that in part helps startups participate in large enterprises.
“If engineers allow their own worldviews and assumptions to influence datasets—perhaps supplying data that is limited to only certain demographics or focal points—applications dependent on AI problem-solving will be similarly biased, inaccurate, and, well, not all that useful,” Jacob stated. “I expect bias scrutiny is only going to increase as AI continues its rapid transition from a relatively nascent technology into an utterly ubiquitous one. But human bias must be overridden to truly achieve that reality.”
AI development organizations need to employ effective frameworks, toolkits, processes and policies for recognizing and mitigating AI bias. Available open source tools can be of assistance in finding blind spots in data.
AI frameworks are designed to protect organizations from the risks of AI bias by introducing checks and balances. Benchmarks for trusted, bias-free practices can be automated and built into products using these frameworks, Jacob advised.
He suggested these example AI frameworks:
The Aletheia Framework from Rolls-Royce provides a 32-step process for designing accurate and carefully managed AI applications;
Deloitte’s AI framework highlights six essential dimensions for implementing AI safeguards and ethical practices;
And a framework from Naveen Joshi details cornerstone practices for developing trustworthy AI. It focuses on the need for explainability, machine learning integrity, conscious development, reproducibility, and smart regulations.
And Jacob suggested these example AI toolkits, including the AI Fairness 360 previously mentioned:
IBM Watson OpenScale provides real-time bias detection and mitigation and enables detailed explainability to help make AI predictions trusted and transparent;
Google’s What-If Tool offers visualization of machine learning model behavior, making it easier to test trained models against machine learning fairness metrics to root out bias.
One Team Practices Community-Based System Dynamics
One AI engineer values an approach that combines many stakeholders in the initial definition of an AI project. The team needs to take into account the social implications of its implementation, suggests Damian Scalerandi, VP of operations at BairesDev, author of a recent account in Forbes. The San Francisco-based BairesDev offers AI software development services to its clients.
AI development is likely to have its blind spots. “And our best chance to find them and patch them is to collaborate with the people closest to the societal context itself—sociologists, behavioral scientists and humanities specialists,” Scalerandi stated.
Some engineers refer to this approach as community-based system dynamics (CBSD), a term introduced in 2013 in a book by that name by author Peter S. Hovmand.
“Together, we can form a shared hypothesis of how a certain algorithm could work and how we can best guarantee win-win scenarios,” Scalerandi stated. “In the end, this is all about supporting technological innovations that are fair, safe, and beneficial to everyone.”