AI-driven platform identifies and remediates biases in data
November 11, 2020 at 9.00am
The first publicly available solution to accurately detect and remove biases in data launches today. Synthesized is releasing the Community Edition of its data platform for Bias Mitigation. Released as a freemium version, the offering incorporates AI research and cutting-edge techniques to enable any organisation to quickly identify potential biases within their data and immediately start to remediate these flaws.
The platform was designed by the London-based firm to understand a wide array of regulatory and legal definitions regarding contextual bias. It can automatically identify bias across data attributes like gender, age, race, religion, sexual orientation, and more.
Synthesized is making the capability available immediately, requiring no coding or deep technical expertise to get started. Users simply upload a structured data file, like a spreadsheet, to kick off the analysis process. This simplicity allows the solution to be used across industries. The data platform could be used in finance to create fairer credit ratings, in insurance to assess claims more equitably, in human resources to identify bias as part of a hiring process, and in universities to ensure that admission decisions are fair.
Dr Nicolai Baldin, CEO and founder of Synthesized, said, “The reputational risk of all organisations is under threat due to biased data and we’ve seen this will no longer be tolerated at any level. It’s a burning priority now and must be dealt with as a matter of urgency, both from a legal and ethical standpoint. Synthesized’s Community Edition for Bias Mitigation is one of the first offerings specifically created to understand, investigate, and root out bias in data. We designed the platform to be very accessible, easy-to-use and highly scalable, as organisations have data stored across a huge range of databases and data silos.”
Rebalancing Biased Data
Beyond this deep analysis and bias detection, the platform also offers another powerful feature: the ability to automatically remove the biases present in an entire dataset, a process called rebalancing.
While there are a number of existing, limited techniques to rebalance biased data, Synthesized has developed a proprietary algorithm within its platform that is quicker and more accurate. The AI-driven platform can make randomised changes, at scale, to an original, biased dataset to construct a new, entirely synthetic dataset. By generating synthetic data, Synthesized’s platform lets its users distribute all attributes within a dataset equally, removing bias and rebalancing the dataset completely. Users can also manually change individual data attributes within a dataset, such as gender, providing granular control of the rebalancing process.
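To give a rough sense of what rebalancing means in practice, the sketch below evens out one attribute by random oversampling. It is an illustration only and not Synthesized's proprietary algorithm: it duplicates existing rows rather than generating synthetic ones, and the function name `rebalance_by_attribute`, the column names and the sample rows are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def rebalance_by_attribute(rows, attribute, seed=0):
    """Oversample rows so every value of `attribute` appears equally often.

    Simple random oversampling, used here as a stand-in for illustration.
    """
    if not rows:
        return []
    rng = random.Random(seed)

    # Group the rows by the value of the chosen attribute.
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)

    # Every group is grown to the size of the largest group.
    target = max(len(group) for group in groups.values())

    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw extra rows with replacement until the group reaches the target.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy dataset: three "M" rows and one "F" row.
rows = [
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 0},
    {"gender": "F", "approved": 0},
]

balanced = rebalance_by_attribute(rows, "gender")
counts = Counter(row["gender"] for row in balanced)
print(dict(counts))
```

After the call, both gender values appear the same number of times. Oversampling is among the simplest rebalancing approaches; because it only repeats rows that already exist, it adds no new information to the dataset.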
Community Edition for Bias Mitigation - How It Works
- Free sign up: Your organisation can sign up here.
- Easy to get started: Upload a structured data file, like an Excel spreadsheet, to kick off the analysis process. Users can also connect to relational database services including AWS, Azure, Google Cloud, Oracle, and others, to build custom datasets for analysis. The platform learns the structure of the data in real time, and the analysis process can crunch over four million rows of data in roughly ten minutes.
- Bias summary and score: Once the analysis is complete, users are provided with a Synthesized Total Fairness Score that shows what percentage of the dataset contains biased data. The platform also highlights areas of the data in which bias was detected.
- Rebalancing: As mentioned, the final feature available in this process is the ability to automatically rebalance biased data.
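For readers curious how bias in a dataset can be put into a number, the sketch below computes one widely used measure, the demographic parity gap: the difference between the highest and lowest rate of favourable outcomes across groups. It is not the Synthesized Total Fairness Score, whose calculation is not described here; the function names, column names and sample rows are hypothetical.

```python
from collections import defaultdict


def positive_rates(rows, attribute, outcome):
    """Return the share of rows with a favourable outcome for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += 1
        if row[outcome]:
            positives[row[attribute]] += 1
    return {value: positives[value] / totals[value] for value in totals}


def demographic_parity_gap(rows, attribute, outcome):
    """Return the highest group rate minus the lowest; 0.0 means equal rates."""
    rates = positive_rates(rows, attribute, outcome)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


# Hypothetical toy dataset: 3 of 4 "M" rows approved, 1 of 4 "F" rows approved.
applications = [
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 1},
    {"gender": "M", "approved": 0},
    {"gender": "F", "approved": 1},
    {"gender": "F", "approved": 0},
    {"gender": "F", "approved": 0},
    {"gender": "F", "approved": 0},
]

rates = positive_rates(applications, "gender", "approved")
gap = demographic_parity_gap(applications, "gender", "approved")
print(gap)  # 0.5
```

A gap of zero would mean every group receives favourable outcomes at the same rate; larger values point to attributes worth investigating further.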
Synthesized’s Complete Solution
The Community Edition is one part of Synthesized’s data platform. The complete platform uses AI to automate all stages of data provisioning, the process of making data available in an orderly and secure way. This level of automation enables organisations to generate synthesized datasets, allowing them to better test data for new products and tools, validate mathematical models, or train machine learning models.
Synthesized completely removes the heavy and costly burden of finding, collecting, and preparing data. Gartner estimates that data scientists and test engineers currently waste up to 80% of their valuable time on such repetitive tasks. Synthesized’s data platform helps organisations to finally unlock and maximise data’s true value.
The company was founded in 2017 by Dr Nicolai Baldin during his transition from academia to working with public bodies in the UK. While pursuing his PhD in Statistics and Machine Learning at the University of Cambridge, he identified a significant gap between the advancements made by the scientific community and those made by major organisations, and created a platform to bridge this gap.
The launch caps off a year of sustained growth for the company. Just two weeks before the UK went into lockdown in March 2020, Synthesized closed a seed funding round of £2.2 million. More recently, it collaborated with the Financial Conduct Authority (FCA) to launch a collection of synthetic fraud datasets for secure third-party collaboration in the Digital Sandbox Pilot, jointly launched by the FCA and City of London Corporation.