Researchers Reduce Bias in AI Models While Maintaining or Improving Accuracy

Machine-learning models can fail when they try to make predictions for people who were underrepresented in the datasets they were trained on.

For example, a model that predicts the best treatment option for someone with a chronic disease might be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.

To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing a large amount of data, which hurts the model's overall performance.
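As a rough illustration of why balancing can be costly, here is a minimal sketch of balancing by subsampling. The record layout, the `sex` field, the 90/10 split, and the function name `balance_by_subsampling` are all made up for this example; they are not taken from the researchers' work.

```python
import random
from collections import defaultdict

def balance_by_subsampling(records, group_key, seed=0):
    """Downsample every subgroup to the size of the smallest subgroup."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec[group_key]].append(rec)
    # Every group is cut down to the size of the smallest one.
    target = min(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(rng.sample(members, target))
    return balanced

# Toy dataset (hypothetical numbers): 90 male and 10 female patient records.
records = [{"sex": "male", "id": i} for i in range(90)]
records += [{"sex": "female", "id": i} for i in range(90, 100)]

balanced = balance_by_subsampling(records, "sex")
removed = len(records) - len(balanced)
print(len(balanced), removed)  # 20 80
```

In this toy case, equalizing the two groups discards 80 of the 100 records, which is the kind of data loss the paragraph above describes.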

MIT researchers developed a new technique that identifies and removes the specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer data points than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
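The idea of targeted removal can be sketched as follows. This is only a schematic: it assumes each training point already has a "harm score" estimating how much it contributes to errors on the minority subgroup. How the researchers actually compute such scores is not described in this passage, so the scores, the function name `remove_most_harmful`, and the choice of `k` below are hypothetical.

```python
def remove_most_harmful(train_ids, harm_scores, k):
    """Drop the k training points with the highest (assumed) harm score.

    harm_scores maps a training-point id to a number; higher means the
    point is estimated to hurt minority-subgroup accuracy more. These
    scores are an input here, not something this sketch computes.
    """
    ranked = sorted(train_ids, key=lambda i: harm_scores[i], reverse=True)
    to_remove = set(ranked[:k])
    kept = [i for i in train_ids if i not in to_remove]
    return kept, to_remove

# Toy example with made-up scores for eight training points.
train_ids = list(range(8))
harm_scores = {0: 0.1, 1: 0.9, 2: -0.2, 3: 0.05,
               4: 0.7, 5: 0.0, 6: -0.5, 7: 0.3}

kept, dropped = remove_most_harmful(train_ids, harm_scores, k=2)
print(kept)             # [0, 2, 3, 5, 6, 7]
print(sorted(dropped))  # [1, 4]
```

Unlike balancing by subsampling, only the few points ranked as most harmful are discarded and the rest of the dataset is kept. A model would then be retrained on the remaining points.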

In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more common than labeled data for many applications.

This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren't misdiagnosed because of a biased AI model.

"Many other algorithms that try to address this issue assume each data point matters as much as every other data point. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.

She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev