
Figure: This figure shows the effectiveness of the researchers' techniques when searching for hyperparameters that can result in unfair models over time.

Tan lab focuses on making machine learning models more fair

Posted on March 11, 2025

Editor’s Note: A version of this story was originally published to Penn State News.

UNIVERSITY PARK, Pa. — G. Gary Tan, Penn State Institute for Computational and Data Sciences (ICDS) co-hire and professor of computer science and engineering, is working on a three-year, $600,000 project funded by the U.S. National Science Foundation that focuses on helping researchers customize fair models for their own projects by mitigating biases in existing machine learning models in open-source libraries.

Computational models that appear to “think” are trained on very large datasets to learn how to identify and process information. The type of data depends on the goal of the researchers developing the model, but available datasets may raise issues of confidentiality and fairness, according to Tan. The data, and how the people involved label it or feed it to the models, may be biased, even unconsciously, against specific groups of people. Such unbalanced datasets contain what the researchers call “fairness bias,” which can lead models to treat groups from different demographic categories unequally.

For example, a model would be considered unfair if it predicted different outcomes for two individuals who have the same features except for a protected attribute, such as a hiring model that tends to recommend more men than women even though all other attributes are equal.
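
As a rough, hypothetical illustration of that definition, the check below flips a single protected attribute while holding every other feature fixed and asks whether the model's prediction changes; the model, feature names and attribute values are assumptions for illustration, not part of the project's code.

# Illustrative sketch only: does the prediction change when one protected
# attribute is flipped and all other features are held fixed?
# The model interface, feature names, and attribute values are hypothetical.
import pandas as pd

def flags_individual_unfairness(model, applicant: dict, protected_attr: str = "sex") -> bool:
    """Return True if flipping the protected attribute changes the model's prediction."""
    original = pd.DataFrame([applicant])
    counterfactual = applicant.copy()
    # Flip the protected attribute (assumes a binary encoding for illustration).
    counterfactual[protected_attr] = "female" if applicant[protected_attr] == "male" else "male"
    flipped = pd.DataFrame([counterfactual])
    return model.predict(original)[0] != model.predict(flipped)[0]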

The researchers applied software testing and fuzzing, a process of generating random inputs such as demographic information, to check the fairness of models like deep neural networks and large language models. The types of inputs vary depending on how the model was trained: if a model uses people’s features to infer their income levels, inputs could include occupation, sex or race. To measure model fairness, the research team used metrics such as equal opportunity difference and average odds difference, which measure the gap in statistics between two protected groups, for example, how the likelihood of being hired differs between male and female job applicants.
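
The two metrics named above have standard definitions, sketched below in Python for illustration; the function and variable names are assumptions, and this is not the research team's implementation.

# Illustrative sketch of the two group-fairness metrics named above, computed
# from true labels, binary predictions, and a protected-group indicator.
# Not the research team's code; names and encodings are assumptions.
import numpy as np

def group_rates(y_true, y_pred, group_mask):
    """True-positive and false-positive rates within one protected group."""
    y_true, y_pred = np.asarray(y_true)[group_mask], np.asarray(y_pred)[group_mask]
    tpr = np.mean(y_pred[y_true == 1]) if np.any(y_true == 1) else 0.0
    fpr = np.mean(y_pred[y_true == 0]) if np.any(y_true == 0) else 0.0
    return tpr, fpr

def fairness_metrics(y_true, y_pred, group):
    """group: 1 = privileged group, 0 = unprivileged group (illustrative encoding)."""
    group = np.asarray(group)
    tpr_p, fpr_p = group_rates(y_true, y_pred, group == 1)
    tpr_u, fpr_u = group_rates(y_true, y_pred, group == 0)
    equal_opportunity_diff = tpr_u - tpr_p
    average_odds_diff = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
    return equal_opportunity_diff, average_odds_diff

Values near zero indicate that the two groups are treated similarly; large positive or negative values flag a disparity.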

The research team aims to create fairness customization recommendations for researchers.

“We want to understand what customizations within the models may produce fair or unfair models,” Tan said. “After we test and better understand what customizations can result in a fair model, we can recommend what customizations users should stay away from as they could result in an unfair model.”
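
A minimal sketch of that idea, assuming a hypothetical user-supplied train_model function and the fairness metrics sketched above: randomly sample customizations, train a model for each, and flag settings whose fairness metrics exceed a threshold. The search space and threshold here are illustrative, not the project's.

# Minimal sketch, not the project's implementation: fuzz the customization
# (hyperparameter) space and record settings that produce unfair models.
import random

SEARCH_SPACE = {                      # hypothetical customization space
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_units": [32, 64, 128],
    "dropout": [0.0, 0.2, 0.5],
}

def fuzz_customizations(train_model, evaluate_fairness, trials=50, threshold=0.1):
    """Return customizations whose trained models were flagged as unfair."""
    unfair_configs = []
    for _ in range(trials):
        config = {name: random.choice(values) for name, values in SEARCH_SPACE.items()}
        model = train_model(config)            # assumed user-supplied training routine
        eod, aod = evaluate_fairness(model)    # e.g., the metrics sketched above
        if abs(eod) > threshold or abs(aod) > threshold:
            unfair_configs.append((config, eod, aod))
    return unfair_configs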

Tan presented this work to the ICDS community, which encouraged him to think more broadly about the potential impact of his research.

“A lot of the faculty build models from their data and are concerned about fairness. This kind of work can help them navigate the customizable space better,” Tan said.
