Facial-recognition technology is advancing rapidly and is being integrated into many everyday devices, such as cell phones. But these systems are far from perfect. One of the most recent problems was uncovered by researchers from MIT and Stanford University, who found that three facial-analysis programs from major technology companies exhibit both skin-type and gender biases.
The researchers ran experiments measuring the error rates of the three programs. In those experiments, the programs determined the gender of light-skinned men with an error rate of only 0.8 percent. When analyzing darker-skinned women's faces, however, the error rate rose to more than 20 percent for one program and more than 34 percent for the other two. Errors of that magnitude risk alienating many users.
These results led the researchers to scrutinize how today's neural networks are trained. Neural networks learn to perform tasks by looking for patterns in data. The researchers found that one major U.S. tech company claims its facial-recognition software is 97 percent accurate, yet the data set used to train that software was more than 77 percent male and more than 83 percent white. Because female and non-white faces are underrepresented in the training data, the software is far more likely to fail for those groups, leaving a large share of potential users with a system that may not recognize their faces.
"What's really important here is the method and how that method applies to other applications," says Joy Buolamwini, a researcher in the MIT Media Lab's Civic Media group and first author on the new paper. "The same data-centric techniques that can be used to try to determine somebody's gender are also used to identify a person when you're looking for a criminal suspect or to unlock your phone. And it's not just about computer vision. I'm really hopeful that this will spur more work into looking at [other] disparities."
The three systems in the study were general-purpose facial-analysis systems, used to match faces across photos and to estimate gender, age, and mood. All three treat gender recognition as a binary classification problem, which should make it one of the simpler tasks. But because the data sets used to train the systems were not balanced between men and women, the systems were worse at recognizing female faces.
Buolamwini's interest in the problem began a few years ago in the MIT Media Lab, where she was developing Upbeat Walls, an interactive art installation that allowed visitors to control patterns projected on a surface by moving their heads. The installation used a commercial facial-analysis program to track visitors' faces. During development, her ethnically diverse team found that they had to rely on lighter-skinned researchers to demonstrate the wall, because the system did not reliably detect darker-skinned faces. Buolamwini, who is black, decided to investigate. She submitted photos of her own face to various facial-recognition systems and found that they often failed to recognize her face as a human face at all, and when they did, they classified her as male.
Inspired by these initial findings, Buolamwini assembled a data set of more than 1,200 faces with a much higher proportion of women and people with dark skin, giving a facial-recognition system a far more diverse set to analyze and learn from. She worked with a dermatologic surgeon to code the images according to the Fitzpatrick scale of skin tones, a six-point scale running from lightest to darkest that was originally developed for dermatologists to assess sunburn risk. She then applied the three commercial facial-analysis systems to her new data set.
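Coding images on the six-point Fitzpatrick scale makes it possible to group faces into coarser skin-tone categories for analysis. The sketch below illustrates one such mapping; the split between "lighter" (types I–III) and "darker" (types IV–VI) is an assumption for illustration, not necessarily the exact grouping the study used.

```python
def skin_tone_group(fitzpatrick_type: int) -> str:
    """Map a Fitzpatrick skin type (1 = lightest ... 6 = darkest) to a coarse group.

    The I-III vs. IV-VI split here is an illustrative assumption.
    """
    if not 1 <= fitzpatrick_type <= 6:
        raise ValueError("Fitzpatrick types run from 1 to 6")
    return "lighter" if fitzpatrick_type <= 3 else "darker"

print(skin_tone_group(2))  # lighter
print(skin_tone_group(5))  # darker
```

Grouping by coded skin type rather than by self-reported race is one way to keep the categories objective and reproducible across data sets.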
When the systems analyzed the faces in her data set, the error rates for gender classification were consistently higher for females than for males, and higher for darker-skinned people than for lighter-skinned people. The errors were worst for darker-skinned women, with error rates of 20.8 percent, 34.5 percent, and 34.7 percent across the three systems. For women with the darkest skin tones on the Fitzpatrick scale, error rates reached 46.5 percent and 46.8 percent. At that point, a system is essentially guessing the person's gender.
These findings show that the systems effectively fail many of their potential users. The companies behind them need to act on these results and retrain their systems on more representative data sets, so that they deliver accurate results for everyone, not just light-skinned men.
"To fail on one in three, in a commercial system, on something that's been reduced to a binary classification task, you have to ask, would that have been permitted if those failure rates were in a different subgroup?" Buolamwini says. "The other big lesson ... is that our benchmarks, the standards by which we measure success, themselves can give us a false sense of progress."
To read more about this research, visit the MIT News site.