Millions of dollars are being spent to develop artificial intelligence software that reads X-rays and other medical images in hopes of spotting things doctors look for but sometimes miss, such as lung cancer. A new study reports that these algorithms can also see something doctors don’t look for on such scans: a patient’s race.
The study’s authors and other medical AI experts say the results make it more important than ever to check whether health algorithms perform fairly for people of different racial identities. Complicating that task: the authors themselves aren’t sure what cues the algorithms they created use to predict a person’s race.
Evidence that algorithms can read race from a person’s medical images came from tests on five types of imagery used in radiology research, including chest and arm X-rays and mammograms. The images came from patients who identified as Black, white, or Asian. For each type of scan, the researchers trained algorithms on images labeled with a patient’s self-reported race. They then challenged the algorithms to predict the race of patients in different, unlabeled images.
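That protocol can be sketched in a few lines of toy code. The study used deep networks on real radiographs; here, purely for illustration, the "images" are short feature vectors, the labels and data are made up, and the model is a simple nearest-centroid classifier:

```python
# Toy sketch of the study's protocol: fit a model on labeled examples,
# then ask it to predict labels for separate, held-out examples.
# All data, labels, and the classifier itself are illustrative only.

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labeled_images):
    """labeled_images: list of (feature_vector, self_reported_label) pairs."""
    by_label = {}
    for feats, label in labeled_images:
        by_label.setdefault(label, []).append(feats)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sqdist(model[label], features))

# Hypothetical training set: (features, label) pairs.
train_set = [
    ([0.9, 0.1], "A"), ([0.8, 0.2], "A"),
    ([0.1, 0.9], "B"), ([0.2, 0.8], "B"),
]
model = train(train_set)
# Held-out, unlabeled images:
print(predict(model, [0.85, 0.15]))  # → A
print(predict(model, [0.15, 0.85]))  # → B
```

The surprise in the study was not the protocol, which is standard supervised learning, but that the task worked at all on internal scans.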
Radiologists generally do not consider a person’s racial identity – which is not a biological category – to be visible on scans that look beneath the skin. Yet the algorithms proved capable of detecting it accurately for all three racial groups, and across different views of the body.
For most types of scans, the algorithms could correctly identify which of two images came from a Black patient more than 90 percent of the time. Even the worst-performing algorithm succeeded 80 percent of the time; the best was 99 percent accurate. The results and associated code were published late last month by a group of more than 20 researchers with expertise in medicine and machine learning, but the study has not yet been peer reviewed.
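The "which of two images" figure is the standard pairwise reading of a ranking score (the AUC): take one image from each group and check whether the model scores the right one higher. A minimal sketch with made-up scores, not the study's data:

```python
# Pairwise accuracy: over all (positive, negative) pairs, the fraction
# where the model scores the positive example higher; ties count half.
# The scores below are invented purely to show the computation.

def pairwise_accuracy(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

pos = [0.9, 0.8, 0.7]   # hypothetical scores for images from one group
neg = [0.6, 0.4, 0.8]   # hypothetical scores for images from the other
print(pairwise_accuracy(pos, neg))  # 9 pairs, 7.5 ranked correctly → 0.8333...
```

A score of 0.5 on this measure means the model ranks pairs no better than chance; the study's reported values of 0.8 to 0.99 are far above that.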
The results have raised new concerns that artificial intelligence software may exacerbate health inequalities, at a time when studies show that Black people and other marginalized racial groups often receive inferior care compared to wealthy or white people.
Machine learning algorithms learn to read medical images by being fed many labeled examples of conditions such as tumors. By digesting many examples, the algorithms learn patterns of pixels statistically associated with those labels, such as the texture or shape of a lung nodule. Some algorithms made this way rival doctors at detecting cancers or skin problems; there is evidence they can detect signs of disease invisible to human experts.
Judy Gichoya, a radiologist and assistant professor at Emory University who worked on the new study, says their discovery that image algorithms can “see” race in internal scans suggests those algorithms probably learn inappropriate associations as well.
Medical data used to train algorithms often bears traces of racial inequalities in disease and treatment, due to historical and socioeconomic factors. That could lead an algorithm searching for statistical patterns in scans to use its guess at a patient’s race as a kind of shortcut, suggesting diagnoses that correlate with racially biased patterns in its training data rather than only the visible medical anomalies that radiologists look for. Such a system might give some patients an incorrect diagnosis or a false all-clear. It might also suggest different diagnoses for a Black person and a white person with similar signs of disease.
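The shortcut risk described above can be demonstrated with a toy example: if a proxy for race correlates with the diagnosis label in training data, a model built to maximize separation will latch onto the proxy instead of the medical signal. Everything here, including the crude one-feature "model", is invented for illustration:

```python
# Toy shortcut-learning demo: the model picks whichever single feature
# best separates the two labels. When a group proxy correlates with the
# label more cleanly than the medical signal does, the proxy wins.
# All numbers are made up.

def learn_feature(samples):
    """Return the index of the feature whose class means are farthest apart
    (a deliberately crude one-feature 'model')."""
    n_features = len(samples[0][0])
    best = None
    for i in range(n_features):
        pos = [f[i] for f, y in samples if y == 1]
        neg = [f[i] for f, y in samples if y == 0]
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        if best is None or gap > best[1]:
            best = (i, gap)
    return best[0]

# features = [medical_signal, group_proxy]; in this biased training set
# the proxy separates the labels even more cleanly than the signal.
train_data = [
    ([0.6, 1.0], 1), ([0.7, 1.0], 1),
    ([0.5, 0.0], 0), ([0.4, 0.0], 0),
]
chosen = learn_feature(train_data)
print(chosen)  # → 1: the model relied on the group proxy, not the signal
```

On new patients for whom that historical correlation does not hold, such a model would get the diagnosis wrong, which is exactly the failure mode the researchers worry about.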
“We need to educate people about this problem and explore what we can do to mitigate it,” Gichoya says. Her collaborators on the project came from institutions including Purdue, MIT, Beth Israel Deaconess Medical Center, National Tsing Hua University in Taiwan, the University of Toronto, and Stanford.
Previous studies have found that medical algorithms have caused biases in care, and that image algorithms may perform unequally across demographic groups. In 2019, a widely used algorithm for prioritizing care for the sickest patients was found to disadvantage Black patients. In 2020, researchers at the University of Toronto and MIT showed that algorithms trained to flag conditions such as pneumonia on chest X-rays sometimes performed differently for people of different sexes, ages, races, and types of health insurance.