Joining the battle against health care bias

Photo: Leo Anthony Celi. Credit: David Sella/MIT Corporate Relations

Leo Anthony Celi invites industry to broaden its focus in gathering and analyzing clinical data for every population.

Eric Bender | MIT Industrial Liaison Program

Medical researchers are awash in a tsunami of clinical data. But we need major changes in how we gather, share, and apply this data to bring its benefits to all, says Leo Anthony Celi, principal research scientist at the MIT Laboratory for Computational Physiology (LCP), and at the Institute for Medical Engineering and Science (IMES).

One key change is to make clinical data of all kinds openly available, with the proper privacy safeguards, says Celi, a practicing intensive care unit (ICU) physician at the Beth Israel Deaconess Medical Center (BIDMC) in Boston. Another key is to fully exploit these open data with multidisciplinary collaborations among clinicians, academic investigators, and industry. A third key is to focus on the varying needs of populations across every country, and to empower the experts there to drive advances in treatment, says Celi, who is also an associate professor at Harvard Medical School. 

In all of this work, researchers must actively seek to overcome the perennial problem of bias in understanding and applying medical knowledge. This deeply damaging problem is only heightened by the massive onslaught of machine learning and other artificial intelligence technologies. “Computers will pick up all our unconscious, implicit biases when we make decisions,” Celi warns.

Sharing medical data 

Founded by the LCP, the MIT Critical Data consortium builds communities across disciplines to leverage the data that are routinely collected in the process of ICU care to understand health and disease better. “We connect people and align incentives,” Celi says. “In order to advance, hospitals need to work with universities, who need to work with industry partners, who need access to clinicians and data.” 

The consortium's flagship project is the MIMIC (Medical Information Mart for Intensive Care) database built at BIDMC. With about 35,000 users around the world, the MIMIC cohort is the most widely analyzed in critical care medicine. 

International collaborations such as MIMIC highlight one of the biggest obstacles in health care: most clinical research is performed in rich countries, and most clinical trial participants are typically white males. “The findings of these trials are translated into treatment recommendations for every patient around the world,” says Celi. “We think that this is a major contributor to the sub-optimal outcomes that we see in the treatment of all sorts of diseases in Africa, in Asia, in Latin America.”

To fix this problem, “groups who are disproportionately burdened by disease should be setting the research agenda,” Celi says. 

That's the rule in the “datathons” (health hackathons) that MIT Critical Data has organized in more than two dozen countries, which apply the latest data science techniques to real-world health data. At the datathons, MIT students and faculty both learn from local experts and share their own skill sets. Many of these several-day events are sponsored by the MIT Industrial Liaison Program, the MIT International Science and Technology Initiatives program, or the MIT Sloan Latin America Office. 

Datathons are typically held in the host country's national language or dialect, rather than English, with representation from academia, industry, government, and other stakeholders. Doctors, nurses, pharmacists, and social workers join up with computer science, engineering, and humanities students to brainstorm and analyze potential solutions. “They need each other's expertise to fully leverage and discover and validate the knowledge that is encrypted in the data, and that will be translated into the way they deliver care,” says Celi.

“Everywhere we go, there is incredible talent that is completely capable of designing solutions to their health-care problems,” he emphasizes. The datathons aim to further empower the professionals and students in the host countries to drive medical research, innovation, and entrepreneurship.

Fighting built-in bias 

Applying machine learning and other advanced data science techniques to medical data reveals that “bias exists in the data in unimaginable ways” in every type of health product, Celi says. Often this bias is rooted in the clinical trials required to approve medical devices and therapies. 

One dramatic example comes from pulse oximeters, which provide readouts on oxygen levels in a patient's blood. It turns out that these devices overestimate oxygen levels for people of color. “We have been under-treating individuals of color because the nurses and the doctors have been falsely assured that their patients have adequate oxygenation,” he says. “We think that we have harmed, if not killed, a lot of individuals in the past, especially during Covid, as a result of a technology that was not designed with inclusive test subjects.” 

Such dangers only increase as the universe of medical data expands. “The data that we have available now for research is maybe two or three levels of magnitude more than what we had even 10 years ago,” Celi says. MIMIC, for example, now includes terabytes of X-ray, echocardiogram, and electrocardiogram data, all linked with related health records. Such enormous sets of data allow investigators to detect health patterns that were previously invisible. 

“But there is a caveat,” Celi says. “It is trivial for computers to learn sensitive attributes that are not very obvious to human experts.” In a study released last year, for instance, he and his colleagues showed that algorithms can tell whether a chest X-ray image belongs to a white patient or a person of color, even without looking at any other clinical data.

“More concerningly, groups including ours have demonstrated that computers can learn easily if you're rich or poor, just from your imaging alone,” Celi says. “We were able to train a computer to predict if you are on Medicaid, or if you have private insurance, if you feed them with chest X-rays without any abnormality. So again, computers are catching features that are not visible to the human eye.” And these features may lead algorithms to advise against therapies for people who are Black or poor, he says. 

Opening up industry opportunities 

Every stakeholder stands to benefit when pharmaceutical firms and other health-care corporations better understand societal needs and can target their treatments appropriately, Celi says. 

“We need to bring to the table the vendors of electronic health records and the medical device manufacturers, as well as the pharmaceutical companies,” he explains. “They need to be more aware of the disparities in the way that they perform their research. They need to have more investigators representing underrepresented groups of people, to provide that lens to come up with better designs of health products.” 

Corporations could benefit by sharing results from their clinical trials, and could immediately see these potential benefits by participating in datathons, Celi says. “They could really witness the magic that happens when that data is curated and analyzed by students and clinicians with different backgrounds from different countries. So we're calling out our partners in the pharmaceutical industry to organize these events with us!” 

* Originally published in MIT News.