Data for all

To positively impact global health, build an accessible multinational community around data. That’s what Dr. Roger Mark, Distinguished Professor of Health Sciences and Technology and of Electrical Engineering and Computer Science, and his colleagues at the Laboratory for Computational Physiology (LCP) believe. For five decades, Mark has promoted data-sharing initiatives.

“There are lots of smart, creative people, but getting access to clinical and physiological data is extremely difficult,” says Mark. “Data should be available for use by essentially the entire world community of research people.”

PhysioNet and MIMIC III

PhysioNet (a core laboratory of the NIH-funded Research Resource for Complex Physiologic Signals) was launched online in 1999 by Mark and Beth Israel Deaconess Medical Center (BIDMC) cardiologist Ary Goldberger to provide digital access to a vast library of physiologic signal collections and related open-source software. It’s a descendant of the MIT-BIH Arrhythmia Database, which Mark developed in the 1970s while working on early microcomputer-based cardiac arrhythmia monitors. Over the years, MIT researchers and many others (from Canada, England, Germany, Slovenia, Spain, and Taiwan, to name a few nations) have contributed major additional data collections to PhysioNet, whose archives now exceed 4 TB.

“There are more than 100 papers published per month that make use of PhysioNet data,” says Mark. “It pleases me greatly. Smart, creative people just need these raw materials.”

In 2009, Mark and his colleagues added the relational database MIMIC (Medical Information Mart for Intensive Care), now in its third iteration as MIMIC III. MIMIC III includes comprehensive de-identified BIDMC ICU patient data (including waveforms, medications, laboratory tests, physician and nurse notes, discharge summaries, and imaging reports) from more than 40,000 patients over 10 years. The more than 3,000 academics, clinicians, and industry professionals who use MIMIC III must qualify by completing human studies coursework and signing a data use agreement.

Worldwide outreach

Mark is eager to bring clinicians and health experts into the data science mindset. His colleague Leo Anthony Celi, LCP’s clinical research director and a physician at BIDMC, founded Sana at MIT, a global health educational resource and mobile health innovator. Mark calls Celi a “kind of Pied Piper. He energizes computer scientists and clinicians to merge their expertise in exploring clinical data.”

“We need to change people’s attitude, not just towards the data, but toward problem-solving,” says Celi. “In the last few years, we’ve organized datathons and events, teaming people in medicine, nursing, and pharmacy, with those in machine learning, signal processing, statistics, and epidemiology. We’re heavily invested in promoting this cultural transformation—to leverage the value of all this data.”

Mark and Celi are also democratizing access to expertise and high-quality healthcare through open-source technologies and a global network of multidisciplinary experts.

“Just pushing technology in low- and middle-income countries is our anathema,” says Celi. He and Mark aim instead to develop partnerships with geniuses on the ground: people who understand their locality’s problems and information gaps and are more likely to create sustainable, scalable solutions.

“It can be a challenge getting clinicians excited about data science,” says Mark. “But collaborating with people in other fields is exciting. You constantly learn from them. And this is one of the major goals of IMES.”

Recently, LCP held health data science events in Argentina, Thailand, Australia, and Brazil. More are planned for 2017 in Mexico, Singapore, the Philippines, Colombia, Taiwan, China, and Spain. Experts in other countries become ambassadors of the cultural shift toward the free exchange of data.

“I look at these datathons like little spark plugs to generate interest in each hospital community to be part of a wider group — to share their data resources to create an open, multi-institutional and multi-national critical care research resource,” says Mark.

Data-driven future

Many doctors need reassurance that machines are not meant to replace them, Mark says.

“I have an immunity reaction to the idea that solving issues in healthcare is a technological problem. It isn’t. It’s a complex human systems problem,” says Mark. “Technology energizes the imagination of healthcare providers to figure out better ways of working together; that’s good. To think a robot will take care of you when you’re 85 years old, that’s crazy. You’ve got to have people in the loop.”

What machines can do better, he says, is synthesize vast quantities of data through the night, giving doctors and nurses better context for the uncertainties their patients face. He wants to see intelligent patient monitoring track and predict a patient’s pathophysiological state. For instance, knowing which data markers indicate that a patient with kidney failure is regaining kidney function may change a doctor’s medical approach. Further, he hopes new systems can customize analysis, so doctors in Africa, for example, can treat a disease like diabetes without simply adopting U.S. guidelines.

“Many medical problems have a strong socio-political context,” says Celi, who is engaged in Hacking Discrimination [1] at MIT. “We have to involve social scientists and anthropologists.”

Mark, Celi, and the LCP group are pursuing a multi-pronged effort to promote learning. They offer three MIT courses: 6.022 (Quantitative Systems Physiology), HST.936/HST.936x on edX (Global Health Informatics to Improve Quality of Care), and HST.953 (Collaborative Data Science in Medicine). Recently, they published two textbooks, Global Health Informatics (MIT Press) and Secondary Analysis of Electronic Health Records (Springer), both open-access in digital form. They also track their hackathon and datathon participants over time to see the impact these events have on careers and education.

Progressive transparency

Ultimately, Mark believes that transparency in data is key to advancement. With LCP, he wants to address the elephant in the room: the irreproducibility of science.

“We can address irreproducibility by being more and more transparent with data,” he says, pointing to other initiatives like the Center for Open Science and F1000Research. “We’re not just interested in publishing. We’re interested in the truth.”


[1] “Hacking Discrimination,” April 28-29, 2017, was organized by the Black Alumni Association of MIT and the MIT Alumni Association to address, among other topics, health disparities, racial inequalities, and gender gaps.