When we say “Big Data”—what do we mean by big? MIT faculty member Sandy Pentland, who is numbered by Forbes among the world’s seven most powerful data scientists, says we’re not talking about sheer size. Researchers amassed enormous data sets, he points out, long before Big Data coalesced as a capital-letter concept. But, says Pentland, the scope of data at our fingertips “is far more pervasive than ever, far more continuous.” Spectrum asked Pentland and six other MIT researchers working in the sphere of human health to discuss the challenges and opportunities they face in this new landscape.

MARIANA ARCAYA explores relationships between geographic contexts and health. An assistant professor in the Department of Urban Studies and Planning, she is one of the leaders of a new Health Initiative in the School of Architecture and Planning.

EMERY BROWN, an anesthesiologist at Massachusetts General Hospital, leads the Neuroscience Statistics Research Laboratory in the Department of Brain and Cognitive Sciences. He is MIT’s Edward Hood Taplin Professor of Medical Engineering and of Computational Neuroscience, director of the Harvard-MIT Health Sciences and Technology Program, associate director of the Institute for Medical Engineering and Science (IMES), and an investigator at MIT’s Picower Center for Learning and Memory.

MUNTHER DAHLEH is the director of MIT’s Institute for Data, Systems, and Society (IDSS), which has just celebrated its first anniversary, and the William A. Coolidge Professor in the Department of Electrical Engineering and Computer Science.

JOSEPH DOYLE is the Erwin H. Schell Associate Professor of Management in applied economics. He is a co-director of the MIT Sloan Initiative for Health Systems Innovation and a co-chair of the Health Sector of the Abdul Latif Jameel Poverty Action Lab (J-PAL), a research center at MIT dedicated to reducing poverty by ensuring that policy is informed by scientific evidence.

AMY FINKELSTEIN PhD ’01 is the John and Jennie S. MacDonald Professor of Economics at MIT, the co-scientific director of J-PAL North America, and chair of J-PAL’s US Health Care Delivery Initiative, which supports randomized evaluations of innovative strategies to improve the efficiency and effectiveness of health care in the United States.

JOHN GUTTAG is the Dugald C. Jackson Professor in the MIT Department of Electrical Engineering and Computer Science. He leads the Data-Driven Inference Group at the Computer Science and Artificial Intelligence Laboratory (CSAIL).

ALEX “SANDY” PENTLAND, Toshiba Professor at MIT, is a computer scientist and entrepreneur who co-created the MIT Media Lab, where he now directs the Human Dynamics research group. He also leads the Connection Science initiative, based in IDSS.

Blending perspectives

DAHLEH: “By hiring new faculty and bringing together the larger community of researchers at MIT who focus on data, IDSS aims to advance data science theoretically, algorithmically, and in the various ways it is applied to real challenges. IDSS faculty collaborate on health care research with major departments and institutes at MIT, including the Broad Institute, the Koch Institute for Integrated Cancer Research, and the Institute for Medical Engineering and Science (IMES). This kind of research follows the general theme of ‘data to models to decisions/insights.’ Our faculty, in collaboration with experts in their fields, bring powerful new methods in statistics and machine learning, graphical models and causal inference, dynamical systems theory, optimization, and decision theory to difficult challenges in health care. Our research addresses ‘big data’ questions (involving, for example, general electronic hospital data sets or genomic data), ‘small data’ questions (such as those related to clinical trials), learning the structure of underlying phenomena (e.g., gene regulatory networks), dynamic effects (e.g., contagion), and control and decision effects (e.g., targeting immunization, pandemics), as well as policy questions like the effects of funding on cancer research.”

GUTTAG: “We can sit in our lab at MIT and do fancy mathematics, but if we actually want to have an impact on health, it’s crucial that we work with people in the clinical environment. A lot of what we do in computer science is based on statistical models, but clinicians have a skepticism, probably a healthy one, about statistical models that bear no obvious relation to what they observe in practice. I don’t let a week go by without talking with people who are actually on the front line.”

FINKELSTEIN: “J-PAL’s US Health Care Delivery Initiative (HCDI) hosted a ‘matchmaking’ conference in October 2014 that brought together more than 70 health care practitioners, policy makers, and scholars to spur more data-based evaluations of programs and policies in health care delivery. We’re looking to hold another conference this November, with a portion open to the public. We also convene smaller gatherings around specific, high-priority topics. In May, we brought together policy makers from four states and from the White House’s Office of National Drug Control Policy (ONDCP) with medical experts and researchers from J-PAL’s network to discuss opportunities to evaluate innovative approaches to drug abuse and addiction. The roundtable has already spurred a project with the ONDCP and a potential academic research collaboration in two states, and it has encouraged one state to make evidence-based adjustments to its opioid prescription policies.”

ARCAYA: “My expectation in advancing the School of Architecture and Planning’s Health Initiative [with urban studies associate professor J. Phillip Thompson] was that a lot of the work going on at SA+P, such as housing, economic development, and environmental work, would be very important for health, even if it’s not labeled that way. For example, Phil and I have a project in the Bronx with Montefiore Hospital, New York City’s Department of Health, and several community-based organizations, aimed at reducing hospitalizations for asthma. As a team, we’re saying: let’s look at the data on which buildings are sending residents to the hospital for asthma in high numbers. Then, let’s go into those buildings and talk to the residents, owners, and landlords. Can we retrofit the buildings for energy conservation and lower utility bills, while improving indoor air quality? And can we do it in a way that provides work for local low-income people? We think about a place-based, higher-level intervention that will not only reduce health care utilization for asthma but also provide economic, social, and environmental benefits.”

Navigating challenges

BROWN: “Recent technological and experimental advances in our ability to record signals from neural systems have led to an unprecedented increase in the types and volume of data collected in neuroscience experiments and, hence, in the need for appropriate techniques to analyze them. My laboratory develops algorithms to characterize how the brain represents and transmits information. A primary focus of our research is the development of algorithms to characterize, monitor, and control the brain states of patients receiving general anesthesia and sedation.”

FINKELSTEIN: “Administrative data are an invaluable resource for rigorously evaluating health programs, but they come with a unique set of challenges. J-PAL has developed a 45-page practical guide on how to obtain and use nonpublic administrative data. As a companion, a growing catalog of administrative data sets outlines specific requirements and processes for obtaining data from several key agencies. For example, we’ve compiled cost estimates (some data, such as Medicare claims data, can cost thousands of dollars to acquire) and estimated timelines, since some data sets can take anywhere from several months to more than a year to obtain. Since their release in January, the guide and the catalog have each been downloaded about 1,000 times.”

PENTLAND: “Data access, data preservation, and data privacy are big issues. If researchers can’t get at the data, they can’t find the patterns. And we want to enable people to keep track of their own health. When Latanya Sweeney SM ’97, PhD ’01 was chief technologist for the Federal Trade Commission, one of her main concerns was patient privacy and patients’ ownership of their data. Things are happening, but not necessarily fast enough. As a way to address this, I started a group in IDSS called Connection Science. We run ‘Living Labs’ around the world using a platform, DataHub, which was developed at MIT. One lab in Italy focused on giving young families more information about their health and behavior while letting them maintain control of their data and share it securely. We looked at their attitudes about privacy and at how the data could help them raise their families better. Now we have a project called Open Algorithms to implement this sort of privacy protection more broadly, in the EU and elsewhere.”

Informing decisions

DOYLE: “There’s a lot of talk about care redesign, but to figure out whether a redesign is effective, I advocate for smart piloting. You could say a program was implemented and costs went down 20%, but you might have gotten that 20% regardless of what you did. Or costs might have gone down 15% regardless, in which case the program achieved only a 5-percentage-point improvement. That’s why we develop tools such as real-time, low-cost randomized controlled trials to show a causal effect on outcomes. Providers need data to figure out how to streamline and improve health care at the same time.”

FINKELSTEIN: “J-PAL’s US Health Care Delivery Initiative (HCDI) provides a coordinated platform to support randomized evaluations of strategies that aim to improve the quality and value of health care delivery. At the outset of the initiative, [research scientist] Sarah Taubman and I published a review paper in Science documenting the scarcity of randomized evaluations of health care delivery strategies. We also outlined priority research questions in health care delivery and provided specific guidance on potential research designs to answer them. The paper was covered by the New York Times and the Freakonomics podcast; it has guided the development and funding of specific studies and influenced policy conversations. To date, HCDI has funded or otherwise supported the development of 15 randomized evaluations.”

GUTTAG: “A year ago I cofounded a company called HEALTH[at]SCALE Technologies. Hospital stays are getting shorter and shorter, and a lot of people now go from the hospital to skilled nursing facilities. How do you match the patient to the right facility? Right now, it’s an ad hoc procedure. With our sophisticated machine learning technology, we help providers match patients with both therapies and institutions more efficiently. They enter information about the patient, and our tool tells them which facility we think will result in the lowest probability of readmission and the shortest predicted length of stay. Our clients are the hospitals, but indirectly they’re also the patients. Increasingly, these decisions are made jointly by providers, patients, and payers, and they should be informed decisions.”

Extracting insights

BROWN: “In 2012, I began recording electroencephalogram (EEG) readings, which track electrical activity in the brain, from all of my anesthesiology patients. From these data, we made the unexpected discovery that after nitrous oxide is administered at anesthetic doses, the brain produces slow-delta waves with twice the amplitude of similar waves seen during sleep. We don’t know why these powerful waves last only about three minutes, but if they could be maintained at a steady state, nitrous oxide might be used as a potent anesthetic from which patients could recover rapidly.”

GUTTAG: “My group is working with Massachusetts General Hospital (MGH) on infection control. Certain kinds of infections are more prevalent in clinical environments than in the population at large. One of the most common is an intestinal infection caused by the bacterium Clostridium difficile. In the US, about 1% of inpatients contract it. It has serious complications, and it’s very hard to treat. Its spores can survive a long time outside the human body and are not killed by typical room-cleaning procedures, so it’s very tricky. We’re working with MGH, using many years of medical records, to build models that predict who is most likely to become infected so that preemptive action can be taken. Going forward, we’re also trying to understand the source of the infection. I think we’re going to be able to have a real impact at MGH and also, I hope, show how this can be done more generally.”

DAHLEH: “The complexity of many current systems stems from the interaction among people, institutions, and engineered systems. Such interactions have become much more interesting as mobile devices have allowed people to assess the state of the system and react to it. Think of Google Maps and how it affects patterns of congestion; high-frequency trading and how it affects the fragility of the financial system; or the continuous monitoring of people’s vital signs by smart watches and other devices, which offers an unprecedented opportunity for both risk analysis and intervention. To predict future behavior correctly and mitigate risks, models must incorporate both the physical protocols and people’s patterns of behavior within the system.”

Educating for impact

PENTLAND: “A sea change would be to move from a disease system, in which you engage with the system only when you get sick, to a health system built on recognizing early signs and understanding how to take care of yourself. Doctors don’t know what you do 99% of the time. Now, using things like cell phones and credit cards, you could make a map of your life: when did you get up, how many people did you talk to, how far did you walk, what did you eat, how much sunlight did you get? Over time, these are excellent predictors of quality of life, which is perhaps the best measure we have of how healthy you are. These are not complicated things, but we never had the data before. This insight led me to cofound both Ginger.io and CogitoHealth.com.”

DAHLEH: “Decision theory faces a new challenge: co-designing the feedback control system for the physical system (e.g., management of health care protocols) while simultaneously designing incentive mechanisms for people, such as subsidized fitness programs, to maximize social benefits. Moreover, higher-level decisions become extremely important in settling policy questions such as 30-day hospital readmission policies. This view is now reflected in our newly designed PhD program in Social and Engineering Systems. The objective is to educate a new generation of students who are deeply knowledgeable about engineering, social, and economic principles as they approach societal challenges.”

DOYLE: “At Sloan, we’re starting a master’s program in data analytics. There’s been an explosion of interest among students in how to evaluate whether arguments are credible. Students in my course on econometrics for managers tell me they want to be able to speak with data scientists. It’s exciting that we have a lot more data-literate people, including doctors, patients, and payers, who have to recognize value when they see it. This is the time to be a health economist, especially a data-driven one.”

GUTTAG: “If you look at the résumés of thought leaders around the country, an awful lot of them have MIT pedigrees. First and foremost, MIT is an educational institution. Our research certainly has an impact, but our graduates have a bigger one. I think computer science is going to revolutionize medicine, so a group of us is trying to train the next generation not just of data scientists but of health care data scientists. Health care data are not like the data you see at Amazon, Google, or Facebook. They are very different, and learning how to derive actionable information from them is not easy. We’re working hard to figure out better ways to do it, and better ways to educate our students, because they are the ones who will transform medicine.”