A new adaptive statistical model developed by a research team involving Cornell could make clinical trials safer and more effective: unlike most models, it is precise enough to identify when a subset of a trial population is being harmed by the treatment.

Developed by researchers from MIT, Microsoft, and Cornell, the model – Causal Latent Analysis for Stopping Heterogeneously (CLASH) – leverages causal machine learning, which uses artificial intelligence to estimate true cause-and-effect relationships among variables. It continually analyzes incoming participant data and alerts trial practitioners if the treatment is harming only a segment of trial participants.

By comparison, most statistical models used to determine when to stop trials early are applied broadly across all trial participants and don’t account for heterogeneous populations, researchers said. This can let harms go undetected: if an experimental drug is causing serious side effects in elderly patients who make up 10 percent of the trial population, it’s unlikely a conventional statistical model would detect the harm, and the trial would likely continue, exposing those patients to further harm.

“We can’t just be looking at the averages,” said Allison Koenecke, assistant professor of information science in the Cornell Ann S. Bowers College of Computing and Information Science. She is the senior author of “Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations,” which was presented at the 37th Conference on Neural Information Processing Systems (NeurIPS) in December 2023 in New Orleans. “You have to look at these different subgroups of people. Our method quantifies and identifies harms to minority populations and brings them to light to practitioners who can then make decisions on whether to stop trials early.”

CLASH is designed to work with a variety of statistical stopping tests, which researchers running randomized experiments use as guideposts for deciding whether to continue a clinical trial or end it early. Researchers said CLASH can also be used in A/B testing, a user-experience experiment that compares variations of a feature to determine which performs best.
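To make the idea concrete, here is a minimal sketch, not from the paper, of the kind of pooled stopping test CLASH is designed to wrap: at each interim look, it compares average outcomes between the treatment and control arms and flags the trial if a two-sample z-statistic crosses a threshold. The function name and the 2.58 cutoff are illustrative placeholders; real trials use carefully calibrated sequential boundaries.

```python
# Illustrative pooled stopping check (a placeholder, not the paper's method).
# Assumes higher outcome values are worse (e.g., an adverse-event score).
import numpy as np

def pooled_harm_check(y_treated, y_control, z_crit=2.58):
    """Flag the trial if treated participants fare significantly worse,
    on average, than controls at this interim look."""
    y1 = np.asarray(y_treated, dtype=float)
    y0 = np.asarray(y_control, dtype=float)
    # Two-sample z-statistic on mean outcomes.
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z = (y1.mean() - y0.mean()) / se
    return z > z_crit  # True = evidence of harm in the pooled population
```

Because a test like this pools everyone, harm concentrated in a small subgroup is diluted by the rest of the population, which is exactly the failure mode CLASH targets.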

“We wanted to design a method that would be easy for practitioners to use and incorporate into their existing pipelines,” said Hammaad Adam, a doctoral student who studies machine learning and healthcare equity at MIT and the paper’s lead author. “You could implement some version of CLASH with 10 or 20 lines of code.”
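The sketch below, our illustration rather than the authors’ code, captures the general recipe: use a causal machine-learning estimator (here a simple T-learner stands in for whatever model a team prefers) to estimate each participant’s treatment effect, then run a standard stopping test on the participants with the largest estimated harm instead of the whole population. The subgroup fraction and z-threshold are hypothetical parameters.

```python
# Illustrative CLASH-style check (our sketch, not the authors' code).
# Assumes covariates X (2-D array), a 0/1 treatment indicator t, and
# outcomes y (all NumPy arrays), where higher y is worse.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def estimated_effects(X, t, y):
    """T-learner: fit separate outcome models for the treated and control
    arms, then estimate each participant's treatment effect as the
    difference in predictions. Any causal ML estimator could stand in."""
    m1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
    m0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])
    return m1.predict(X) - m0.predict(X)

def subgroup_harm_check(X, t, y, top_frac=0.10, z_crit=2.58):
    """Run the stopping test on the subgroup with the largest estimated
    harm (the top_frac of participants) rather than on everyone."""
    tau = estimated_effects(X, t, y)
    sub = tau >= np.quantile(tau, 1 - top_frac)  # most-harmed subgroup
    y1, y0 = y[sub & (t == 1)], y[sub & (t == 0)]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z = (y1.mean() - y0.mean()) / se
    return z > z_crit  # True = evidence of harm concentrated in a subgroup
```

In this framing, the pooled test from earlier is just the special case top_frac = 1.0, which is one way to see how CLASH can slot into whatever stopping rule a trial already uses.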

“We need to make sure that harms in minority populations are not glossed over by statistical methods that simply assume all people in an experiment are the same,” Koenecke said. “Our work gives practitioners the tools they need to appropriately consider heterogeneous populations and ensure that minority groups are not being disproportionately harmed.”

Along with Adam and Koenecke, the paper’s authors are Fan Yin and Huibin (Mary) Hu of Microsoft Corporation, and Neil Tenenholtz, Lorin Crawford, and Lester Mackey of Microsoft Research.

This research was supported through the Cornell Bowers CIS Strategic Partnership Program with LinkedIn.

By Louis DiPietro, a writer for the Cornell Ann S. Bowers College of Computing and Information Science.