By Melissa Pandika
As increases in computational power allow researchers to churn out massive amounts of data, scaling up the analysis of those data has grown ever more challenging. At the Molecular Medicine Tri-Conference in San Francisco, a multidisciplinary cadre of thought leaders tackled this issue in the Machine Learning and Artificial Intelligence track. They shared their latest applications of these technologies to large-scale data to streamline drug discovery, clinical trials, and diagnostics.
The Machine Learning and AI track launched on Monday with a focus on translating data into therapies. Asim Siddiqui, chief technology officer at NuMedii, discussed how his company develops technologies that allow for the integration of biological data and the creation of models that can predict drug-disease pairings. By accurately predicting these pairings, NuMedii aims to improve the probability of successful clinical trials. Siddiqui pointed to the high failure rates of clinical trials and noted that even modest decreases in these failure rates provide value. He described NuMedii’s Artificial Intelligence for Drug Discovery (AIDD) technology, which integrates data from the literature, as well as -omics and other data from partner and public data streams, into a common platform, where his team can run analytical algorithms for discovery. AIDD spans hundreds of diseases, plus thousands of compounds and targets.
NuMedii’s technology has led to a number of promising predictions. For instance, imipramine, a tricyclic anti-depressant, was predicted to have anti-cancer activity associated with tumor cell apoptosis. Indeed, the compound has shown activity against multiple small cell lung cancer models in vitro and in vivo. Siddiqui concluded by noting that “you don’t need a huge team to accomplish a lot,” pointing to the wealth of data in the public domain that his team has mined with machine learning.
The next series of talks looked at efforts to develop AI-automated diagnostics. Ryan Amelon from IDx Technologies shared findings from a clinical trial of IDx-DR, an AI system designed to detect diabetic retinopathy in adults, which led to FDA approval last April as the first-ever fully autonomous diagnostic for use without a specialist. Ophthalmologists “are not so great” at classifying diabetic retinopathy, a leading cause of blindness in the working-age population, with sensitivity ranging from 33 to 73%, Amelon said. The clinical trial of IDx-DR, which included 900 patients across 10 sites, found it to be 87.2% sensitive and 90.7% specific, although Amelon said he and his team remain blinded to the data. Highly experienced, certified retinal photographers obtained the Early Treatment Diabetic Retinopathy Study (ETDRS) reference standard—the “gold standard” for measuring diabetic retinopathy severity—which was then compared to IDx-DR’s output.
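For readers unfamiliar with the metrics, sensitivity and specificity are computed from a simple confusion matrix against the reference standard. A minimal sketch of the arithmetic, using hypothetical counts chosen only for illustration (the trial’s actual confusion matrix is not given here):

```python
# Illustrative sketch: how a diagnostic's sensitivity and specificity
# are computed against a reference standard such as ETDRS.

def sensitivity(tp, fn):
    """Fraction of reference-standard positives the test detects."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of reference-standard negatives the test correctly clears."""
    return tn / (tn + fp)

# Hypothetical counts, chosen only to show the arithmetic.
tp, fn = 171, 25   # disease present: detected vs. missed
tn, fp = 556, 57   # disease absent: correctly cleared vs. falsely flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

The asymmetry matters clinically: sensitivity bounds how many true cases slip through, while specificity bounds how many healthy patients are referred unnecessarily.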
Amelon then outlined how IDx-DR meets the qualifications of a fully autonomous AI system. He highlighted its usability, noting that operators need only a high school diploma and no prior experience using a fundus camera. What’s more, the system determined that 96% of exams in the FDA study were of diagnostic quality. IDx-DR also directs operators to retake poor-quality images. Finally, the output is actionable, and the system is rigorously validated. In fact, Amelon and his team have developed algorithms that separately detect various biomarkers of diabetic retinopathy, resulting in roughly 12 points of validation instead of just one. When it comes to developing a fully automated diagnostic to replace the physician, “there’s a lot more that goes into it besides just training the algorithm,” Amelon said.
Joel Saltz, chair and professor of biomedical informatics at Stony Brook University, discussed a multi-institutional collaboration to develop a deep learning-based computational stain for tumor-infiltrating lymphocytes, described in Cell Reports last April (DOI:10.1016/j.celrep.2018.03.086). Patterns of tumor-infiltrating lymphocytes were associated with cancer type, clinical outcome, and tumor and immune molecular features. For instance, those that border the tumor—which the tumor prevents from penetrating—were associated with poorer outcomes. “The goal here is not to replace pathologists,” but to find biomarkers to determine what kind of therapy would make sense, Saltz said.
Aalpen A. Patel from Geisinger then described the health system’s development of a deep learning algorithm trained using a large, diverse set of medical imaging data to identify intracranial hemorrhage on head CT scans and to help physicians prioritize patients for diagnostic screening. Intracranial hemorrhage accounts for around two million strokes every year, and almost half of deaths occur in the first 24 hours of an intracranial hemorrhage. In the clinic, the algorithm lowered the time to diagnosis of new outpatient cases of intracranial hemorrhage by 96%.
Patel emphasized that using a large volume of heterogeneous, clinical-grade data is essential to developing clinical tools. He also noted that, amid the emergence of AI-based clinical tools, what physicians do will change. AI has improved and will continue to improve patient care, and he encouraged physicians to be more involved in its evolution. And while it’s important to ask how AI can help physicians care for patients, “the question we should really be asking is, ‘How can I help the patient?’” he said. “That means you can improve operations and post-visit care, as well.”
On Tuesday, Imran Haque, former chief scientific officer of Freenome, gave the first of a series of talks on machine learning for biomarker discovery, which delved into the two major approaches to biomarker discovery, as well as their limitations and how to address them. Recognizing the high cost of sample acquisition, the mechanistic approach uses the fewest samples possible at each step, identifying a biological mechanism and progressing to retrospective and prospective follow-ups—but with no guarantee that performance will generalize. The more machine learning-driven empirical approach, on the other hand, acknowledges that biology has too many “unknown unknowns” to form useful hypotheses from the outset and instead lets the data speak for themselves, relying on collecting lots of data and creating statistical models—but statistical methods require more data than they initially appear to. Again, the cost of obtaining samples remains a core problem.
Haque cited examples of how both approaches have failed and suggested bridging the gap between them as a way to address these failures and the issue of small data set sizes in biology. He noted that statistical methods will always take the easiest way out to get the answer you “want.” (For instance, the lack of a test set will lead to overfitting to the training set.) As a result, you’ll be right for the wrong reasons, which “still looks right to the algorithm,” he said. Instead of designing a discovery project to work, he suggested designing it to not fail—that is, designing around all the ways the algorithm could “cheat.” The biological mechanism could allow you to define positive and negative controls, providing invariants you could enforce on or teach to a model, potentially helping to solve the empirical sampling problem.
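Haque’s overfitting point is easy to demonstrate. A minimal, deliberately contrived sketch (pure standard library, not any method from the talk): labels are assigned at random, so there is genuinely no signal, yet a model that memorizes its training data scores perfectly on that data and near chance on held-out data—“right” in a way that still looks right to the algorithm.

```python
# Sketch of why a held-out test set matters: a memorizing "model"
# looks perfect on the data it was fit to, even when the labels are
# pure noise, and collapses to chance on data it has never seen.
import random

random.seed(0)

# 200 "samples": feature vectors with labels assigned at random,
# so by construction there is nothing real to learn.
data = [([random.random() for _ in range(5)], random.randint(0, 1))
        for _ in range(200)]
train, test = data[:100], data[100:]

# A "model" that simply memorizes its training examples.
memory = {tuple(x): y for x, y in train}

def predict(x):
    # Fall back to a fixed guess for inputs never seen in training.
    return memory.get(tuple(x), 0)

def accuracy(split):
    return sum(predict(x) == y for x, y in split) / len(split)

print(f"train accuracy: {accuracy(train):.0%}")  # looks perfect
print(f"test accuracy:  {accuracy(test):.0%}")   # near chance
```

Evaluating only on the training set would report the first number; only the held-out set exposes that the “signal” was noise.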
Later in the session, Marina Sirota, an assistant professor at the Bakar Computational Health Sciences Institute at UCSF, described her lab’s use of computational integrative methods to identify the determinants of preterm birth. She detailed how she and her team have integrated pollution exposure datasets with birth record datasets, both within California, enabling them to pinpoint two water contaminants associated with preterm birth. Meanwhile, their meta-analysis of three studies using maternal blood samples identified a maternal transcriptomic signature of preterm birth, which showed an upregulation of innate immunity and downregulation of adaptive immunity. They saw the reverse in the fetal transcriptomic signature of preterm birth and are further investigating this inverse relationship between maternal and fetal transcriptomic signatures.
On Wednesday, Kevin Hua of Bayer Digital Health looked at AI in drug discovery, beyond conventional data science. Hua outlined tasks that clustering, classification, and other conventional machine learning technologies can accomplish—such as image classification and protocol design—followed by tasks that require advanced AI technology—such as biological modeling and time and cost reduction. The problem is, many tasks in drug discovery can’t be accomplished by machine learning technologies and need advanced AI technologies, he said. He emphasized model-based reasoning, which can model how molecules work in biological systems and could be used for structured mapping of small molecules, for instance.
A session on developing tools in support of clinical data mining for diagnostic innovation closed out the Machine Learning and AI track. Renee Deehan Kenney of PatientsLikeMe described the development of the DigitalMe explorer and researcher platform that helps researchers glean meaningful health insights from complex biological datasets via machine learning and other modeling techniques. Guergana Savova, an associate professor at Harvard Medical School, then discussed DeepPhe, software that uses natural language processing to extract deep phenotype information from cancer patients’ electronic medical records, including temporality—meaning it places surgeries and other events on a timeline, crucial for measuring disease activity, for instance.
Harry Glorikian, a healthcare consultant and author of MoneyBall Medicine: Thriving in the New Data‐Driven Healthcare Market, moderated a panel discussion featuring Savova and Kenney that focused largely on where they first see their work making an impact. Savova foresees DeepPhe having some kind of human-assisted application until the error rate improves, although she noted that what would qualify as an acceptable error rate in clinical care remains unclear. She likened DeepPhe’s first application to cruise control in cars “so that the patient can always intervene… before we go to complete autonomous cars,” she said.
PatientsLikeMe is a website where patients can connect with other patients and track information about their health. Further down the road, Kenney envisions “being able to show every person a window into what’s going on in their body and allow them to track this over time. If you decide to go on the paleo diet, will that matter?” In other words, citizens could use the platform to essentially conduct longitudinal experiments on themselves. Eventually, machine learning and AI may emerge as research tools that empower not only scientists and clinicians, but patients, as well.