2020 cohort
Meet our 2020 CDT cohort.
Leonardo Castorina
PhD project: Machine learning methods for de novo protein design
Supervisors: Kartic Subr and Chris Wood
Proteins are the molecules that perform almost all of the biochemical work in living systems. They are also used in industry as catalysts to produce expensive specialty chemicals. Proteins are long polymers built from 20 fundamental building blocks called residues. Under natural conditions, proteins fold into a well-defined 3D structure, which determines their function. Recently, significant progress has been made in predicting the 3D structure of proteins from their sequence of residues (the protein folding problem).
This project focuses on the inverse problem: how can we design sequences of residues that fold into specific shapes? We break the problem down into two parts: (1) computational design of the shape of the protein (the empty backbone), and (2) identification of the residues that will fold into that backbone (the inverse protein folding problem).
We aim to design novel backbones using generative models such as GANs (Generative Adversarial Networks). These models will learn from known protein backbones and attempt to generate natural-looking backbones with shapes previously unseen in nature.
We then aim to improve our method for the inverse protein folding problem, TIMED (Three-dimensional Inference Method for Efficient Design), which currently provides state-of-the-art performance. At the moment we use convolutional and graph neural networks, and we hope to explore other deep learning methods such as recurrent neural networks and encoder-decoder architectures.
Success in these two problems will give us all the tools needed for the design of de novo proteins. We will then use our methods to design and re-design proteins with specific applications in mind, such as increased catalytic efficiency or better stability. Finally, we aim at making these tools.
Filippo Corponi
PhD project: Machine Learning for Precision Medicine
Supervisors: Stephen Lawrie, Heather Whalley and Antonio Vergari
Mood disorders are increasingly recognized as being among the leading causes of disease burden worldwide. Depressive and manic episodes in mood disorders commonly involve altered mood, sleep, and motor activity. These translate into changes in sensory data that wearable devices can continuously and affordably monitor, which makes such devices promising candidates for modelling mood disorders. The project studies the degree to which mood disorder symptoms can be inferred from sensory data collected from wearable devices, and whether clinically meaningful representations can be learnt from such data in an unsupervised manner. Another promising approach is to model mood disorders in terms of accelerated brain aging, as shown in MRI scans. The project investigates whether a brain affected by a mood disorder merely follows an accelerated physiological aging pattern or presents disorder-specific characteristics.
Jan Dabrowski
PhD project: Biogerontology: understanding the human aging process
Supervisors: Tamir Chandra and Catalina Vallejos
For the majority of their lives, human beings enjoy a period of relative fitness and resistance to illness. However, in the process we call aging, people increasingly suffer from diseases and their risk of death becomes greater. Biogerontology is the field of study that considers the many possible explanations for this progressive decline of the human body. In this project, we will consider the as yet unexplored avenues in the so-called hallmarks of aging. The first subject we will focus on is cellular senescence, a gradual accumulation of damaged and shut-down cells in the human body. Senescence in the context of human aging is a young research area, and the underlying biological process is not clear. We will improve this understanding by determining the content of senescent cells in different tissues of the human body at different points in time. We will then link that understanding with a person's lifestyle factors, genetic variation, diseases, and similar factors, to provide actionable measures for intervening in the process.
Ella Davyson
PhD proposal: A multi-omics investigation of Major Depressive Disorder and differential antidepressant response - for future stratification?
Supervisors: Andrew McIntosh, Riccardo Marioni and Mark Adams
Major Depressive Disorder (MDD), otherwise referred to as depression, is a common condition primarily characterised by a persistent low mood and loss of interest in one’s usual activities. Current antidepressants are limited: their therapeutic effects take around 4-6 weeks to emerge, and around 40% of those diagnosed with depression do not respond to antidepressant therapy at all. Currently, there is no clear reason why antidepressants have such disparate effects on different people. However, an increased understanding of the biological pathways implicated in depression, antidepressant exposure and response may illuminate why there is inter-individual variability and how treatment for depression can be improved. Studies have found that approximately 40% of the risk of depression is caused by genetic factors. Analysis of the genome in those with MDD compared to controls found that many genetic factors are associated with MDD; however, each alone contributes only a small percentage of the genetic risk. Importantly, the genome can be modified by environmental factors, which can affect gene expression and the risk of depression beyond the base genetic code. Multi-omics refers to the analysis of various forms of biological data, such as DNA methylation, RNA and protein expression, to find novel associations between biological processes and diseases (such as depression) that lie between sequence variation and the disease phenotype. This project aims to analyse GWAS and protein expression data in MDD cases and controls, and in individuals who do and do not respond to antidepressants, to increase our mechanistic understanding of the condition and of antidepressant response.
Justin Engelmann
PhD project: Machine Learning for Retinal Image Analysis
Supervisors: Miguel O. Bernabeu and Amos Storkey
The retina is a thin layer of tissue at the back of the eye which allows us to sense light and thus see. Retinal diseases damage this tissue and often lead to degraded vision or even blindness. This reduces the well-being of affected patients and also has a major economic impact through healthcare costs and loss of productivity. If such diseases are detected early, sight loss can be stopped or at least slowed down. However, retinal diseases are often not noticed by patients until their retina is already severely damaged. Thus, we need to screen for such diseases by taking pictures of the retina. Technology for taking such retinal images is advancing and becoming increasingly widespread. However, each image currently needs to be examined by a human expert, which is slow and expensive. Machine learning models, also commonly referred to as artificial intelligence, learn from existing data and can be used to screen retinal images for disease. Existing work shows that such models can accurately detect retinal disease, but some challenges need to be overcome before they can be used in clinical practice. For example, previous models were trained on images that were easier to diagnose than those a model would see in practice, because researchers excluded noisy or otherwise difficult images. The model also needs to output something that can be understood and acted upon by clinicians. Finally, we need to understand how the model works to ensure that it is reliable. We aim to address these challenges to produce a highly accurate, reliable, explainable, and robust machine learning model for retinal disease diagnosis that supports doctors and makes them more efficient and accurate. In the process of addressing these challenges, we might devise new methods which could then be applied to other areas of medical imaging where machine learning can be used.
Salvatore Esposito
PhD project: Dexterous control of prosthetic/robotic limbs using temporal convolutional neural networks
Supervisors: Arno Onken and Matthias Hennig
Machine learning (ML) algorithms have been implemented in the field of neuroprosthetics to detect and classify patterns in sEMG signals. To the best of our knowledge, several ML algorithms have been used to control the hand movements of prosthetics; however, only limited research has addressed the classification of finger postures and the prediction of grasping force. Gathering sEMG signals is challenging because it requires recording from amputees while they perform tasks with a computer interface or robotic hand glove. The performance of a neural network is strongly dependent on the amount of data on which the model has been trained. Thus, the first objective of the project is to build an algorithm using generative adversarial networks (GANs) to generate new EMG signals for training ML models [4]. The main objective of the proposed research is to investigate the use of temporal convolutional neural networks, which combine the benefits of convolution with the time-series analysis of recurrent neural networks (RNNs), to predict finger movements of robotic arms. The main outcome of the project will be a novel sequential prediction model that predicts and classifies finger movements in real-time experiments.
Rohan Gorantla
PhD project: Machine Learning for Drug Design
Industry collaboration with Exscientia
Supervisors: Antonia Mey, Andrea Weisse and Anthony Bradley
Modern drug design is a time-consuming and costly process that can take up to 12 years. It spans from identifying a drug target, to finding a drug candidate, to getting this candidate approved for clinical use. One opportunity to speed up this process lies in the early stages of drug discovery, where computational methods can be used to propose new drug candidate molecules and to predict how well they may bind to the target protein and inhibit its function. Only the most promising molecules are then synthesized, reducing the time and cost spent on synthesis. As a result, it is crucial to have reliable and accurate methods that can predict, at large scale (more than 1 million molecules), how well a drug-like molecule will bind to a target protein, so as to speed up the overall drug discovery process. This project aims to provide a fast, robust and accurate estimate of binding affinity for a given protein and drug candidate.
Marcin Kedziera
PhD project: Direct and Rapid Antimicrobial Susceptibility Testing of Whole Blood Using Deep Learning
Supervisors: Till Bachmann and Kartic Subr
The proposed project will seek to develop improved rapid antimicrobial susceptibility testing methods directly from unprocessed blood samples. It will combine microfluidics, digital microscopy and deep learning to create a low-cost test able to assess whether an antibiotic is likely to be effective in fighting an infection. This test will be compared to current best practices to determine its utility in diagnosing clinical sepsis patients and recommending the best treatment option.
Olivier Labayle Pabet
PhD project: Inference of causal epistatic interactions in complex trait or disease using Targeted Learning
Supervisors: Ava Khamseh, Chris Ponting and Sjoerd Beentjes
Since their introduction in 2005, genome-wide association studies (GWAS) have become widespread in population genetics. While their contribution to past discoveries is undeniable, they are poorly suited to understanding complex relationships between causal genetic variants and diseases. Indeed, they often rely on overly simplistic assumptions.
Because those initial assumptions are invalid, they lead to incorrect conclusions and false discoveries. Moreover, as the size of the dataset grows, our confidence in those false discoveries is also strengthened. In this project we aim to leverage the recent framework of Targeted Minimum Loss-Based Estimation (TMLE) to estimate the effects of interacting variants on diseases using the UK Biobank. TMLE is backed by mathematical justifications and makes only weak assumptions that are guaranteed to hold. It also harnesses the power of state-of-the-art machine learning algorithms. It has notably been used successfully in general settings with a single treatment variable. Here we intend to use it in the context of the interaction effects of genetic variants. In the forthcoming work, we will first focus on the biological context of the vitamin D receptor, which is associated with various disorders. We will apply the framework to discover causal variants of diseases present in the UK Biobank that are mediated by the vitamin D receptor. Then we will generalize the approach to support, for example, nth-order interactions. Finally, we will be in a position to release general-purpose software for the discovery of causal interaction effects. This will be a first step towards contributing to the future of personalized medicine.
Bryan Li
PhD project: Foundation model as a digital twin of the mouse visual cortex
Supervisors: Arno Onken and Nathalie Rochefort
Understanding how the visual system processes information is a fundamental challenge in neuroscience. Recently, predictive models of neural responses to naturally occurring stimuli have proven to be a successful approach toward this goal, serving the dual purpose of generating new hypotheses about biological vision and bridging the gap between biological and computer vision. With the advent of large-scale neural recordings and the emergence of visual foundation models, a foundation model of the mouse visual cortex holds tremendous potential. This project aims to design large-scale multi-modal methods that can accurately predict visual responses to natural stimuli across animals. This approach relies on the idea that high-performing predictive models can account for the nonlinear response properties of neural activity, thus explaining a large part of the stimulus-driven variability. Moreover, we are interested in interpretable approaches that can illuminate the modulation of neural responses by visual input and behavioural variables, thus providing a platform to investigate computation in the visual system in silico.
Craig Nicolson
PhD project: Optimising Organ Donation through the use of Machine Learning to Predict Time to Asystole in Intensive Care
Supervisors: Thanasis Tsanas, Nazir Lone, Kathryn Puxty and Martin Shaw
Background: Organ donation from patients who are not able to survive their illness is possible after life support is turned off, with the consent of their relatives. However, if the process of dying takes too long, the organs are damaged and cannot be donated. There is no good way of predicting how long a person will take to die after life support is turned off. Aim: We think that the information we gather whilst a person is in hospital, such as heart rate and blood pressure, might hold the answer to this question. We hope to use computer systems to predict how long a person will take to die after life support is turned off, which could improve the process of organ donation.
Matthew Whelan
PhD project: Interpretable AI modelling to predict cognitive/mental health outcomes from rest-activity patterns in UK Biobank
Supervisors: Daniel Smith, Jacques Fleuriot, Stephen Lawrie and Amy Ferguson
Rest-activity patterns, which capture sleep and activity levels throughout the day, have been shown to be associated with a wide range of health outcomes, including overall mortality risk. However, the association between rest-activity patterns and certain mental and cognitive health outcomes, such as brain volume and dementia risk, is unclear. This project applies explainable AI modelling methods that aim to predict mental/cognitive health risks from rest-activity patterns using the UK Biobank dataset. Although their predictive power is often lower than that of the black-box AI methods currently in popular use, interpretable AI methods provide a clearer understanding of which features within a dataset are most important for making the predictions. Whilst balancing predictive power with interpretability is challenging, interpretability is critical if AI modelling approaches are to be adopted within clinical practice.