2020 cohort

Meet our 2020 CDT cohort.

Leonardo Castorina

Personal webpage

PhD project: Machine learning methods for de novo protein design

Proteins are the molecules that perform almost all of the biochemical work in all living systems. They are also used in industry as catalysts to produce expensive specialty chemicals. Proteins are long polymers built from 20 fundamental building blocks called residues. Under natural conditions, proteins fold into a well-defined 3D structure, which gives them their function. Recently, significant progress has been made in predicting the 3D structure of proteins from their sequence of residues (the protein folding problem).

This project focuses on the inverse problem: how can we design sequences of residues that fold into specific shapes? We break the problem down into two parts: 1. computational design of the shape of the protein (the empty backbone), and 2. identification of the residues that will fold into that backbone (the inverse protein folding problem).

We aim to design novel backbones using generative models such as GANs (Generative Adversarial Networks). These will learn from known protein backbones and attempt to generate natural-looking backbones with shapes not previously seen in nature.

We then aim to improve our method for the inverse protein folding problem, TIMED (Three-dimensional Inference Method for Efficient Design), which currently provides state-of-the-art performance. At the moment we use convolutional and graph neural networks, and we hope to explore other deep learning methods such as recurrent neural networks and encoder-decoder architectures.

Success in these two problems will give us all the tools needed to design de novo proteins. We will then use our methods to design and re-design proteins with specific applications in mind, such as increased catalytic efficiency or better stability. Finally, we aim to make these tools.

Filippo Corponi

PhD project: Machine Learning for Precision Medicine and Causality in Mental Health

Mental illness is a leading cause of disability worldwide. Genetics and neuroscience, while having undoubtedly advanced our understanding of mental health, have not yet delivered any major actionable breakthrough. Adolescence is notoriously a period of vulnerability to mental health problems, with nearly half of patients with mental illness having an onset by the age of 18. The favourable conjuncture of large, prospective, deeply phenotyped cohorts becoming available and machine learning (ML) techniques for extracting knowledge from data developing rapidly could unlock great potential for precision psychiatry. This project, at the intersection of neuroscience, psychiatry, and ML, aims to investigate the following questions using the Adolescent Brain Cognitive Development (ABCD) study, and to propose ML solutions.

1) The current psychiatric nosography rests almost entirely on patient-reported and clinically observable symptoms and behaviors, and is not informed by a biological understanding of the pathophysiological mechanisms that drive mental illness. Unsupervised learning approaches that capitalize on a wealth of biological and behavioral data can retrieve disease groups and dimensions, which can then be interpreted and tested for clinical usefulness.

2) While several studies have used supervised learning to predict clinically meaningful outcomes, most are limited to a single modality. This reflects both the paucity of data sets that collect multiple modalities and the lack of well-established frameworks for integrating different modalities.

3) Lastly, current ML approaches fall short of disentangling correlation from causation, resulting in suboptimal predictions and limited insight into causal factors.

Jan Dabrowski

PhD project: Biogerontology: understanding the human aging process

For most of their lives, human beings enjoy a period of relative fitness and resistance to illness. However, in the process we call aging, people increasingly suffer from diseases and their risk of death rises. Scientists have proposed many explanations for this progressive decline of the human body; their study forms the field of biogerontology. In this project, we will explore as-yet-unexplored avenues within the so-called hallmarks of aging. The first subject we will focus on is cellular senescence, the gradual accumulation of damaged and shut-down cells in the human body. Senescence in the context of human aging is a young research area, and the underlying biological process is not yet clear. We will improve this understanding by determining the content of senescent cells in different tissues of the human body at different points in time. We will then link this understanding to a person's lifestyle, genetic variation, diseases, and other factors, in order to provide actionable measures to intervene in the process.

Ella Davyson

PhD proposal: A multi-omics investigation of Major Depressive Disorder and differential antidepressant response, for future stratification?

Major Depressive Disorder (MDD), otherwise referred to as depression, is a common condition primarily characterised by a persistent low mood and loss of interest in one’s usual activities. Current antidepressants are limited: their therapeutic effects take around 4-6 weeks to emerge, and around 40% of those diagnosed with depression do not respond to antidepressant therapy at all. There is currently no clear explanation for why antidepressants have such disparate effects on different people. However, a better understanding of the biological pathways implicated in depression, antidepressant exposure and response may illuminate why this inter-individual variability exists and how treatment for depression can be improved. Studies have found that approximately 40% of the risk of depression is attributable to genetic factors. Genome-wide comparisons of people with MDD and controls have identified many genetic factors associated with MDD, but each one alone contributes only a small percentage of the genetic risk. Importantly, environmental factors can modify how the genome is regulated, affecting gene expression and the risk of depression without changing the underlying genetic code. Multi-omics refers to the analysis of various forms of biological data, such as DNA methylation, RNA and protein expression, to find novel associations between diseases (such as depression) and the biological processes that lie between sequence variation and the disease phenotype. This project aims to analyse genome-wide association study (GWAS) and protein expression data in MDD cases and controls, and in individuals who do and do not respond to antidepressants, to increase our mechanistic understanding of the condition and of antidepressant response.

Justin Engelmann

Personal webpage

PhD project: Machine Learning for Retinal Image Analysis

The retina is a thin layer of tissue at the back of the eye which allows us to sense light and thus see. Retinal diseases damage this tissue and often lead to degraded vision or even blindness. This reduces the well-being of affected patients and also has a major economic impact through healthcare costs and loss of productivity. If such diseases are detected early, sight loss can be stopped or at least slowed down. However, patients often do not notice retinal diseases until their retina is already severely damaged. Thus, we need to screen for such diseases by taking pictures of the retina. Technology to take such retinal images is advancing and becoming increasingly widespread. However, currently, each image needs to be examined by a human expert, which is slow and expensive. Machine learning models, also commonly referred to as artificial intelligence, learn from existing data and can be used to screen retinal images for disease. Existing work shows that such models can accurately detect retinal disease, but some challenges need to be overcome before such models can be used in clinical practice. For example, previous models were trained on images that were easier to diagnose than those a model would see in practice, because researchers excluded noisy or otherwise difficult images. The model also needs to output something that clinicians can understand and act upon. Finally, we need to understand how the model works to ensure that it is reliable. We aim to address these challenges to produce a highly accurate, reliable, explainable, and robust machine learning model for retinal disease diagnosis that helps doctors work more efficiently and accurately. In the process of addressing these challenges, we might devise new methods which could then be applied to other areas of medical imaging where machine learning can be used.

Salvatore Esposito

Personal webpage

PhD project: Dexterous control of prosthetic/robotic limbs using temporal convolutional neural networks

Machine learning (ML) algorithms have been applied in neuroprosthetics to detect and classify patterns in surface electromyography (sEMG) signals. Several ML algorithms have been used to control the hand movements of prostheses; however, to the best of our knowledge, only limited research has addressed the classification of finger postures and the prediction of grasping force. Gathering sEMG data is challenging because it requires recording signals from amputees while they perform tasks with a computer interface or a robotic hand glove. Because neural network performance depends strongly on the amount of training data, the first objective of the project is to build an algorithm that uses generative adversarial networks (GANs) to generate new EMG signals for training ML models [4]. The main objective of the proposed research is to investigate temporal convolutional neural networks, which combine the benefits of convolution with the time-series modelling of recurrent neural networks (RNNs), to predict the finger movements of robotic arms. The main outcome of the project will be a novel sequential prediction model that predicts and classifies finger movements in real-time experiments.

Rohan Gorantla

Personal webpage

PhD project: Machine Learning for Drug Design

Industry collaboration with Exscientia

Modern drug design is a time-consuming and costly process that takes around 12 years on average. It spans identifying a drug target, finding a drug candidate, and getting that candidate approved for clinical use. One opportunity to speed up this process lies in the early stages of drug discovery, where computational methods can be used to propose new drug candidate molecules and to predict how well they may bind to and inhibit the function of the target protein. Only the most promising molecules are then synthesized, reducing the time and cost spent on synthesis. As a result, it is crucial to have reliable and accurate methods that can predict how well a drug-like molecule will bind to a target protein at a large scale (more than 1 million molecules), so as to speed up the overall drug discovery process. This project aims to provide a fast, robust and accurate estimate of binding affinity for a given protein and drug candidate.

Marcin Kedziera

PhD project: Direct and Rapid Antimicrobial Susceptibility Testing of Whole Blood Using Deep Learning

The proposed project will seek to develop improved rapid antimicrobial susceptibility testing methods that work directly on unprocessed blood samples. It will combine microfluidics, digital microscopy and deep learning to create a low-cost test that can assess whether an antibiotic is likely to be effective against an infection. This test will be compared with current best practice to determine its utility in diagnosing patients with sepsis and recommending the best treatment option.

Olivier Labayle Pabet

PhD project: Inference of causal epistatic interactions in complex traits or diseases using Targeted Learning

Since their introduction in 2005, genome-wide association studies (GWAS) have become widespread in population genetics. While their contribution to past discoveries is undeniable, they are poorly suited to understanding complex relationships between causal genetic variants and diseases. Indeed, they often rely on overly simplistic assumptions.

Because these initial assumptions are often invalid, they can lead to incorrect conclusions and false discoveries. Moreover, as the dataset grows, our confidence in those false discoveries is also strengthened. In this project we aim to leverage the recent framework of Targeted Minimum Loss-Based Estimation (TMLE) to estimate the effects of interacting variants on diseases using the UK Biobank. TMLE is backed by mathematical justifications and makes only weak assumptions, which are far more likely to hold in practice. It also harnesses the power of state-of-the-art machine learning algorithms. It has notably been used successfully in settings with a single treatment variable. Here we intend to use it in the context of interaction effects between genetic variants. We will first focus on the biological context of the vitamin D receptor, which is associated with various disorders, applying the framework to discover causal variants of diseases in the UK Biobank that are mediated by the vitamin D receptor. We will then generalize the approach, for example to support nth-order interactions. Finally, we will be in a position to release general-purpose software for discovering causal interaction effects. This will be a first step that contributes to the future of personalized medicine.

Bryan Li

Personal webpage

PhD project: Neuronal learning analysis using deep unsupervised methods

One of the central goals in computational neuroscience is to understand how cortical responses are reshaped in the course of learning. With the advent of modern neural imaging technologies, experimentalists are able to monitor hundreds or even thousands of neurons in behaving animals over multiple days, allowing practitioners to analyze these high-dimensional population responses. Nevertheless, existing methods for studying neuronal adaptation and learning often impose strong assumptions on the data or model, resulting in biased descriptions that do not generalize. Thanks to their ability to self-identify features in complex data, deep learning models have proved successful in a wide range of biomedical tasks, including modelling cellular population dynamics. Moreover, deep unsupervised methods have shown promising results in self-identifying relationships in unpaired data distributions, such as image-to-image or language-to-language translation. As a result, deep unsupervised learning is a promising candidate for learning the unknown mappings from pre-training to post-training neuronal activity. At the intersection of computational neuroscience and deep learning, this proposed project investigates the following questions.

1) To minimize the biases and assumptions imposed on the behavioural data, we explore data-driven and unsupervised approaches to model the dynamics of neuronal activity over the course of learning.

2) Artificial neural networks (ANNs) are often viewed as black boxes whose decision-making processes are difficult, if not impossible, to comprehend. To alleviate this issue, we explore model and data visualization techniques in deep learning, which have seen rapid advancements in recent years. Furthermore, the explained features can provide meaningful insights into the neuronal learning process.

3) Finally, existing work in this area often relies on different datasets and evaluation procedures, making direct comparisons between methods difficult. We therefore intend to establish an open and standardized framework to train, validate and interpret machine learning models in neuronal learning analysis. We aim to create a toolbox that allows experimentalists to apply different methods to their datasets with ease, while enabling computational neuroscientists to experiment with new approaches and evaluate their results against other approaches in a fair comparison.

Craig Nicolson

PhD project: Optimising Organ Donation through the use of Machine Learning to Predict Time to Asystole in Intensive Care

Background: Organ donation from patients who cannot survive their illness is possible after life support is turned off, with the consent of their relatives. However, if the process of dying takes too long, the organs become damaged and cannot be donated. There is currently no good way of predicting how long a person will take to die after life support is turned off.

Aim: We think that the information gathered while a person is in hospital, such as heart rate and blood pressure, might hold the answer to this question. We hope to use machine learning to predict how long a person will take to die after life support is turned off, which could improve the process of organ donation.

Matthew Whelan

PhD project: Exploration of Bayesian Modelling Approaches in Schizophrenia and Bipolar Disorder

Schizophrenia and bipolar disorder are debilitating psychotic illnesses that affect a significant proportion of the population. However, the current practice of psychiatry, which relies on symptoms and subjective evaluations from patients, is severely limited.

Little is known about the causes of many illnesses, including schizophrenia and bipolar disorder, and consequently treatments rely on inefficient trial-and-error approaches. Computational psychiatry is a new field of research that aims to apply recent developments in artificial intelligence, theoretical neuroscience, statistics, and electrical engineering to psychiatry in order to better understand psychiatric disorders. This research project aims to continue in this vein to understand schizophrenia and bipolar disorder. In particular, we hypothesise that decision making in patients with these illnesses is disrupted due to alterations in how they perform inference. Inference is the mechanism through which prior, or previously learned, information is integrated with newly incoming information for the purposes of belief updating. We will develop models of inference, based mainly on a theory known as Bayesian inference, which will be tested in decision-making tasks with patients. We expect that this will lead to novel insights into how perceptions, beliefs and therefore decisions differ in patients with schizophrenia and bipolar disorder, both from each other and from the healthy population.
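Bayesian inference formalises this kind of belief updating as Bayes' rule: the updated (posterior) belief in each hypothesis is proportional to its prior probability multiplied by the likelihood of the new observation under that hypothesis. The short Python sketch below illustrates the rule on a deliberately simple, hypothetical task (judging whether a coin is fair or biased from a few tosses). The function name, hypotheses and observations are illustrative assumptions only, not the models this project will develop; exact fractions from Python's `fractions` module are used so the numbers can be checked by hand.

```python
from fractions import Fraction


def bayes_update(prior, likelihood, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> prior probability (sums to 1)
    likelihood: function (observation, hypothesis) -> P(observation | hypothesis)
    """
    # Weight each prior belief by how well that hypothesis predicts the observation
    unnormalized = {h: p * likelihood(observation, h) for h, p in prior.items()}
    # P(observation): the normalizing constant that makes the posterior sum to 1
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}


# Hypothetical example: is a coin fair (P(heads) = 1/2) or biased (P(heads) = 3/4)?
P_HEADS = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}


def coin_likelihood(obs, h):
    # Probability of seeing heads ("H") or tails ("T") under hypothesis h
    return P_HEADS[h] if obs == "H" else 1 - P_HEADS[h]


# Start with equal prior belief in both hypotheses, then update after each toss
belief = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
for obs in ["H", "H", "T"]:
    belief = bayes_update(belief, coin_likelihood, obs)
    print(obs, {h: str(p) for h, p in belief.items()})
```

Starting from equal priors, the tosses H, H, T leave the belief at 8/17 for "fair" and 9/17 for "biased": each head shifts belief towards the biased hypothesis, and the tail shifts it partly back.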