Past MSc projects
Abstracts of MSc projects completed by CDT students in previous cohorts.
Early screening and detection of pathologies are important for improving the health outcomes of the general population. For automatic classification of eye diseases such as age-related macular degeneration (AMD) and diabetic retinopathy (DR), convolutional neural networks are often employed. Since retinal datasets are usually small, the standard approach is transfer learning, i.e. using the weights of a model trained on ImageNet as initialisation before fine-tuning the model on the available retinal dataset. Recent papers have shown the limitations of transfer learning from ImageNet; in particular, in medical imaging, transfer learning does not always improve performance over a model trained from random initialisation. In this work, we propose SelfAdapt, a method that exploits unlabelled data from different datasets to learn features in a self-supervised way while projecting all the data onto the same space to achieve better transfer. Through a series of experiments on AMD and DR grading, we show that our method improves on the common approach of transfer learning from ImageNet followed by fine-tuning in terms of classification accuracy, AUC and clinical interpretability. Finally, we test our approach on natural images and verify its effectiveness on the Office-31 dataset.
Antibiotic-resistant bacteria are an outstanding biomedical problem, associated with a significant financial burden and a large number of deaths, yet antibiotic resistance is not fully understood. A widely used class of antibiotics, the quinolones, induces bacterial cell death by causing DNA damage. However, bacteria are equipped with biological machinery, particularly the RecBCD protein complex, that repairs bacterial DNA and permits the evolution of superbugs. Our main aim in this thesis is to find optimal experimental designs for parameter inference in a recBCD gene expression model.
Bayesian experimental design (BED) provides a principled way of finding optimal experimental designs. To find them, we use MINEBED, a novel, likelihood-free BED methodology. It offers a significant improvement over several existing BED methodologies because it allows mutual information to be used as the utility when searching for optimal designs.
In this thesis we validate MINEBED on a gene expression model and investigate multiple design optimisation scenarios of practical relevance. In particular, we show that MINEBED provides experimental designs that are reasonably robust to unknown sources of variability. Moreover, we show that at least two measurements are necessary to obtain a good parameter estimate in the recBCD gene expression model. Based on our results, we discuss how BED in general, and MINEBED in particular, can help design laboratory experiments that are more efficient in terms of cost, labour and quality of measurement.
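The mutual-information utility mentioned above can be written out as follows. This is a general sketch of neural mutual-information-based BED, not the exact MINEBED objective: θ denotes the model parameters, y the simulated measurements, d the design, and T_ψ a neural "critic" network with parameters ψ. The lower bound shown is the NWJ bound, one of several bounds used by neural mutual-information estimators; which bound MINEBED uses is not stated in this abstract.

```latex
d^{*} = \arg\max_{d} \; I(\theta; y \mid d),
\qquad
I(\theta; y \mid d) \;\ge\;
\mathbb{E}_{p(\theta, y \mid d)}\big[T_{\psi}(\theta, y)\big]
- e^{-1}\,\mathbb{E}_{p(\theta)\,p(y \mid d)}\big[e^{T_{\psi}(\theta, y)}\big]
```

Both expectations can be estimated from simulator samples alone (the second by pairing parameters with measurements simulated from other parameters), which is what keeps the approach likelihood-free; the bound is then maximised over both ψ and d.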
Single-cell RNA sequencing makes it possible to identify and study cell-to-cell heterogeneity, which cannot be done with more conventional bulk sequencing methods. During sequencing, cells are destroyed, making it impossible to obtain gene expression time-series trajectories for any given cell; instead, only a snapshot of each cell's expression profile can be obtained. Pseudotime inference methods align these snapshots into a longitudinal trajectory. This alignment can capture dynamic cellular processes, such as the cell cycle or differentiation, and acts as a proxy for a time series.
In this work, we model gene expression trajectories along pseudotime in terms of both mean expression and cell-to-cell variability patterns, extending our previous work on this topic. We will thoroughly test our existing model and its Bayesian extensions, identify the advantages and disadvantages of each, and compare their performance with that of other publicly available tools.
Moreover, we will address the problem of mean-variability confounding that is often observed in single-cell RNA sequencing data. Finally, we will try to perform inference on the gene expression trajectories while simultaneously clustering them. For this task, we apply Dirichlet process mixture models using various probabilistic programming languages.
In vivo deep-brain calcium imaging is a powerful technique for monitoring the activity of populations of neurons in the brains of freely moving animals. However, multi-stage pre-processing is required before individual neuronal signals can be extracted and analysed. In collaboration with the Centre for Discovery Brain Sciences, an experiment was conducted on freely behaving mice engaged in exploratory and navigational tasks, and their neural activity was recorded using one-photon calcium imaging. The goal of the experiments is to predict the animal's location from images acquired from the mouse hippocampus. To this end, we explore interpretable deep-learning-based image processing algorithms that minimise the number of pre-processing steps required.
A ResNet10 model with background correction was applied as a behaviour predictor. Neuron activation maps, inspired by class activation mapping for object localisation, were computed. The impact of data preparation was analysed using two approaches: statistical frame merging and background correction. In addition, the temporal dynamics of neural activity were explored, and the generalisation ability of the model was evaluated.
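For reference, standard class activation mapping (CAM), which the neuron activation maps are described as being inspired by, can be written as below. Here f_k(x, y) is the activation of channel k of the last convolutional layer at spatial position (x, y), w_k^c is the output-layer weight connecting the globally average-pooled channel k to output c, and Z is the number of spatial positions. How the project adapted CAM to neurons is not described in the abstract, so this is only the original formulation.

```latex
M_c(x, y) = \sum_{k} w_k^{c} \, f_k(x, y),
\qquad
S_c = \sum_{k} w_k^{c} \, \frac{1}{Z} \sum_{x, y} f_k(x, y)
    = \frac{1}{Z} \sum_{x, y} M_c(x, y)
```

Because the output score S_c is the spatial average of the map M_c, positions where M_c is large are the image regions that contribute most to that output.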
This study aims to recreate and improve upon the state-of-the-art method for identifying relations between drugs and drug-related information in the context of adverse drug events. We analyse and pre-process the data for the 2018 n2c2 Track 2 shared task, consisting of 505 discharge summaries of patients with adverse drug events from the intensive care units of the Beth Israel Deaconess Medical Centre in Boston. Using the gold-standard drug-related entities and relations provided for the challenge, and following the current state-of-the-art method for this task, we fine-tune a BERT classifier as our baseline model. We then investigate improvements to the baseline by fine-tuning a domain-specific BERT classifier, Clinical BERT, by enriching the input text with additional left and right context, and by normalising drug names. We find that a version of BERT pre-trained on text from the biomedical domain outperforms the more generic base BERT; that additional input context more uniquely defines the input text, resulting in improved performance; that a domain-specific version of BERT can be combined with additional left and right context for further improvement; and that providing medical background knowledge via drug-name normalisation, while lowering performance when used in isolation, does not hinder the performance of models when combined with either domain-specific BERT or additional context. Our best model achieves an overall micro F1 score of 0.964, surpassing the state of the art.
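The micro F1 score reported above pools true positives, false positives and false negatives across all relation types before computing precision and recall. The minimal Python sketch below shows that computation; the function name `micro_f1` and the per-type counts are hypothetical and are not taken from the project or the n2c2 data.

```python
from typing import Dict, Tuple


def micro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Micro-averaged F1 from per-type (tp, fp, fn) counts.

    Counts are summed over all relation types first, so frequent
    types carry more weight than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        # No correct predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two relation types: (tp, fp, fn).
counts = {
    "frequent_relation": (90, 5, 5),
    "rare_relation": (10, 5, 5),
}
print(round(micro_f1(counts), 3))  # 0.909
```

With these toy counts, micro F1 is 100/110 ≈ 0.909, whereas a macro average of the per-type F1 scores (90/95 ≈ 0.947 and 10/15 ≈ 0.667) would be about 0.807; micro averaging lets the frequent relation type dominate.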
Immunopeptides are peptides that are presented to the immune system of an organism and can be used in treatments for cancer and viral infections. Motif discovery techniques can be applied to immunopeptides to discover conserved regions in the peptide sequences, known as motifs, which can then be studied for various properties and for their suitability for vaccine design. This research project builds on previous work by the Alfaro lab. Its main outputs are a new motif discovery pipeline that can be applied to samples of immunopeptide sequences, and a strategy for comparing the performance of different techniques. The pipeline was applied to a sample of 7,760 immunopeptide sequences to discover motifs, which were then compared with those found by an in-house pipeline developed by the Alfaro lab and with those found by an existing motif discovery technique, GibbsCluster. The quality of the motifs was assessed using a biologically motivated quality score defined in this project, and by the proportion of peptides they covered. The motifs found in this project covered more peptides in the sample than those from previous work; however, they did not achieve higher quality scores. On the same sample, the pipeline developed in this project also found motifs with better quality scores and higher peptide coverage than GibbsCluster. Further research is needed to apply these methods to more samples of immunopeptides and to determine which technique to use in future work.
Con-ikot-ikot (CII), a small peptide toxin, is a promising candidate for developing a fluorescent probe for single-particle tracking of the dynamics of AMPA receptors (AMPARs), a subtype of ionotropic glutamate receptors involved in synaptic plasticity, the mechanism underlying memory and learning. The structure of CII could serve as a skeleton for genetic modification to alter its function, and for computational protein design of smaller CII-inspired probes. However, producing CII in the laboratory is time-consuming and expensive, and the protein yield is very low, so it would not be possible to test all proposed mutants and designs in vitro. In this project, molecular dynamics (MD) simulations were carried out to create the basis for a dataset for use in further stages of computational protein design and engineering. The simulations were used to investigate the CII monomer and dimer in the context of their interactions with isolated AMPAR ligand-binding domains. Understanding the interactions between CII and AMPARs will guide the design of mutants with altered binding sites or function, as well as small peptides whose shapes mimic the AMPAR-interacting interface of CII. Proposed structures would first be tested in silico and tweaked to optimise their characteristics, and a selected subset of the designs that perform best in simulation could then be produced for in vitro experiments.
Recent cognitive science theories view the human brain as constantly performing probabilistic computations akin to Bayesian inference, with priors corresponding to top-down influences and likelihoods to bottom-up ones. Within this framework, mental disorders are usually understood as impairments that overweight one stream of information relative to the other. Circular inference is a novel extension of this approach that mirrors the excitatory-inhibitory imbalances often found in mental illnesses. It predicts that, in individuals with such imbalances, priors or likelihoods reverberate through the brain's hierarchical models of the environment and are counted more than once, overwhelming the inferential process. This framework has been used to explain the mechanisms behind the symptoms of schizophrenia and has subsequently been verified in patients with schizophrenia.
Autism has a complex relationship with schizophrenia, based on both their similarities and their differences. In this project, we therefore explore the use of circular inference to model the behaviour of individuals with autistic traits. To do so, we implement a decision-making task and collect data from participants online. Surprisingly, we find that circular inference models the behaviour of all participants better than the alternative models considered. However, despite the known inhibitory impairments in autism spectrum disorders, we find no evidence of any relationship between autistic traits and information reverberation or any other impairment. We then analyse the project's limitations and propose directions for future research to further investigate and verify the circular inference framework and its explanatory power for mental illness and human behaviour in general.
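A rough way to see the reverberation idea is in log-odds form. The linear sketch below is a simplifying assumption for illustration, not the model fitted in this project: L_prior and L_lik are the log-odds contributed by the prior and the likelihood, and α_p, α_s ≥ 0 are hypothetical loop strengths describing how many extra times each stream is counted. Published circular inference models are more elaborate than this linear form.

```latex
L_{\text{Bayes}} = L_{\text{prior}} + L_{\text{lik}},
\qquad
L_{\text{circ}} = (1 + \alpha_p)\, L_{\text{prior}} + (1 + \alpha_s)\, L_{\text{lik}},
\qquad \alpha_p, \alpha_s \ge 0
```

With α_p = α_s = 0 this reduces to ordinary Bayesian inference; a large α_s lets sensory evidence dominate (bottom-up overcounting), while a large α_p lets prior beliefs dominate (top-down overcounting).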
Optical coherence tomography angiography (OCTA) is a relatively new imaging modality for the discovery of retinal biomarkers linked to chronic diseases. Several studies have already shown the potential of hand-crafted features extracted from OCTA images for identifying patient status. However, these features depend on manual segmentation and pre-processing steps, which are prone to errors. In contrast to previous work, we propose using topological information, which does not depend on geometric calculations and is more robust to noise.
We aim to investigate whether features based on topological invariants can help identify patient status from a relatively small dataset. We further test the robustness of the model on a dataset obtained from a different device. Additionally, we explore whether a construction based on multiple topological invariants, which captures more in-depth topological information by bi-filtering with respect to both the images and a function defined on them, helps detect particular regions where diseased and healthy images differ markedly. Finally, we aim to link topological features with established biomarkers.
Our contributions are a novel analysis of the topology of OCTA images and a pipeline for the discovery of topological biomarkers. Our results demonstrate that constructing topological invariants and using them as features enables fairly accurate OCTA scan classification. As an additional benefit, our approach requires no pre-processing steps and offers better interpretability, which we hope will support clinicians' diagnoses by making the detection easier to understand. Finally, we show that our model maintains consistent performance across OCTA imaging devices without any re-training, which allows our method to be used as an automatic screening tool.
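To make the idea of topological features more concrete, the minimal Python sketch below computes one of the simplest topological invariants, the number of connected components (Betti-0) of the set of pixels at or above each intensity threshold. This is a hypothetical simplification, not the project's pipeline: it uses a single filtration rather than the bi-filtration described above, assumes 4-connectivity, assumes vessels appear brighter than the background, and represents images as nested lists of floats. The function names `count_components` and `betti0_curve` are illustrative.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def count_components(image: Image, threshold: float) -> int:
    """Count 4-connected components of pixels with value >= threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and not seen[r][c]:
                components += 1
                # Iterative flood fill over this component of the superlevel set.
                stack = [(r, c)]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and image[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return components


def betti0_curve(image: Image, thresholds: Sequence[float]) -> List[int]:
    """Betti-0 of the superlevel set at each threshold, usable as a feature vector."""
    return [count_components(image, t) for t in thresholds]


# Toy 3x3 "image": two bright vertical segments and one isolated mid-intensity pixel.
toy = [
    [0.9, 0.1, 0.8],
    [0.9, 0.1, 0.8],
    [0.2, 0.5, 0.2],
]
print(betti0_curve(toy, [0.7, 0.4, 0.0]))  # [2, 3, 1]
```

The curve records how components appear and merge as the threshold drops. Because it depends only on connectivity, distortions that neither split nor merge components leave it unchanged, which is one intuition for why such features can be less sensitive to noise and device differences than geometric measurements.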