Industrial Track CDT studentship with IBM Research
We are recruiting to a fully funded 4-year studentship in collaboration with IBM Research to start in September 2021 as part of our CDT industrial track.
Getting in Shape: AI driven sampling of the shapes of biomolecules
- Prof Ben Leimkuhler (School of Mathematics)
- Dr Antonia Mey (School of Chemistry)
- Dr Flaviu Cipcigan (IBM Research)
Machine learning is set to revolutionize many aspects of science, including the way we use computational modelling to understand biomolecular processes in biomedical and pharmaceutical science. A recent example is AlphaFold2, a machine-learning model for protein structure prediction that reached experimental precision. These innovations will in turn open the door to new discoveries about the properties of biochemical systems and, notably, about fundamental molecular interactions such as protein-ligand binding, the starting point for the rational design of new drugs.
Biomolecular simulation, developed over the past 60 years, is based on the iterative solution of ordinary or stochastic differential equation models and is notorious for consuming tremendous amounts of supercomputer time. When deployed skillfully, these methods are predictive and insightful tools within a virtual laboratory; they are used in the pharmaceutical industry for a broad range of problems, for example contributing to SARS-CoV-2 vaccine development by modelling the viral invasion of cells. Molecular simulation can also help to explain the processes that underpin antimicrobial resistance and propose novel candidate drugs for cancer therapy.
Despite the promise of these examples, current biomolecular modelling methodology is poorly adapted to the types of challenges that arise in modern medicine. First, there is little integration of experimental and simulated data, which, if exploited further, could lead to a much better understanding of biomolecular processes at atomistic resolution. Second, simulation protocols are computationally demanding and struggle to reach biologically relevant timescales: biological processes occur on millisecond to second timescales, whereas a simulation may take many weeks to produce the equivalent of a few microseconds, because the methods are forced to use timesteps of about 2 fs. As a result, exploring the full range of states accessible to a molecule under given laboratory conditions is extremely challenging. This is referred to as the sampling problem. Both challenges could be addressed by incorporating machine learning into the modelling process.
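The timescale gap can be put into numbers with simple arithmetic: at a fixed 2 fs timestep, the number of integration steps grows linearly with the simulated time. In the sketch below the 2 fs timestep comes from the text, while the target times and the throughput of one simulated microsecond per wall-clock week (`us_per_week`) are illustrative assumptions, not benchmarks.

```python
FEMTOSECOND = 1e-15         # seconds
MICROSECOND = 1e-6          # seconds
TIMESTEP = 2 * FEMTOSECOND  # typical MD timestep quoted in the text


def steps_needed(simulated_time_s: float, timestep_s: float = TIMESTEP) -> int:
    """Number of integration steps required to cover simulated_time_s."""
    return round(simulated_time_s / timestep_s)


def weeks_needed(simulated_time_s: float, us_per_week: float = 1.0) -> float:
    """Wall-clock weeks at an ASSUMED throughput (simulated microseconds per week)."""
    return simulated_time_s / MICROSECOND / us_per_week


if __name__ == "__main__":
    for label, t in [("1 microsecond", 1e-6), ("1 millisecond", 1e-3), ("1 second", 1.0)]:
        print(f"{label:>14}: {steps_needed(t):.1e} steps, ~{weeks_needed(t):,.0f} weeks")
```

One microsecond already requires 5 × 10^8 steps; a millisecond requires a thousand times more, so at the assumed rate it would take on the order of a thousand weeks of wall-clock time.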
The simulation process can be dramatically sped up by machine-learning-based approaches, giving access to information on experimentally relevant scales. For example, the physics-based model for the interaction forces between atoms and atomic groups can be replaced by machine-learned models of these interactions; once trained, such surrogate models can be much cheaper to evaluate and can therefore accelerate the simulation. The training required here is a form of online learning in which the data set is generated by other computations (force-field evaluations). Controlling this two-component learning procedure (a supervised force-field model and an unsupervised data-acquisition process) is a complex task. A second challenge is to generate enough samples from the conformational space of the biomolecule of interest to estimate its statistical properties quantitatively, such as binding affinities, relative "free energies" of different conformational states, or transition rates between states. Such information can further inform experiments or suggest candidate drugs.
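The two-component loop described above (supervised fitting of a surrogate force model, alternating with data acquisition from simulations driven by that surrogate) can be illustrated with a deliberately tiny toy. Every ingredient is an assumption chosen for illustration, not the project's method: a 1D double-well force (`reference_force`) stands in for the expensive force field, a polynomial least-squares fit (`fit_surrogate`) stands in for a neural-network surrogate, and overdamped Langevin dynamics followed by random subsampling stands in for the acquisition strategy.

```python
import numpy as np


def reference_force(x):
    """ASSUMED toy 'expensive' force field: F = -dU/dx for U(x) = (x**2 - 1)**2."""
    return -4.0 * x * (x**2 - 1.0)


def fit_surrogate(xs, fs, degree=5):
    """Supervised step: least-squares polynomial fit to reference forces."""
    return np.poly1d(np.polyfit(xs, fs, degree))


def langevin_trajectory(force, x0, n_steps, rng, dt=1e-2, kT=0.5):
    """Overdamped Langevin dynamics (Euler-Maruyama) driven by `force`."""
    xs = np.empty(n_steps)
    x = x0
    noise_scale = np.sqrt(2.0 * kT * dt)
    for i in range(n_steps):
        x = x + dt * force(x) + noise_scale * rng.standard_normal()
        xs[i] = x
    return xs


def run_online_learning(n_rounds=5, batch=20, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Initial data set: a few reference evaluations near the left minimum.
    train_x = rng.normal(-1.0, 0.2, size=batch)
    train_f = reference_force(train_x)
    for _ in range(n_rounds):
        surrogate = fit_surrogate(train_x, train_f)
        # Data acquisition: explore cheaply with the current surrogate ...
        visited = langevin_trajectory(surrogate, train_x[-1], n_steps, rng)
        # ... then label a random subset of visited states with the reference model.
        new_x = rng.choice(visited, size=batch, replace=False)
        train_x = np.concatenate([train_x, new_x])
        train_f = np.concatenate([train_f, reference_force(new_x)])
    return fit_surrogate(train_x, train_f), train_x


surrogate, train_x = run_online_learning()
grid = np.linspace(-1.5, 1.5, 31)
print("max surrogate error on grid:", np.max(np.abs(surrogate(grid) - reference_force(grid))))
print("reference evaluations used:", train_x.size)
```

Because the toy reference force is itself a cubic polynomial, the surrogate becomes essentially exact almost immediately. With a real force field and a neural-network surrogate, deciding which visited states deserve an expensive reference evaluation becomes a central design question.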
The primary focus of this PhD will be to combine physical modelling techniques and machine learning to address sampling challenges by learning from experimental and simulation data. The project will draw on preliminary work by Dr Mey and Prof. Leimkuhler that uses unsupervised learning techniques based on so-called diffusion maps to identify coordinates associated with slow conformational changes, and harnesses these for enhanced exploration of conformational space (for some related concepts, see Trstanova, Leimkuhler and Lelievre 2019). Working with the industry partner (Dr Cipcigan), we will combine neural-network-based sampling that employs experimental constraints (e.g. from NMR) with classical simulations to build a flexible, extensible simulation framework for overcoming sampling challenges. This opens up new ways to design drugs based on molecular dynamics that are consistent with NMR data.
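For readers unfamiliar with diffusion maps, here is a minimal NumPy sketch of the standard construction: a Gaussian kernel, α-normalisation, Markov normalisation, then eigendecomposition. It is not the preliminary method of Dr Mey and Prof. Leimkuhler. The bandwidth `epsilon`, the choice α = 0.5, and the two-cluster toy data standing in for two metastable conformations are illustrative assumptions.

```python
import numpy as np


def diffusion_map(X, epsilon, alpha=0.5, n_coords=2):
    """Leading non-trivial diffusion coordinates of the samples in X (one row per sample)."""
    # Gaussian kernel on pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / epsilon)
    # alpha-normalisation: divide out the kernel density estimate q to reduce
    # the dependence of the result on how densely each region was sampled.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q**alpha, q**alpha)
    # Markov matrix P = D^{-1} K_alpha with D = diag(row sums). We diagonalise the
    # symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2}, which has the same eigenvalues.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]      # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    psi = evecs / np.sqrt(d)[:, None]    # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return evals[1:n_coords + 1], psi[:, 1:n_coords + 1]


# Toy data: two "metastable states" as Gaussian clusters in 2D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-1.5, 0.0], 0.3, size=(100, 2)),
               rng.normal([1.5, 0.0], 0.3, size=(100, 2))])
evals, coords = diffusion_map(X, epsilon=1.0)
print("leading non-trivial eigenvalues:", evals)
```

The first diffusion coordinate, `coords[:, 0]`, is nearly constant within each cluster and has opposite signs in the two clusters, with an eigenvalue close to 1. In that sense diffusion coordinates pick out the slow transition between states.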
Small molecular systems of pharmaceutical relevance, such as cyclic peptides (for example cyclo-(Pro-Ser-leu-Asp-Val) from Kamenik et al. 2018), will provide good test cases because NMR data are available for them. Furthermore, these systems exhibit both slow and fast dynamical behaviour, meaning that they present a broad range of challenges that can be tackled sequentially as the methodology is improved. The ultimate goal will be to extend these ideas to larger protein systems relevant to antimicrobial resistance or Alzheimer's disease.
https://www.nature.com/articles/d41586-020-03348-4, accessed 4/12/2020
Kulczycka-Mierzejewska, K. et al. J. Mol. Model. 24, 191 (2018)
Jabbarzadeh Kaboli, P. et al. PLoS One 13, e0193941 (2018)
Trstanova, Z., Leimkuhler, B. and Lelievre, T. Proc. R. Soc. A 476, 20190036 (2019)
Olsson et al. Proc. Natl. Acad. Sci. 114, 8265 (2017)
Simulation and machine learning for future medicine, available at https://research.ibm.com/labs/uk/case-studies/pdf/HartreeCentreCaseStudy-IROR-DrugDiscovery.pdf
Juárez-Jiménez et al. Chem. Sci. 11, 2670 (2020)
Kamenik, A. S. et al. J. Chem. Inf. Model. 58, 982 (2018)
Linciano et al. ACS Infect. Dis. 5, 9 (2019)
Singh et al. J. Biol. Chem. 295, 5850 (2020)
Applicants need to have a UK 2:1 honours degree, or its international equivalent, in an area such as biomedicine, chemistry, computer science, engineering, mathematics or physics. Prior coursework in statistical mechanics/statistical physics or theoretical chemistry is highly desirable.
Only UK nationals and EU/EEA/Swiss nationals holding settled or pre-settled status in the UK are eligible for the studentship funding.
CDT studentships fund 4 years of study, covering tuition fees, a stipend (£15,285 in 2020/21) and travel/research support.
The CDT programme follows a 1+3 format. In Year 1 you will study towards a Master by Research, undertaking a number of taught courses and taster research projects to broaden and refine your skills and explore different research areas. You will then begin your PhD project “Getting in Shape: AI driven sampling of the shapes of biomolecules”, co-supervised by the University of Edinburgh and IBM Research.
Applications are now open for admission in September 2021 as part of the main CDT cohort. For fullest consideration, applications should be submitted by 16:00 GMT on Friday 31st January 2021.
If you have already applied for general admission to the CDT but want to be considered for this project, please email email@example.com.
You will need to submit the following documents with your application. Make sure these are obtained in good time as we cannot consider applications without them.
- Personal statement explaining why you want to be considered for the programme and what you think your major strengths are. Make sure you highlight any unique aspects or experiences that you think are relevant to your application that do not appear in your CV.
- CV which includes your educational history, work experience and any relevant research publications, and highlights any special achievements.
- Research proposal describing a novel research project of your own design, relevant to the aims of the CDT, with reference to emerging challenges and methodologies in the current literature. The proposal should answer the following questions: What is the challenge? Why is it important and timely? How do you propose to tackle it (you could include specific technical details here)? What is the current state of research in this area, and what do you propose that is novel? What are the potential societal implications of your proposed research? The research proposal is an important component of our assessment of a candidate's suitability and aptitude for the CDT programme. Proposals should be 1-2 A4 pages long in 10pt font, excluding references.
- Degree certificate and transcript for undergraduate and, if applicable, postgraduate studies. If your studies are still in progress, you will need to upload an interim transcript; otherwise your application cannot be considered.
- Proof of English language proficiency. If you don’t have an English language certificate yet, we will still consider your application. However, if an offer of admission is made, it will be conditional on your providing an English language certificate that meets University requirements.
- 2 reference letters.