Research Project Ideas

This page captures ideas for interesting future projects, at varying levels of detail. Some have been suggested by our industrial partners and some by our researchers.

Please be aware that you are not required to pick one of these projects; they are just some ideas! You might also be interested in applying for one of our industry-sponsored projects.

If you are an industry partner or researcher and would like to add a new suggestion, please get in touch.

Project Ideas Suggested by CDT PPar Industry Partners

Building Trust in Distributed Parallel Systems

Suggested by CriticalBlue

Pervasive Parallelism will often be implemented across the huge numbers of deeply embedded devices which will comprise the Internet of Things. In these systems, concurrency will be distributed over wireless networks which need protection from third-party attacks.

This project will focus on investigating how to establish trust within these systems, based upon software attestation, with each participant needing to securely identify itself to its peers. It will be based on ongoing product development and will extend existing software through research into highly focused areas.

Potential avenues are:

  • System security analysis. Evaluation of the protocol under development, including threat modelling.
  • Low-power software cryptography. Address some of the challenges around cryptography in resource-constrained environments.
  • Protocol development. Extend the client-server protocol into a peer-to-peer system.
  • Alternative network protocols. HTTP is the current implementation target, but for IoT other protocols, such as CoAP, are more efficient.
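
As a deliberately simplified illustration of the attestation exchange underlying several of these avenues, the sketch below shows a nonce-based challenge-response over a pre-shared key. The key handling, message format and firmware stand-in are assumptions for illustration, not the protocol under development.

    # Hypothetical sketch: a verifier challenges a peer with a fresh nonce;
    # the peer proves its identity and software state by MACing the nonce
    # together with a digest of its firmware image.
    import hmac, hashlib, os

    SHARED_KEY = b"provisioned-device-key"      # assumed pre-provisioned key

    def challenge():
        return os.urandom(16)                   # fresh random nonce

    def attest(nonce, firmware_image):
        digest = hashlib.sha256(firmware_image).digest()
        return hmac.new(SHARED_KEY, nonce + digest, hashlib.sha256).digest()

    def verify(nonce, expected_digest, response):
        expected = hmac.new(SHARED_KEY, nonce + expected_digest,
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

    firmware = b"\x90" * 1024                   # stand-in for the device image
    n = challenge()
    assert verify(n, hashlib.sha256(firmware).digest(), attest(n, firmware))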

The goal of the project will be to provide actionable information back to the development teams, almost certainly backed by proof-of-concept software supporting any conclusions. We strongly prefer candidates who are keen to actually build solutions.

Contact: ppar-cdt@inf.ed.ac.uk

App Security

Suggested by CriticalBlue

One area of significant activity at the moment is app security. New vulnerabilities are constantly announced, and the attention they receive in the media is growing; similarly, the number of defence solutions available to software developers is on the rise. The question for PPar researchers is: can multi-core bring anything extra to the party in this area? Some related research uses automatic software diversity, running on multiple cores, to detect whether the software on any individual core has been compromised by an attack. Such an approach is preferable to most solutions currently on the market in that it attempts to recognise when a system is under attack, rather than relying on and protecting against known attack vectors. What is the next evolutionary step for such research? How can we minimise the resource usage of the protection mechanism? What holes remain in, or are introduced by, the protection that is provided? What are the limitations of the current research and how can they be overcome?
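
The multi-variant idea from the suggested reading can be caricatured in a few lines: run diversified copies of the program in parallel and treat behavioural divergence as evidence of compromise. The "variants" and monitor below are toy stand-ins, not the published systems.

    # Two semantically equivalent "variants" (imagine differently diversified
    # binaries) run concurrently; the monitor flags any divergence.
    from concurrent.futures import ThreadPoolExecutor

    def variant_a(data):
        return sum(ord(c) for c in data)

    def variant_b(data):
        return sum(map(ord, reversed(data)))    # different order, same result

    def monitor(data):
        with ThreadPoolExecutor(max_workers=2) as pool:
            a, b = pool.map(lambda v: v(data), (variant_a, variant_b))
        if a != b:
            raise RuntimeError("variants diverged: possible compromise")
        return a

    print(monitor("benign input"))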

Suggested reading:

  • “Orthrus: Efficient Software Integrity Protection on Multi-core” – Ruirui Huang, Daniel Y. Deng, G. Edward Suh, Proc. of the 15th international conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, March 2010.
  • “Multi-variant Execution: Run-Time Defense Against Malicious Code Injection Attacks” – Doctoral Dissertation, University of California, Irvine, CA, USA, 2009 – ISBN: 978-1-109-20556-5 – http://www.babaks.com/files/babakthesis.pdf

Contact: ppar-cdt@inf.ed.ac.uk

Program Behaviour Tracing Technologies

Suggested by CriticalBlue

All research that delves into the capture and presentation of program behaviour is of interest to us. It may focus on a single process, work across multiple processes, target the interaction with the OS, or some combination of the three. Two specific interests we have at present are included below:

a) Facilities for tracking data could be improved. Of particular interest would be a mechanism for tracking data as it enters a Linux container, where it is used (by the processes active in the container), and ultimately where it ends up. Taint tracking is one mechanism that may be used, though it has a high overhead.
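
For illustration only, the core of taint tracking can be sketched at the language level; real container-level tracking would operate at the instruction or system-call level, which is where the overhead mentioned above comes from. All names here are invented.

    # Values from untrusted sources carry labels that propagate through
    # computation, so a sink can report where its inputs came from.
    class Tainted:
        def __init__(self, value, labels):
            self.value, self.labels = value, frozenset(labels)
        def __add__(self, other):
            if isinstance(other, Tainted):
                return Tainted(self.value + other.value,
                               self.labels | other.labels)
            return Tainted(self.value + other, self.labels)

    def source(value, origin):
        return Tainted(value, {origin})

    def sink(x):
        if isinstance(x, Tainted) and x.labels:
            print(f"value {x.value!r} derived from: {sorted(x.labels)}")

    n = source(41, "network") + 1     # taint survives the arithmetic
    sink(n)                           # value 42 derived from: ['network']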

b) Another interest is the presentation of tracing information. It is easy to be overwhelmed by data when tracking the control flow and/or data flow in programs. Finding new ways of presenting program behaviour intuitively, at various levels of abstraction and permitting navigation between different levels of detail, is a permanent challenge for us. This is particularly true when behaviour is to be captured from the various interacting processes and/or threads that make up the system of interest.

Contact: ppar-cdt@inf.ed.ac.uk

Superoptimization Using LLVM

Suggested by CriticalBlue

We are interested in machine-learning-based, binary-to-binary, profile-guided superoptimization and would be particularly enthusiastic if there were a research effort in this area using LLVM. The goal would be to demonstrate a superoptimizer that takes a binary (and any associated DSOs) with its profile and uses that to generate a new binary in which hot sequences have been superoptimized (including transformations typically associated with LTO). There is a good deal of literature on the different compiler optimization spaces, but not as much work has been done on the combined space.
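
At toy scale, the core search loop of a superoptimizer looks like the sketch below: enumerate short instruction sequences over a tiny invented ISA and keep the shortest one that matches a reference function on test inputs. A real tool would verify equivalence properly (e.g. with an SMT solver) and operate on actual machine code guided by the profile.

    from itertools import product

    OPS = {                            # toy ISA: acc := op(acc, x)
        "add_x": lambda acc, x: acc + x,
        "shl1":  lambda acc, x: acc << 1,
        "neg":   lambda acc, x: -acc,
    }

    def run(program, x):
        acc = x
        for op in program:
            acc = OPS[op](acc, x)
        return acc

    reference = lambda x: 3 * x        # behaviour we want to reproduce
    tests = [0, 1, 2, 7, -5]

    for length in range(1, 4):         # shortest programs first
        for prog in product(OPS, repeat=length):
            if all(run(prog, x) == reference(x) for x in tests):
                print(length, prog)    # 2 ('shl1', 'add_x'), i.e. 2x + x
                raise SystemExit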

Contact: ppar-cdt@inf.ed.ac.uk

Project Ideas Suggested by CDT PPar Supervisors

Learning Explainable Representations via Statistical Relational Learning

Suggested by Vaishak Belle

Numerous applications in robotics and data science require us to parse unstructured data in an automated fashion. However, many of these models are not human-interpretable. Given the increasing focus on the need for explainable machine learning and AI algorithms, this project looks to interface general-purpose human-readable representations (databases, programs, logics) with cutting-edge machine learning.

Contact: vaishak@ed.ac.uk

Probabilistic Programming

Suggested by Vaishak Belle

Probabilistic reasoning and machine learning over multi-modal data is a deeply challenging problem: probabilistic programs can bring modularity and descriptive clarity to building complex machine learning applications. This area looks at new methods for inferring and learning probabilistic programs.
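
A minimal sketch of the idea, with both the model and the inference routine invented for illustration: the model is ordinary code containing random choices, and a generic inference procedure (here, likelihood weighting) runs over it.

    import random

    def model():
        """Posterior over a coin's bias after observing 9 heads in 10 flips."""
        bias = random.random()                          # prior: Uniform(0, 1)
        k, n = 9, 10                                    # observed data
        weight = bias ** k * (1 - bias) ** (n - k)      # likelihood
        return bias, weight

    samples = [model() for _ in range(100_000)]
    total = sum(w for _, w in samples)
    mean = sum(b * w for b, w in samples) / total
    print(f"posterior mean bias ~ {mean:.2f}")          # ~0.83, i.e. 10/12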

Contact: vaishak@ed.ac.uk

Automated Solving of Science Problems

Suggested by Vaishak Belle

Since the early days of AI, people have been fascinated by machines solving puzzles, exercises and exams that are used to test the intelligence of humans. The ability to solve such problems is an important cognitive and intellectual skill, as it is evaluated as part of academic admission tests such as the SAT, GMAT and GRE. Recently, there has been a surge of interest in mathematical and scientific problem solving. This area looks to expand the frontiers of this literature by combining ideas from Bayesian machine learning, NLP, and probabilistic programming.

Contact: vaishak@ed.ac.uk

Epistemic Planning and Multi-Agent Systems

Suggested by Vaishak Belle

Classical automated planning has traditionally focused on finding a sequence of actions that achieves a goal state. In numerous AI applications involving human-machine interaction, the underlying model may need to reason and learn with respect to the beliefs and intentions of other agents; that is, it needs a mental model of the other systems in the environment. This area looks at automated planning in this richer context, building on ideas from modal logic and reasoning about knowledge.

Contact: vaishak@ed.ac.uk

Efficient Analysis for Parametric Dataflow Models

Suggested by Bruno Bodin

Dataflow modelling is commonly used to express streaming applications (such as software radio or video processing) and is a crucial tool for programming heterogeneous many-cores in embedded systems. The most common of these models are Synchronous Dataflow (SDF) and Cyclo-static Dataflow (CSDF). In order to take advantage of these models, several problems must be solved during the compilation process, including throughput evaluation, buffer sizing and hardware mapping. When exact methods are not efficient enough to analyse these models, approximate techniques exist. But to meet the requirements of recent applications, dataflow models have been extended to their parametric forms, for which no efficient techniques are available. The aim of this project is to provide novel techniques to analyse parametric dataflows in a reasonable amount of time.
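
To make the analysis problems concrete, the sketch below (with an invented three-actor graph) computes an SDF graph's repetition vector from its balance equations, q[u]·prod = q[v]·cons per edge; throughput evaluation and buffer sizing build on this vector. It assumes a connected, consistent graph.

    from fractions import Fraction
    from math import lcm

    edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]  # (src, dst, produced, consumed)

    q = {"A": Fraction(1)}                  # fix one actor, propagate rates
    changed = True
    while changed:
        changed = False
        for u, v, prod, cons in edges:
            if u in q and v not in q:
                q[v] = q[u] * prod / cons; changed = True
            elif v in q and u not in q:
                q[u] = q[v] * cons / prod; changed = True

    scale = lcm(*(r.denominator for r in q.values()))  # smallest integer vector
    print({a: int(r * scale) for a, r in q.items()})   # {'A': 3, 'B': 2, 'C': 1}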

Contact: bbodin@inf.ed.ac.uk

Automated Parallelisation of Functional Programs

Suggested by Alan Bundy

This builds on the PhD work of Murray Cole, in which he developed algorithmic skeletons, and my research on reasoning about functional programs. Murray observed that common patterns of parallel programs corresponded to second-order functions in functional programming, e.g., the farm skeleton corresponds to the maplist function, which takes a function and a list and returns a list in which the function has been applied to each element of the input list. Andrew Ireland and Greg Michaelson used these ideas to convert sequential programs represented as first-order recursive functions into parallel programs represented with these second-order functions. Inductive theorem proving is used to prove these two representations equivalent. This project will build on and extend these ideas. In particular, we will look at the automated synthesis of these second-order (parallel) programs using the first-order (sequential) ones as specifications.
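
The flavour of the two representations involved can be sketched as follows (example invented): a first-order recursive function serves as the specification, and the second-order maplist form exposes the farm skeleton's independent tasks. Proving such pairs equivalent is where the inductive theorem proving comes in.

    def squares_rec(xs):
        """First-order, sequential: recurse down the list."""
        if not xs:
            return []
        return [xs[0] * xs[0]] + squares_rec(xs[1:])

    def squares_map(xs):
        """Second-order: maplist (Python's map) applied to a unary function,
        i.e. a farm skeleton instance with independent tasks."""
        return list(map(lambda x: x * x, xs))

    assert squares_rec([1, 2, 3]) == squares_map([1, 2, 3]) == [1, 4, 9]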

Contact: A.Bundy@ed.ac.uk

Patterns and Skeletons in Parallel Programming

Suggested by Murray Cole

The skeletal approach to parallel programming advocates the use of program forming constructs which abstract commonly occurring patterns of parallel computation and interaction.

Many parallel programs can be expressed as instances of more generic patterns of parallelism, such as pipelines, stencils, wavefronts and divide-and-conquer. In our work we call these patterns “skeletons”. Providing a skeleton API simplifies programming: the programmer only has to write code which customizes selected skeletons to the application. This also makes the resulting programs more performance portable: the compiler and/or run-time can exploit structural information provided by the skeleton to choose the best implementation strategy for a range of underlying architectures, from GPU, through manycore, and on to large heterogeneous clusters.
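
From the programmer's side, a skeleton API might look like the sketch below; the farm function, its backends and the worker are illustrative rather than an existing library. The point is that the implementation strategy is hidden behind the pattern.

    from concurrent.futures import ProcessPoolExecutor

    def farm(worker, tasks, backend="processes"):
        """Task-farm skeleton: apply `worker` to independent tasks; the
        skeleton, not the programmer, picks the implementation strategy."""
        if backend == "sequential":
            return [worker(t) for t in tasks]
        with ProcessPoolExecutor() as pool:     # one possible strategy
            return list(pool.map(worker, tasks))

    def render_tile(tile_id):                   # application-specific code
        return sum(i * tile_id for i in range(10_000))

    if __name__ == "__main__":
        print(farm(render_tile, range(8))[:3])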

Opportunities for research in this area include the full integration of skeletons into the language and compilation process, dynamic optimization of skeletons for diverse heterogeneous systems, the extension of skeleton approaches to applications which are “not quite” skeleton instances, the automatic discovery of new (and old) skeletons in existing applications, and the design and implementation of skeleton languages in domain-specific contexts.

Contact: mic@inf.ed.ac.uk

Parallelism Discovery and Data-Centric Parallelisation

Suggested by Bjoern Franke

Rewriting sequential legacy software for multi- and many-core systems is extremely challenging, but increasingly important. Without a solution we cannot take advantage of the compute power and energy efficiency offered by these devices: energy and time will be wasted, and progress in program efficiency will be slow or stop. Automatic parallelisation, where a sequential program is automatically rewritten and parallelised by a software tool, offers a solution. However, despite 40+ years of research, automatic parallelisation has failed to deliver on its promise.

This is because of two systematic problems inherent to all prior approaches: (1) parallelisation is code-centric, i.e. traditional compilers are exclusively concerned with parallel code restructuring, but make no attempt to restructure data organisation; and (2) parallelising compilers operate on loops in isolation and do not consider whole-program context. We believe that together these are fatal flaws in the design of every parallelising compiler developed so far.

In this project we fundamentally rethink how (semi-)automatic parallelisation is approached. We treat data structures and their uses across the whole program as first-class citizens. This enables the compiler to understand how data is organised and accessed. Capturing the programmer's intent creates the opportunity to restructure data organisation and expose substantially more parallelism where premature low-level optimisation, use of dated language features and poor programming practices have impeded conventional program analysis. We will build novel data-centric parallelisation tools, which will parallelise legacy applications as never before and generate rejuvenated, easy-to-maintain code.
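
One concrete instance of such data restructuring, sketched with invented names and numbers: converting an array-of-structs traversal into a struct-of-arrays layout turns an element-at-a-time loop into a whole-array, data-parallel update.

    import numpy as np

    # "Legacy" layout: a list of per-particle records (array of structs).
    particles = [{"x": float(i), "v": float(i % 3)} for i in range(1_000)]

    def step_aos(ps, dt):
        for p in ps:                  # element at a time, hard to parallelise
            p["x"] += p["v"] * dt

    # Restructured layout: one array per field (struct of arrays).
    xs = np.array([p["x"] for p in particles])
    vs = np.array([p["v"] for p in particles])

    def step_soa(xs, vs, dt):
        xs += vs * dt                 # whole-array, data-parallel update

    step_aos(particles, 0.1)
    step_soa(xs, vs, 0.1)
    assert np.allclose(xs, [p["x"] for p in particles])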

Contact: bfranke@inf.ed.ac.uk

Scalable Full-System Simulation from IoT to Data Centres

Suggested by Bjoern Franke

Mainstream computing has shifted away from largely standardised desktop or laptop devices. Instead, end users of computing technology now routinely rely on mobile devices with integrated accelerators and backed up by cloud based computing resources. In the near future users will experience another new technology, the Internet of Things, which will see billions of networked devices enabling smart environments, e.g. smart cities, homes, mobility and many more.

While the possibilities appear endless, this new diversity of computing devices and the scale of their possible interactions create major challenges for the designers of these new computing platforms. Computer architects can select from a large number of readily available IP cores, many of these offering hundreds of configuration options affecting performance, chip area, and power consumption. In addition, processor designers can now integrate – or design their own – highly optimised domain- or application-specific accelerators, e.g. for graphics, vision, signal processing, encryption and other compute-intensive tasks. Alternatively, demanding tasks can be seamlessly offloaded to the cloud, i.e. a data centre providing a transparent service to end users. Identifying an optimal configuration in this vast design space, which meets performance, cost and power constraints, is a non-trivial task and heavily relies on simulation.

While single-core processor simulation technology has seen steady, incremental improvements over the years, platform diversity and scale create new challenges: the complexity of simulation targets has outgrown our simulation capabilities. We want to simulate new kinds of accelerators, e.g. for machine learning or computer vision, and their interaction with other system components on today’s machines, which do not provide the same acceleration capabilities as the machines we wish to design. We also want to simulate the next generation of energy-efficient data centres with millions of machines connected by a complex network infrastructure. Finally, we want to simulate emerging IoT platforms and applications, which may involve hundreds of millions of interacting devices. Using today’s simulation technology we are already forced to cut corners and sacrifice simulation detail and accuracy, resulting in less-than-optimal exploration of the design space. Without a solution to the large-scale, full-system simulation challenges industry is facing we will soon encounter a situation where simulation technology is failing to support the design of new computing devices; energy and time will be wasted, and progress in computing architecture efficiency will be slow or stop.
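
Contact: bfranke@inf.ed.ac.uk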

Performance Portability for Sound Synthesis

Suggested by Alan Gray

It is becoming increasingly difficult to write efficient HPC applications because hardware is becoming increasingly complex. For example, many systems now feature combinations of multiple interconnected CPUs and GPUs, each with their own internal parallelism and memory complexities. In-depth computational expertise is required to navigate the many low-level implementation choices. The final code is often complex, non-portable and far removed from the original problem. Techniques currently being pioneered by Christophe Dubach and his collaborators from the School of Informatics focus on enabling problems to be expressed intuitively using information on the characteristic parallel patterns. This information can then inform the lower-level system software layers such that they are able to automatically generate code to efficiently target different architectures.

For more information, see the EPCC website.

Contact: a.gray@epcc.ed.ac.uk

Event Semantics for Quantum Computing

Suggested by Chris Heunen

Large scale deployment of quantum computing will require firm compositional semantics, to easily build larger quantum programs from smaller ones, and to be sure that they indeed do what you want them to. Traditional forms of semantics will need to be adapted to the quantum setting because of quantum (in)compatibility: in contrast to classical computing, there are pairs of questions that quantum theory cannot both answer sensibly. Game semantics are traditionally attractive because they include both algorithmic and structural aspects. They can be phrased in terms of event structures, which explicitly include a notion of (in)compatibility between events, originally used to model concurrent computing. We will develop quantum event structures, study their formal properties and their relationship to concurrent computing and probabilistic computing, and consider their computational power. We may also explore links to quantum programming languages, to quantum domain theory, to quantum category theory, to foundational questions in physics, and to databases.
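
For reference, the classical structure this project would generalise is Winskel's (prime) event structure, sketched here in standard notation:

    \[
      (E, \leq, \#): \quad \leq \ \text{a partial order (causality)}, \qquad
      \# \ \text{irreflexive and symmetric (conflict)},
    \]
    \[
      \{e' \in E \mid e' \leq e\} \ \text{finite for every } e \in E, \qquad
      e \,\#\, e' \leq e'' \;\Rightarrow\; e \,\#\, e''.
    \]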

This project would be suitable for a student with a background in at least one of quantum computing, category theory, or domain theory, and a willingness to learn some advanced mathematics.

Contact: chris.heunen@ed.ac.uk

Quantitative Modelling Process Algebra

Suggested by Jane Hillston

Stochastic process algebras provide high-level descriptions of dynamic systems that are amenable to rigorous mathematical analysis through formal semantics.

The PEPA project began in Edinburgh in 1991 and has developed a modelling language and associated tools to predict the performance of software and hardware systems. The PEPA language (Performance Evaluation Process Algebra) is a compact modelling language which models systems as compositions of sequential components which perform timed activities, either individually or in cooperation with other components. PEPA models are analysed by compilation into Continuous-time Markov Chains (CTMCs) or other mathematical structures. Research projects on the language include extending the software tools which support it, improving their analysis capabilities, and applying the language to modelling real-world performance problems in hardware and software systems. For example, two topics of particular interest currently are the development of sophisticated techniques for interrogating models and guiding analysis to explore scientific questions about the system, and analysis techniques to investigate the power-performance trade-off in large, flexible systems such as those found in cloud computing.
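
As a sketch of the analysis such tools automate: a PEPA model compiles to a CTMC with generator matrix Q, and steady-state measures follow from solving πQ = 0 subject to normalisation. The three-state generator below is invented for illustration.

    import numpy as np

    Q = np.array([[-2.0,  2.0,  0.0],    # e.g. idle -> busy at rate 2
                  [ 1.0, -3.0,  2.0],    # busy -> idle (1) or failed (2)
                  [ 4.0,  0.0, -4.0]])   # failed -> idle (repair rate 4)

    # Replace one balance equation with the normalisation constraint.
    A = np.vstack([Q.T[:-1], np.ones(3)])
    b = np.array([0.0, 0.0, 1.0])
    pi = np.linalg.solve(A, b)
    print(pi)                            # steady-state probabilities,
                                         # e.g. utilisation = pi[1]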

Bio-PEPA is a closely related language which has been specifically designed for modelling biochemical reactions, motivated by problems in systems biology. It has been used to model a variety of intracellular biological processes, including circadian rhythms in plants and algae, and rRNA synthesis. However, it has also found applications in other domains such as crowd dynamics. Unlike PEPA, Bio-PEPA models adaptive behaviour through functional rates and incorporates a representation of the spatial organisation of a system and how it impacts the dynamics of behaviour. Research projects on Bio-PEPA include extending the language to better represent binding sites, investigating the analysis and interrogation techniques which can be applied to models, and developing verification techniques to ensure the quality of models, particularly for users who are not familiar with formal languages. Here again there is scope for research on the efficient implementation of tools, for example on visualisation of results, and on case studies exploring the capabilities of the language.

Contact: jeh@inf.ed.ac.uk

Planning for Performance

Suggested by Daniel Holmes

Many high performance computing (HPC) applications use an iterative algorithm to approximate a numerical solution to scientific equations that cannot be solved analytically. The repetitive operations required by these iterative algorithms lend themselves to planning to optimise performance. Planning can involve reserving hardware resources, reconfiguring hardware resources, selecting from multiple software implementations, and many other innovative ideas. This is a very general technique, but of particular interest and focus in this project is planning communication operations for large-scale machines. The Message Passing Interface (MPI) is the de facto standard communication mechanism for HPC machines, but it only has very limited support for planning (called persistent operations in MPI). This project will lead to significant changes in the MPI Standard and provide communication operations that will improve the computational performance of large-scale scientific applications.
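
A small mpi4py sketch of the planning hook MPI already provides: a persistent operation is set up once with Send_init/Recv_init and then re-executed every iteration, letting the library plan the repeated exchange. The buffer size and loop are illustrative; run with, e.g., mpiexec -n 2.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    buf = np.zeros(1024, dtype='d')

    if rank == 0:
        req = comm.Send_init(buf, dest=1, tag=7)    # plan the send once
    elif rank == 1:
        req = comm.Recv_init(buf, source=0, tag=7)  # plan the receive once
    else:
        raise SystemExit                            # sketch assumes two ranks

    for step in range(100):                         # iterative solver loop
        if rank == 0:
            buf[:] = step                           # new data, same "plan"
        req.Start()                                 # execute the planned op
        req.Wait()

    req.Free()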

Contact: d.holmes@epcc.ed.ac.uk

Next Generation Network Functions

Suggested by Myungjin Lee

Modern networks support many different network functions, ranging from relatively simple address translation to complex intrusion detection. As networks scale, traffic demand grows, new services emerge and service requirements become more stringent, the need for innovation in the network function area is also escalating. In fact, technologies such as server virtualisation and software-defined networking (SDN) are key enablers of innovation in the area, leading to a paradigm shift to network function virtualisation (NFV). Through active exploitation and extension of these technologies, the high-level goal of this project is to enrich a future NFV ecosystem. Some of the key objectives are to design and implement high-performance virtualised network function architectures and scalable network function orchestration systems.

Contact: myungjin.lee@ed.ac.uk

Design, Analysis and Optimisation of Next-Generation Mobile Cellular Networks

Suggested by Mahesh Marina

Mobile cellular networks are witnessing dramatic changes owing to the exponential increase in mobile data usage and the emergence of devices like smartphones and tablets in recent years. At the same time, mobile network operators are faced with declining average revenue per user (ARPU) in view of greater competition as well as the increasing infrastructure investment needed to keep up with the rising demand for high-performance mobile data networks. This has given rise to several new approaches, currently under investigation, to better cope with the mismatch between user demand and ARPU trends.

Embracing heterogeneity in the cellular network infrastructure, with the inclusion of consumer- and third-party-owned base stations (femto cells and pico cells), is one such approach towards denser infrastructure at lower cost to the operators. Re-architecting radio access networks (RANs) to make base stations low cost, low power and easy to deploy, by centralising as much of the base station processing as possible, is another approach, sometimes referred to as cloud RANs. Yet another approach involves offloading traffic at times of peak load to other co-located wireless networks (e.g., WiFi), and the related concept of opportunistic secondary use of other licensed spectrum (e.g., TV white spaces) via cognitive radios. There is also much scope for innovation through network monitoring and analysis for service and infrastructure optimisation, to generate new revenue streams and to reduce mobile network operating expenditure (OPEX) via self-organising network (SON) functionalities, respectively.

The aim of this project is to investigate cutting-edge research issues concerning network architectures, performance analysis, optimisations, and holistic resource management protocols and algorithms (including interference coordination) within the context of the aforementioned approaches. Exploring the benefit of software-defined radios (SDRs) and networking platforms in addressing these issues is also within the scope of this work.

Contact: mahesh@ed.ac.uk

Memory Consistency Models and Cache Coherency for Parallel Architectures

Suggested by Vijay Nagarajan

Parallel architectures (e.g. multicores, manycores and GPUs) are here. Since performance on parallel architectures is contingent on programmers writing parallel software, it is crucial that parallel architectures are “programmable”. The memory consistency model, which essentially specifies what a memory read can return, is at the heart of concurrency semantics. The cache coherence sub-system, which provides the view of a coherent shared memory, is at the heart of shared memory programming. The goal of this project is to design and implement memory consistency models and cache coherence sub-systems for future parallel architectures.
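
The classic "message passing" litmus test states the kind of question a consistency model answers. The Python below is notation rather than a demonstration (CPython's threading will not exhibit hardware reorderings): the model decides whether the consumer, having seen flag == 1, may still read data == 0.

    import threading

    data = flag = 0

    def producer():
        global data, flag
        data = 1              # W data
        flag = 1              # W flag -- may these writes be reordered?

    def consumer(out):
        while flag == 0:      # R flag: spin until set
            pass
        out.append(data)      # R data: must this now be 1?

    out = []
    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer, args=(out,))
    t2.start(); t1.start(); t1.join(); t2.join()
    print("r =", out[0])      # sequential consistency guarantees 1;
                              # weaker models need fences to rule out 0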

Contact: vnagaraj@inf.ed.ac.uk

Auto-Parallelisation

Suggested by Michael O’Boyle & Bjoern Franke

The aim of this project is to develop advanced compiler technology that can take emerging applications and automatically map them onto the next generation of multi-core processors. This PhD will involve new research into discovering parallelism within multimedia and streaming applications, going beyond standard data-parallel analysis. The project will also investigate cost-effective mapping of parallelism to processors, which may include dynamic or adaptive compilation.

Contact: bfranke@inf.ed.ac.uk or mob@inf.ed.ac.uk

Compilers that Learn to Optimise

Suggested by Michael O’Boyle

Develop a compiler framework that can automatically learn how to optimise programs. Rather than hard-coding a compiler strategy for each platform, we aim to develop a novel portable compiler approach that can automatically tune itself to any fixed hardware and can improve its performance over time. This is achieved by employing machine learning approaches to optimisation, where the machine learning algorithm first learns the optimisation space and then automatically derives a compilation strategy that attempts to generate the “best” optimised version of any user program. Such an approach, if successful, will have a wide range of applications. It will allow portability and performance of compilers across platforms, eliminating the human compiler-development bottleneck.
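
A minimal sketch of the approach, with features, training programs and flag settings all invented: describe each program by a feature vector and predict a good optimisation configuration for an unseen program from its nearest neighbour among previously tuned programs.

    # (program features, best configuration found by offline search)
    training = [
        ((1,   4, 0.9), "-O2"),
        ((8, 256, 0.2), "-O3 -funroll-loops"),
        ((2,  16, 0.5), "-O2 -ftree-vectorize"),
    ]

    def predict(features):
        dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
        _, best = min(training, key=lambda t: dist(t[0], features))
        return best

    # features: (loop count, average trip count, memory-op ratio)
    print(predict((6, 200, 0.3)))      # -> "-O3 -funroll-loops"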

Contact: mob@inf.ed.ac.uk

Heterogeneity: Discovery, Mapping and Potential

Suggested by Michael O’Boyle

Power density has been the driving force behind multi-core processors and, due to dark silicon, is the reason for the emergence of heterogeneous systems. Such systems consist of specialised processors which have the potential for increased performance within an energy budget for certain tasks. The best examples of such systems are MPSoCs found in mobile devices and GPGPUs/FPGAs found in HPC/data centres. This project is interested in discovering potential heterogeneous code in existing programs and mapping it to current and future heterogeneous devices. There are many potential projects in this area. One direction may be to explore how static and dynamic analysis can be used to determine candidates before applying program transformations to align code and device based on cost. Another direction may be statistical analysis of large program repositories to determine potential new heterogeneous devices.

Contact: mob@inf.ed.ac.uk

Complexity Metrics for Testing Concurrent Programs

Suggested by Ajitha Rajan

Developing concurrent programs to a high standard of reliability is notoriously difficult due to the large number of interactions between components executing in parallel.

Testing and automated source code analysis aid the developer in finding bugs and ensuring correctness. However, it is infeasible to analyse the entire program in detail, since even for small concurrent programs the number of interactions and execution paths quickly becomes intractable. It therefore becomes important to identify parts of the program that are especially error-prone, and to focus testing and analysis efforts on these parts. For sequential programs, software complexity metrics help identify the error-prone parts (see the sketch below); for concurrent software, this problem has not received enough attention.
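
As a point of departure, the sketch below computes the best-known sequential metric, McCabe's cyclomatic complexity (decision points + 1), per function from the Python AST. A concurrency-aware metric would additionally need to account for interleavings, synchronisation and shared state.

    import ast, textwrap

    BRANCHES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)

    def cyclomatic(src):
        return {
            node.name: 1 + sum(isinstance(n, BRANCHES) for n in ast.walk(node))
            for node in ast.walk(ast.parse(src))
            if isinstance(node, ast.FunctionDef)
        }

    SRC = textwrap.dedent("""
        def f(xs):
            total = 0
            for x in xs:
                if x > 0 and x % 2 == 0:
                    total += x
            return total
    """)
    print(cyclomatic(SRC))   # {'f': 4}: for + if + boolean operator + 1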

In this project we will first devise suitable complexity metrics for concurrent programs, and then apply them to increase the efficiency of automated software analysis and testing.

Contact: arajan@inf.ed.ac.uk

Distributed Algorithms and Protocols for Mobile and Wireless Networks

Suggested by Rik Sarkar

The goal is to develop algorithms and protocols for processing data inside a network and answering questions about this data. This is useful in sensor and mobile networks, where local computation abilities can be used to make the system more efficient and reduce communication. Mobile networks have additional properties, such as moving nodes and GPS capabilities, which create additional challenges as well as opportunities in protocol design. These topics can be explored both analytically, through algorithmics, and experimentally, through simulations and implementations.
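
One example of in-network processing, sketched with an invented topology: nodes estimate the network-wide average by repeated pairwise gossip, so raw readings never leave the network.

    import random

    values = {n: float(v) for n, v in enumerate([10, 2, 7, 5, 6, 0])}
    neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 3],
                  3: [1, 2, 4], 4: [3, 5], 5: [4]}

    for _ in range(2_000):                  # asynchronous gossip steps
        a = random.choice(list(values))
        b = random.choice(neighbours[a])
        values[a] = values[b] = (values[a] + values[b]) / 2

    print({n: round(v, 2) for n, v in values.items()})
    # every node converges to 5.0, the true mean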

Contact: rsarkar@inf.ed.ac.uk

Concurrent Bidirectional Transformations

Suggested by Perdita Stevens

Bidirectional transformations (bx) encode and maintain consistency between multiple information sources, which might be, for example, models of different aspects of a software system under development. The idea has its roots in the view-update problem in databases, and is of increasing interest thanks to the rise of model-driven development. Yet languages for bx, and technologies for executing them, still leave much to be desired, limiting uptake.

This project will investigate how bx can be parallelised and the implications of so doing. There is scope for taking the project in a more formal or a more practical direction. Questions include: do we need new languages, or can we have a parallel implementation of an existing bx language? How do we handle dependencies between parts of the models? What if a model is modified while a bx is being executed on it? What formal assumptions do we need to make, and what can we guarantee?
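
The asymmetric-lens core of a bx can be sketched in a few lines: a get extracting a view from a source, and a put writing an edited view back, subject to the usual round-trip laws. The record and field are invented for illustration.

    def get(source):
        """View: just one field of a larger record."""
        return source["name"]

    def put(source, view):
        """Propagate an edited view back without disturbing the rest."""
        return {**source, "name": view}

    s = {"name": "Model-A", "version": 3}
    assert put(s, get(s)) == s                  # GetPut: no edit, no change
    assert get(put(s, "Model-B")) == "Model-B"  # PutGet: the edit survives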

Contact: perdita@inf.ed.ac.uk

Dynamic Binary Translation for VLIW and SIMD Target ISA

Suggested by Nigel Topham

Modern embedded processors offer SIMD extensions and support for VLIW encodings as configurable options to accelerate compute and data intensive application domains (e.g. computer vision). SIMD and VLIW present a set of unique challenges to a JIT compiler of an instruction set simulator.

The main goals of this project are to explore, quantify, and solve those challenges by demonstration through practical implementation of a JIT compiler based on the LLVM framework, targeting a SIMD and VLIW architecture in a product-quality simulator implemented in C++.

Key Milestones
  • Prototype and evaluate the best design of a dynamic compiler targeting a SIMD/VLIW architecture and create a baseline implementation that functions in a concurrent parallel JIT compilation infrastructure.
  • Find suitable sets of benchmarks to evaluate the runtime performance and memory overheads of the proposed baseline implementation.
  • Instrument, measure, and find deficiencies in the baseline dynamic compiler. Devise and implement optimizations in both the dynamic compiler and runtime system to optimize for the common case where performance issues have been measured.
Research Challenges
  • How can runtime information and speculation be exploited to drive optimizations in the JIT compiler? For example, modern vector instruction sets support predication: is it possible to speculate, given runtime information, that all elements of a predicated vector instruction will execute, thereby omitting predicate checks at runtime?
  • Is it possible to exploit the simulation host to take advantage of its VLIW/SIMD capabilities by mapping simulated vector instructions on host vector instructions?
  • How can various execution modes that change the semantics of whole blocks of code be supported efficiently once a specialized version of the code has been compiled by the JIT compiler?
  • Is it sensible to parallelize execution of individual slots of a VLIW instruction, or are the overheads of parallelization not worth it, i.e. is the parallelism not coarse-grained enough?
Target Outcome
  • Improved simulation performance over interpretive simulation in general.
  • As a concrete example it should be possible to simulate complex applications in the vision domain in real time using this technology.

Contact: npt@inf.ed.ac.uk

Dynamic Binary Translation in the Context of Concurrent Multi-Core OS Simulation

Suggested by Nigel Topham

To enable scalable simulation of multi-core embedded processor targets, it is imperative to parallelize the simulation of cores on the host platform. This is usually done using shared-memory concurrency (e.g. threads), so each simulated core executes in its own thread. To gain further speed-ups, frequently executed code is dynamically compiled at runtime. Such an execution environment poses a set of unique challenges (e.g. concurrency, latency, runtime/memory overhead) to the JIT compilation subsystem integrated into such a simulation infrastructure.

The main goals of this PhD project are to explore, quantify, and solve those challenges by demonstration through practical implementation of a JIT compiler based on the LLVM framework, in the context of a product quality simulator implemented in C++ capable of concurrent multi-core OS simulation.

Key Milestones
  • Prototype and evaluate the best design of a dynamic compiler targeting a concurrent simulation architecture and create a baseline implementation that functions in a concurrent parallel JIT compilation infrastructure.
  • Find suitable sets of benchmarks to run in a simulated multi-core OS to evaluate the runtime and memory overheads of the proposed baseline implementation.
  • Instrument, measure, and find deficiencies in the baseline dynamic compiler. Devise and implement optimizations in both the dynamic compiler and runtime system to optimize for the common case where performance issues have been measured.
Research Challenges
  • How can runtime information and speculation be exploited to drive optimizations in the JIT compiler? Can we share dynamically compiled code and collected profiles? If so how do we deal with concurrency issues?
  • Self-modifying code occurs frequently during OS simulation as the OS maps programs into memory, executes them, and un-maps/overwrites those memory locations after a program terminates. How can this case be detected and handled efficiently?
  • A key challenge to solve during OS simulation is the modelling and support of address translation. Some optimizations in this space, such as translation caching, are established solutions by now in both research and industry. However, they create some follow-up problems with respect to translation cache maintenance that need to be evaluated in terms of frequency and latency to enable the best possible performance for OS simulation.
  • How can the modelling of functional caches and cache coherency be supported in such a simulation environment?
  • When modelling functional caches how should one deal with instruction cache invalidations if any code that resides in the cache has been JIT compiled?
Target Outcomes
  • Improved simulation performance over interpretive simulation in general.
  • As a concrete example it should be possible to simulate a quad- or eight-core OS in real time executing parallel benchmark suites such as EEMBC MultiBench, SPLASH-2, etc.

Contact: npt@inf.ed.ac.uk

Co-design of DSP Cores and Compilers

Suggested by Nigel Topham

Designing compilers to compile code efficiently onto DSP cores is an ongoing challenge, particularly if the processor has external hardware acceleration. Most compilers require the use of machine-specific directives or intrinsics in order to compile optimally for accelerators, which means that porting code from one DSP core to another is very time-consuming. The task of scheduling software is also made more difficult by the latency of hardware accelerators. This PhD project will first focus on the co-optimisation of the DSP core and compiler technology to allow high-level C to be used. The project will then look at application-specific methods to optimise accelerator hardware in terms of execution speed, power consumption and silicon area.

Contact: npt@inf.ed.ac.uk

Reconfigurable Data-Parallel Structures for Embedded Computation

Suggested by Nigel Topham

One of the defining characteristics of computationally demanding embedded computations, such as high-definition video codecs, is the large quantity of data parallelism in many of their algorithms. The aim of this project is to explore the characteristics of a range of data-parallel algorithms from typical embedded computations, and to identify new and novel ways to provide reconfigurable micro-architectural structures that exploit the available data parallelism. In particular, we will search for reconfigurable structures that do not rely on relatively inefficient Field Programmable Gate Arrays.

Contact: npt@inf.ed.ac.uk

From Data Types to Session Types: A Basis for Concurrency and Distribution (ABCD)

Suggested by Philip Wadler

Concurrency and distribution are computing's most pressing problems. The data type is one of computing's most successful ideas. Session types codify the structure of communication, making software more reliable and easier to construct. Edinburgh, jointly with Glasgow and Imperial, has an EPSRC Programme Grant to explore session types. For more information, please see the ABCD Project website.
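
As a one-line taste of the idea, in standard binary session-type notation with an invented protocol: the type spells out the conversation a channel must follow, here a client that sends two integers and receives their sum, with the server's type as the dual.

    \[
      S_{\mathrm{client}} \;=\; {!}\,\mathsf{int}.\;{!}\,\mathsf{int}.\;{?}\,\mathsf{int}.\;\mathbf{end}
      \qquad
      S_{\mathrm{server}} \;=\; \overline{S_{\mathrm{client}}} \;=\; {?}\,\mathsf{int}.\;{?}\,\mathsf{int}.\;{!}\,\mathsf{int}.\;\mathbf{end}
    \]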

Contact: wadler@inf.ed.ac.uk