Industrial Engagement

Examples of engagement between the CDT in Pervasive Parallelism and its industry partners

CDT Student Presentations at Company Offices

Presentation at MathWorks – 27 May 2016, Cambridge
Vanya Yaneva: Accelerating Software Testing Using GPUs

In software engineering, careful validation of the developed system is a crucial task, which involves the generation and execution of a large number of test cases. Thus, rigorous testing of any non-trivial system could be extremely time consuming, putting an enormous strain on the software development cycle. One way to accelerate software testing is to execute test cases in parallel on the GPU threads. In this presentation, I am going to present this idea, the challenges of implementing it and how I am addressing them.
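The underlying idea, running many independent test cases at the same time, can be sketched on ordinary CPU threads. This is only an illustration and not the approach from the talk: it uses Python's standard `concurrent.futures` instead of GPU threads, and `function_under_test`, `TEST_CASES`, `run_case` and `run_all` are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor


def function_under_test(x):
    """Hypothetical system under test: clamps negatives to zero, then squares."""
    return max(x, 0) ** 2


# Each test case is an (input, expected output) pair. The cases are
# independent of each other, so they can run in any order and concurrently.
TEST_CASES = [(-3, 0), (0, 0), (2, 4), (5, 25), (10, 100)]


def run_case(case):
    """Run one test case and report whether it passed."""
    test_input, expected = case
    return function_under_test(test_input) == expected


def run_all(cases, workers=4):
    """Run all cases on a pool of worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_case, cases))


if __name__ == "__main__":
    results = run_all(TEST_CASES)
    print(f"{sum(results)}/{len(results)} test cases passed")
```

On a GPU the same structure would map one test case to one device thread; the challenges mentioned in the abstract arise precisely because real programs under test are far less uniform than this toy function.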

Presentations at Amazon Development Centre Scotland – 19 May 2016, Edinburgh
Chris Cummins: All the OpenCL on GitHub

I’ll be presenting work-in-progress research toward a novel approach for generating benchmark programs. By mining a large corpus of publicly available source code from GitHub, a deep learning neural network is trained to learn the distribution of program code at the character-sequence level. This learned distribution can be used to provide some measure of the ‘humanness’ of a given source code, with immediate applications for verifying the representativeness of benchmark suites against ‘real world’ programs. However, if we instead sample from this learned distribution, we can generate entirely new character sequences with the eventual goal of creating compilable and executable programs. If successful, this approach could have far reaching implications for compiler testing, optimisations and generative programming.
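The two uses of a learned character-level distribution, scoring existing code and sampling new code, can be illustrated with a much simpler model than the one described. The sketch below swaps the neural network for a character n-gram model with add-one smoothing; `ORDER`, `CORPUS` and all function names are assumptions made for illustration, and the tiny corpus is a stand-in for mined GitHub code.

```python
import math
import random
from collections import Counter, defaultdict

ORDER = 3  # number of preceding characters used as context (an assumption)
PAD = "\x02"  # padding character marking the start of a text


def train(corpus, order=ORDER):
    """Count which character follows each `order`-character context."""
    counts = defaultdict(Counter)
    padded = PAD * order + corpus
    for i in range(order, len(padded)):
        counts[padded[i - order:i]][padded[i]] += 1
    return counts


def log_likelihood(model, text, order=ORDER, vocab_size=256):
    """Average per-character log-probability of `text` (add-one smoothing)."""
    padded = PAD * order + text
    total = 0.0
    for i in range(order, len(padded)):
        followers = model.get(padded[i - order:i], Counter())
        prob = (followers[padded[i]] + 1) / (sum(followers.values()) + vocab_size)
        total += math.log(prob)
    return total / max(len(text), 1)


def sample(model, length, order=ORDER, seed=0):
    """Generate up to `length` characters by sampling from the learned counts."""
    rng = random.Random(seed)
    context, out = PAD * order, []
    for _ in range(length):
        followers = model.get(context)
        if not followers:  # context never seen in training: stop early
            break
        chars, weights = zip(*followers.items())
        nxt = rng.choices(chars, weights=weights)[0]
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


# Hypothetical miniature training corpus of OpenCL-like source text.
CORPUS = (
    "__kernel void add(__global float* a, __global float* b) {\n"
    "  int i = get_global_id(0);\n"
    "  a[i] = a[i] + b[i];\n"
    "}\n"
) * 3

model = train(CORPUS)
```

Here `log_likelihood(model, text)` plays the role of the 'humanness' score: text resembling the corpus scores higher than unrelated text. `sample(model, 40)` draws a new character sequence from the same distribution, with no guarantee that the result compiles.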

Adam Harries: Sparse and irregular parallel processing on GPUs using functional languages

GPU parallelism presents a tempting resource for accelerating single-node parallel workloads: with the promise of thousands of threads, high-bandwidth memory and a low power-to-performance ratio, porting parallel code to run on a GPU seems like an obvious choice to make. Contemporary GPU architectures, however, are heavily biased towards the execution of predictable, regular data parallelism, which contrasts with many real application domains that are naturally inhabited by sparse and irregular datasets and algorithms.

In this talk I will introduce some of our work on implementing graph algorithms, a classic sparse and irregular domain, on GPUs using a functional domain specific language. I will introduce our language, how we can use it to generate high efficiency GPU code, and how we extended it to support sparse computations without sacrificing safety or performance.

Simon Fowler: Types for Protocols

Data types have become a fundamental part of programming languages. Statically-typed programming languages have a number of benefits, in particular detecting more application errors at compile time, and guiding the development of abstractions to better structure applications. When programming highly-concurrent and distributed systems, communication and concurrency become central to language design. A natural question to ask would be “Is there an analogue of the data type for communication-centric programming?”

Session types are a type discipline for communication-based programming: protocols are written as session types, meaning that the implementation of a communicating program can be statically checked for conformance to a protocol. In this talk, I’ll give an introduction to session types and a high-level overview of the state of the art in the field.
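To give a flavour of what "a protocol as a type" means, the sketch below writes a protocol down as data and checks each communication action against it. Note the substitution: real session types are checked statically by a type checker, whereas this Python sketch checks dynamically at run time. `ADD_CLIENT`, `dual`, `Endpoint` and `ProtocolError` are hypothetical names introduced for illustration.

```python
from collections import deque

# A protocol is a sequence of steps: ("send", type) or ("recv", type).
# Hypothetical example: the client sends two ints, then receives their sum.
ADD_CLIENT = [("send", int), ("send", int), ("recv", int)]


def dual(protocol):
    """The same protocol seen from the other endpoint: send and recv swap."""
    flip = {"send": "recv", "recv": "send"}
    return [(flip[action], payload) for action, payload in protocol]


class ProtocolError(Exception):
    """Raised when an endpoint deviates from its protocol."""


class Endpoint:
    """Checks every communication action against the next protocol step."""

    def __init__(self, protocol, outbox, inbox):
        self.remaining = list(protocol)
        self.outbox = outbox
        self.inbox = inbox

    def _advance(self, action, value):
        if not self.remaining:
            raise ProtocolError(f"{action} after the protocol has finished")
        expected_action, expected_type = self.remaining[0]
        if action != expected_action or not isinstance(value, expected_type):
            raise ProtocolError(
                f"expected {expected_action} of {expected_type.__name__}, "
                f"got {action} of {type(value).__name__}"
            )
        self.remaining.pop(0)

    def send(self, value):
        self._advance("send", value)
        self.outbox.append(value)

    def recv(self):
        # Refuse to read from the channel unless recv is the next step.
        if not self.remaining or self.remaining[0][0] != "recv":
            raise ProtocolError("recv is not the next step of the protocol")
        value = self.inbox.popleft()
        self._advance("recv", value)
        return value


def demo():
    """Run a client and a server that follow dual protocols."""
    client_to_server, server_to_client = deque(), deque()
    client = Endpoint(ADD_CLIENT, client_to_server, server_to_client)
    server = Endpoint(dual(ADD_CLIENT), server_to_client, client_to_server)
    client.send(2)
    client.send(3)
    server.send(server.recv() + server.recv())
    return client.recv()
```

With a session type system, a client that tried `send("two")` or called `recv` too early would be rejected before the program ever ran; here the same mistakes only surface as a `ProtocolError` during execution.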

Presentations at CriticalBlue – 26 April – 14 June 2016, Edinburgh
Floyd Chitalu: Real-time Data-Parallel Collision Processing for Volumetric Objects

I will be presenting my work on investigating the use of GPUs in real-time simulation and visualization for Computer Animation. More specifically I will be focusing on tree-based continuous collision processing. I will also show some results obtained so far demonstrating successfully parallelized simultaneous traversal and refitting of multiple Bounding Volume Hierarchies (BVH).

Philip Ginsbach: Enabling Compilers to Exploit Heterogeneous Computing

Heterogeneous computing is becoming increasingly important for exhausting the potential of current hardware platforms, but established programming models using OpenCL and CUDA require much technical expertise. Specialized libraries such as cuBLAS or cuFFT expose more concrete functionality and are well optimised by expert programmers but are each limited in scope. We present an approach that will enable compilers to automatically spot functionality that suits particular interfaces and then substitute it with library calls.

Daniel Hillerström: Programming with Algebraic Effects and Handlers

Plotkin and Power’s algebraic effects combined with Plotkin and Pretnar’s effect handlers provide a compelling alternative to monads as a basis for effectful programming. The approach is inherently modular, as effectful operations are kept abstract for handlers to instantiate them with concrete implementations. During this talk I will give a gentle introduction to programming with effect handlers; in particular, I will demonstrate their use through a collection of examples.
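The separation between abstract operations and the handlers that interpret them can be imitated with Python generators. This is a restricted sketch under stated assumptions, not the languages or calculi from the talk: generators can only be resumed once, and `counter_program`, `run_state` and `run_logging` are hypothetical names. The program yields operation requests; each handler decides what they mean.

```python
def counter_program():
    """Effectful computation written against abstract 'get'/'put' operations."""
    n = yield ("get",)
    yield ("put", n + 1)
    m = yield ("get",)
    return m * 10


def run_state(program, initial):
    """Handler 1: interpret 'get'/'put' as reads and writes of one state cell."""
    state = initial
    gen = program()
    try:
        op = next(gen)  # run the program until it performs an operation
        while True:
            if op[0] == "get":
                op = gen.send(state)  # resume with the current state
            elif op[0] == "put":
                state = op[1]
                op = gen.send(None)  # resume with no result
            else:
                raise ValueError(f"unhandled operation: {op[0]}")
    except StopIteration as done:
        return done.value, state


def run_logging(program, initial):
    """Handler 2: 'get' always returns `initial`; 'put' is logged, not applied."""
    log = []
    gen = program()
    try:
        op = next(gen)
        while True:
            if op[0] == "get":
                op = gen.send(initial)
            elif op[0] == "put":
                log.append(op[1])
                op = gen.send(None)
            else:
                raise ValueError(f"unhandled operation: {op[0]}")
    except StopIteration as done:
        return done.value, log
```

The same `counter_program` gives different results under different handlers: `run_state(counter_program, 4)` returns `(50, 5)`, while `run_logging(counter_program, 4)` returns `(40, [5])`. That is the modularity the abstract refers to: the program fixes which operations are used, and the handler fixes what they do.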

Stan Manilov: Automatic Detection of Parallel Code: Dependencies and Beyond

Automatic parallelisation is an old research topic, but unfortunately, it has always been over-promising and under-performing. In this talk, we’ll look at four main approaches towards automatically detecting parallelism in legacy sequential code and we’ll follow with some fresh ideas we’re working on, aiming to bring us beyond the ubiquitous dependence analysis.

Artemiy Margaritov: QoS-Aware Multithreading for Scale-Out Servers

Server processor under-utilisation is one of the primary sources of inefficiency in today’s cloud computing. Recent work has shown that average server processor utilisation in data centres is about 50% while the remainder of the potential output is not productively used. The resulting energy waste stemming from under-utilisation of processors in data centres calls for a solution that optimises consumption of processor resources and improves energy efficiency.

One effective approach for increasing utilisation of a server processor is simultaneous multithreading (SMT). However, SMT is likely to cause contention for shared pipeline resources between threads. Consequently, the contention can lead to much lower performance than that achieved when a thread runs alone. If a significant performance degradation happens to a thread of a latency sensitive user-interactive application, it can violate quality of service (QoS) requirements, making consumers unhappy.

In this talk, I am going to give an overview of my current research in this area. I am developing a microarchitecture mechanism for thread co-location of low-priority applications with latency sensitive applications while maximising processor utilisation. The mechanism should guarantee the preservation of QoS requirements of latency sensitive applications by enforcing relative QoS priorities when assigning threads to processor resources.

Larisa Stoltzfus: Performance, Portability and Productivity for Room Acoustics Codes

The parallel programming landscape is becoming so vast and complicated that developing codes which are performant, portable and easily programmable tends to require a compromise somewhere. Parallel abstraction layers which can be “mapped” to different hardware offer one possibility of de-coupling the need for parallel programming expertise from simulation modelling. A wide range of these types of solutions exists (including skeletons, code generators and low-level libraries); however, many of them are still in the early stages of development or have been tested primarily on simple benchmarks.

My project involves exploring ways of simplifying the process of writing performance-portable and productive code for room acoustics simulations in three ways: by investigating how well different implementations of simple room benchmarks perform across different platforms tuned with different optimisations, by comparing these simple benchmarks to more advanced room codes, and by investigating the limitations and feasibility of current higher level frameworks. From this initial study, the intention is then to ascertain if and where current parallel frameworks need more functionality for room acoustics codes, and to further develop parallelisable stencil abstractions so that they fit a wider range of physical codes.

Amna Shahab: What is the Future of Memory System in Servers?

Memory is a costly commodity for data-centre servers, with the memory subsystem accounting for a significant part of the system power and cost budget. With DRAM technology scaling failing to keep up with the expected pace, there is a need for further exploration of alternative memory technologies. The technology trends of emerging memories such as 3D XPoint, HP’s Memristors, etc., favor movement towards a hybrid memory subsystem. Data placement and management techniques tailored to the software behaviour of the workloads are key to formulating an effective design.

Industry Partner Presentations for the CDT

Codeplay Software (Ruymán Reyes): From SYCL to C++17 and Beyond

8 June 2016, PPar Seminar

Heterogeneous computing is becoming more popular, especially in the area of embedded and mobile computing systems. However, the programming languages and tools are not up to the task, and developers are faced with legacy code that has to be ported to new architectures.

The successful C++ language has been evolving in recent years and has tried to add more and more modern features. In the next standard, C++17, a parallel implementation of the STL is due to become part of the language. The C++ Standard Study Group SG14 is keeping track of developments relating to heterogeneous computing, and trying to bring up a proposal on heterogeneous programming for a technical specification of C++20.

The Khronos Group has been working on SYCL, a runtime and compiler approach that brings single-source and task handling for OpenCL to C++.

In this talk we present SYCL and how Parallel STL can be implemented on SYCL to ease the programming of heterogeneous architectures, and we’ll present the current status of heterogeneous computing support in the C++ standard.

Disney Research (Kenny Mitchell, Bochang Moon, Babis Koniaris): Accelerating Film and Game Technology Convergence

29 June 2016, PPar Lunch

Our talk will present our latest results on computer graphics rendering for games and film convergence. The IRIDiuM (Interactive Rendered Immersive Deep Media) virtual reality installation presents a new form of rendered 360 stereo movies allowing for motion parallax and interactivity. Our sparse sensor body tracking provides user immersion and embodiment with freedom of movement within an interactive 3D panoramic movie. Further, in this talk, we present our recent filtering methods that reduce noise in rendered images while preserving high-frequency edge details. Our methods approximate rendered images with polynomial functions locally, and optimize filtering parameters used in the functions so that filtering errors are minimized. Our polynomial function allows for reconstructing an image block instead of each pixel, and thus we can reduce our optimization based filtering time by parallelizing our reconstruction at only a sparse number of image pixels.

Codeplay Software (Ruymán Reyes): SYCL: Programming accelerators the C++ way

11 May 2016, PPar Lunch

Current technology trends show a major shift in computer system architectures towards heterogeneous systems that combine multiple different processors (CPUs, GPUs, FPGA…) that all work together, performing many different kinds of tasks in parallel. Developers want to take advantage of parallel programming features in modern languages in a simpler and more accessible performance portable way.

One solution to this is provided by SYCL. SYCL is a Khronos standard that offers a layer on top of OpenCL that enables programming heterogeneous platforms using a “single-source” style with C++. SYCL has no extensions over standard C++, and supports any host compiler coupled with a SYCL-enabled device compiler to generate the target binary code. In this talk we introduce the basics of SYCL showing various code samples, and present how SYCL enables the implementation of the C++17 Parallel STL for GPUs.

Oracle Labs (Tim Harris): Do Not Believe Everything You Read in the Papers 

7 March 2016, PPar Seminar

Evaluating parallel algorithms is difficult: there are lots of possible metrics, lots of possible ways to run experiments, and lots of ways in which low-level details of the hardware can have unexpectedly large impacts on performance. Tim is going to talk about some of the ways in which he has been bitten by these problems in the past, and then some of the techniques he uses to try to organize his own experimental work.

Cray EMEA Research Lab (Adrian Tate): Cray’s Research Activities in Data and Memory Hierarchy

23 March 2016, PPar Seminar

Cray is a world leader in the production of supercomputers and their software environments. The latter is becoming increasingly important due to the convergence of many physical and technical factors, which will briefly be described in this talk. In recognition of the importance of software advances, Cray has recently launched Cray EMEA Research Lab (CERL), which focuses on deeper technical R&D efforts in software. The mission of CERL is to make complex, massively parallel and heterogeneous supercomputers more performant and simpler to use through deep collaboration with customers and other partners in Europe. This includes the usage of increasingly complex memory hierarchies, the integration of non-traditional workloads (e.g. Viz, Analytics), and the awareness and management of data at many levels. This talk will describe research efforts ongoing in these areas, and will describe in detail our efforts to add more data-awareness to the HPC programming environment in the Eiger project.

CriticalBlue (Barry O’Rourke): Realizing Performance on Mass Market Multicore Platforms

16 March 2016, PPar Lunch

Many of the computer based products shipping today contain multiple processors. Often this is driven by the silicon providers who only offer multi core devices but it is also common to see heterogeneous systems with specialist processing engines next to application processors. These are complex systems running even more complex software and design teams often find themselves struggling to realize the performance promised by the data sheets.

In this talk we will take an in-depth look at a real-life case study which will provide an insight into the technical challenges facing the embedded software industry as it tries to take advantage of ever-increasing parallelism.

Student Internships with Industry Partner Companies

For more details, see the page about Our Students’ Internships.

Recruitment of our Graduates by Industry Partners

See the Alumni Page for further details.

Industry Partner Representation at CDT Events

3DT and Friends Event – 26 May 2016

Representatives from ARM, Codeplay Software, Cray UK, NVIDIA and Oracle Labs attended the student conference, heard presentations about CDT students’ work, talked with students about their work at the poster sessions, and networked during the breaks and reception.

Creativity Sandpit with the CDT HiPEDS and CDT AIMS – 26-28 April 2016

Nathan Chong from ARM presented a challenge for the students to solve; namely, the application space opening up in the Internet-of-Things era, and the ethical, legal and social issues which arise in addition to the technical challenges. Nathan mentored teams of students as they worked to find solutions to the challenge.

CDT PPar & EPCC Industry Engagement Event: Internships & Project Placements – 4 Nov 2015

Representatives from the following companies attended: Amazon, ARM, Aviagen, Codeplay Software, Contemplate, Cray UK, CriticalBlue, DELL Secureworks, Disney Research, Intel, Keysight Technologies, Mallzee, Optos, Oracle Labs, Samsung Electronics R&D Institute UK and Toshiba Medical Visualization Systems. Many of the representatives gave presentations about their company’s research, set up company booths with promotional materials and attended the Industrial Advisory Board meeting. They all listened to student presentations and participated in the ‘matchmaking’ sessions, in which they could speak with selected students one to one.

CDT PPar Internal Conference and Industrial Engagement Event – 2-3 Jun 2015

Representatives from the following companies attended: Altran, ARM, Cirrus Logic, Codeplay Software, Contemplate Ltd, Cray UK, CriticalBlue, Keysight Technologies, Oracle Labs and Samsung Electronics R&D Institute UK. They heard keynote and student presentations and talked with students at the poster session, reception and networking lunch. Many also attended the Industrial Advisory Board meeting.

CDT PPar Kick-off and Industrial Engagement Event – 22 Oct 2014

Representatives from the following companies attended: ACE, Amazon Development Centre, ARM, Cirrus Logic, Codeplay Software, Contemplate, Cray UK, CriticalBlue, Erlang Solutions, Freescale, Keysight Technologies, Microsoft Research, Oracle and Samsung R&D Institute UK. Many of the industrials gave presentations, presented posters or banners and attended the Industrial Advisory Board meeting, and they all listened to student presentations and networked with students at the poster session and reception.