Dialogue and Multimodal Interaction

A list of potential topics for PhD students in the area of Dialogue and Multimodal Interaction.

Robot Learning via Trial and Error and an Extended Conversation with an Expert

Supervisors: Alex Lascarides, Subramanian Ramamoorthy

A field of robotics known as Learning from Demonstration teaches robots new skills through a mix of trial and error and physical enactment or manipulation by a human expert. There is some preliminary work that supplements this evidence with linguistic utterances, but their underlying messages are rudimentary (e.g., "no") or pertain only to the current situation (e.g., "go left"). This project will investigate how current techniques in semantic parsing and symbol grounding can enhance the task of learning optimal policies when the expert's utterances include quantification and abstraction (e.g., "when putting fruit in a bowl, always grasp it softly and lower it slowly").
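
Purely as an illustration (not part of any existing system), the sketch below shows one way such a quantified, abstract piece of advice might be compiled into a state-dependent action filter used during policy learning. The predicates, thresholds, and action fields are assumptions made up for this example.

```python
# Hypothetical sketch: the advice "when putting fruit in a bowl, always grasp it
# softly and lower it slowly" compiled into a state-dependent action filter.
# All predicates, fields, and thresholds are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Action:
    grip_force: float      # 0.0 (soft) .. 1.0 (hard)
    lower_speed: float     # metres per second

def is_fruit(obj: str) -> bool:
    # Placeholder symbol grounding; in practice this would come from perception.
    return obj in {"apple", "banana", "pear"}

def advice_applies(state: dict) -> bool:
    # "when putting fruit in a bowl" -- universally quantified over fruit objects.
    return is_fruit(state["held_object"]) and state["target"] == "bowl"

def satisfies_advice(action: Action) -> bool:
    # "grasp it softly and lower it slowly" rendered as (assumed) thresholds.
    return action.grip_force <= 0.3 and action.lower_speed <= 0.05

def filter_actions(state: dict, candidates: list[Action]) -> list[Action]:
    """Restrict the learner's action set whenever the advice is in force."""
    if advice_applies(state):
        return [a for a in candidates if satisfies_advice(a)] or candidates
    return candidates

# The constraint prunes actions only in states the advice quantifies over.
state = {"held_object": "apple", "target": "bowl"}
candidates = [Action(0.2, 0.03), Action(0.8, 0.2)]
print(filter_actions(state, candidates))   # only the soft, slow action survives
```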

The content of multimodal interaction

Supervisor: Alex Lascarides

Goal: To design, implement and evaluate a semantic model of conversation that takes place in a dynamic environment.

It is widely attested in descriptive linguistics that non-linguistic events dramatically affect the interpretation of linguistic moves, and conversely, linguistic moves affect how people perceive or conceptualise their environment. For instance, suppose I look upset and so you ask me "What's wrong?" I look over my shoulder towards a scribble on the living room wall, and then utter "Charlotte's been sent to her room". An adequate interpretation of my response can be paraphrased as: Charlotte has drawn on the wall, and as a consequence she has been sent to her room. In other words, you need to conceptualise the scribble on the wall as the result of Charlotte's actions; moreover, this non-linguistic event, with this description, is a part of my response to your question. Traditional semantic models of dialogue don't allow for this type of interaction between linguistic and non-linguistic contexts. The aim of this project is to fix this by extending and refining an existing formal model of discourse structure so that it supports the semantic role that non-linguistic events in the context play in the messages speakers convey. The project will draw on data from an existing corpus of people playing Settlers of Catan, which contains many examples of complex semantic relationships among the players' utterances and the non-linguistic moves in the board game. The project involves formally defining a model of discourse structure that supports the interpretation of these multimodal moves, and developing a discourse parser through machine learning on the Settlers corpus.
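
As a rough illustration of the kind of representation at stake, the sketch below treats non-linguistic events as first-class discourse units, so that a rhetorical relation such as Result can link the scribble-on-the-wall event to the utterance it explains. The class and relation names are assumptions for this example, not the project's actual formalism.

```python
# Hypothetical sketch: a discourse graph in which non-linguistic events are
# discourse units, so rhetorical relations can connect them to utterances.

from dataclasses import dataclass, field

@dataclass
class DiscourseUnit:
    uid: str
    kind: str       # "linguistic" or "non-linguistic"
    content: str    # utterance text or event description

@dataclass
class Relation:
    label: str      # e.g. "Result", "Explanation", "Question-Answer-Pair"
    left: str       # uid of the first argument
    right: str      # uid of the second argument

@dataclass
class DiscourseGraph:
    units: dict[str, DiscourseUnit] = field(default_factory=dict)
    relations: list[Relation] = field(default_factory=list)

    def add_unit(self, unit: DiscourseUnit) -> None:
        self.units[unit.uid] = unit

    def relate(self, label: str, left: str, right: str) -> None:
        self.relations.append(Relation(label, left, right))

g = DiscourseGraph()
g.add_unit(DiscourseUnit("q1", "linguistic", "What's wrong?"))
g.add_unit(DiscourseUnit("e1", "non-linguistic", "Charlotte drew on the wall"))
g.add_unit(DiscourseUnit("u1", "linguistic", "Charlotte's been sent to her room"))
g.relate("Result", "e1", "u1")               # the drawing caused the punishment
g.relate("Question-Answer-Pair", "q1", "u1")  # the answer includes the event e1
```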

Preference Elicitation

Supervisor: Alex Lascarides

Goal: To learn and act upon the user's preferences, using evidence from an extended embodied dialogue with the user and the observable consequences of actions in the environment.

There are many tasks where robots or software agents must learn particular preferences of the user after pre-training and deployment (e.g., domestic robots). These preferences may be out of distribution with respect to the training data. The most natural way to tackle such cases, at least for the user, is often via an extended embodied conversation, where the user says things like "When stacking the dishwasher, always put the dinner plates in the left rack" and the robot may respond with "Even when the only space for the large saucepan is on the left?". The aim of this project is to design, implement and evaluate algorithms that: (a) extract preference information from extended dialogue; and (b) combine this qualitative, structured preference information with the quantitative reward functions learned from trial and error when acting in the environment.
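
As a minimal sketch of what (b) might look like, the snippet below treats a preference extracted from dialogue as a constraint that reshapes a reward function learned from environment interaction. The predicates, state fields, and penalty weight are illustrative assumptions, not a proposed design.

```python
# Hypothetical sketch: combining a dialogue-derived preference constraint with a
# learned reward function via simple reward shaping. All values are assumed.

def learned_reward(state: dict, action: str) -> float:
    # Stand-in for a reward model learned from trial and error in the environment.
    return 1.0 if action == "place_plate" else 0.0

def dinner_plates_left_rack(state: dict, action: str) -> bool:
    # "When stacking the dishwasher, always put the dinner plates in the left rack."
    if state["task"] == "stack_dishwasher" and state["item"] == "dinner_plate":
        return state["target_rack"] == "left"
    return True   # the constraint is vacuously satisfied outside its scope

PREFERENCES = [(dinner_plates_left_rack, 5.0)]   # (constraint, violation penalty)

def shaped_reward(state: dict, action: str) -> float:
    reward = learned_reward(state, action)
    for constraint, penalty in PREFERENCES:
        if not constraint(state, action):
            reward -= penalty
    return reward

state = {"task": "stack_dishwasher", "item": "dinner_plate", "target_rack": "right"}
print(shaped_reward(state, "place_plate"))   # 1.0 - 5.0 = -4.0
```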

Next-Generation Multimodal Conversational Task Assistance

Supervisor: Jeff Dalton

The aim of this project is to build next-generation conversational assistants that help people perform real-world tasks. It builds on multimodal generative language models to automatically understand tasks and to perform personalized task adaptation based on factors like expertise, preferences, and constraints. It develops new methods for knowledge-grounded generation of information in a dynamic environment. It studies the interplay between human and AI systems, not just to perform tasks but also to teach and entertain in the process. It builds on the open-source Open Assistant Toolkit, deployed to millions of users in the Amazon Alexa Prize, to study real-world use cases.

Computational Models of Privacy

Supervisor: Nadin Kökciyan

Goal: To develop a personal privacy assistant that will work with its user to handle multi-party privacy in online systems.

Online systems, such as social networks, may violate the privacy needs of users, who have little control over their own data. Privacy is one of the most important ethical values to preserve, as highlighted by its protection in law, such as under the GDPR. This project aims to develop a personalised privacy assistant that can: (i) analyse content (e.g., text, images) using neurosymbolic approaches to ensure its user's privacy is preserved before data is shared; (ii) enable multi-party privacy when the content reveals sensitive information about multiple users (e.g., a picture with friends); and (iii) conduct dialogues in natural language to collaborate with its user in making ethical privacy decisions.
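
As a purely illustrative sketch of multi-party privacy decision making, the snippet below combines a stand-in learned sensitivity score with symbolic aggregation of the sharing policies of everyone a piece of content concerns. The scoring function, policies, and thresholds are placeholders invented for this example.

```python
# Hypothetical sketch: a multi-party privacy check that mixes a (stand-in)
# learned sensitivity score with symbolic per-user policy rules.

def sensitivity_score(content: str) -> float:
    # Stand-in for a learned classifier over text or images; returns a score in [0, 1].
    return 0.8 if "party" in content.lower() else 0.2

USER_POLICIES = {           # per-user sharing policies (assumed)
    "alice": {"max_sensitivity": 0.5},
    "bob":   {"max_sensitivity": 0.9},
}

def recommend(content: str, co_owners: list[str]) -> str:
    score = sensitivity_score(content)
    # Symbolic aggregation: sharing is acceptable only if every co-owner's
    # policy tolerates the content's estimated sensitivity.
    if all(score <= USER_POLICIES[user]["max_sensitivity"] for user in co_owners):
        return "share"
    return "ask co-owners first"   # trigger a clarification dialogue instead

print(recommend("Photo from the party with friends", ["alice", "bob"]))
# -> "ask co-owners first", since the score exceeds alice's threshold
```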