Dialogue and Multimodal Interaction

A list of potential topics for PhD students in the area of Dialogue and Multimodal Interaction.

Robot Learning via Trial and Error and Extended Conversation with an Expert

Supervisors: Alex Lascarides, Subramanian Ramamoorthy

A field of robotics known as Learning from Demonstration teaches robots new skills through a mix of trial and error and physical enactment or manipulation by a human expert. There is some preliminary work that enhances this evidence with linguistic utterances, but the underlying messages are rudimentary (e.g., "no") or pertain only to the current situation (e.g., "go left"). This project will investigate how current semantic parsing and symbol grounding can enhance the task of learning optimal policies when the expert's utterances include quantification and abstraction (e.g., "when putting fruit in a bowl, always grasp it softly and lower it slowly").
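
As a rough illustration of the idea (not the project's actual method), the sketch below shows how a single parsed piece of advice might be grounded as a reward-shaping penalty inside a toy trial-and-error learner; the environment, action space, predicates and payoffs are all invented for the example.

```python
# Hypothetical sketch: grounded expert advice as a reward-shaping constraint
# inside a toy trial-and-error learner. All names and numbers are invented.
import random
from collections import defaultdict

# A parsed advice rule such as "when putting fruit in a bowl, grasp it softly"
# might be grounded as a (condition, violation test, penalty) triple.
ADVICE_RULES = [
    (lambda s: s["holding"] == "fruit", lambda a: a["grip"] == "hard", -5.0),
]

def shaped_reward(state, action, env_reward):
    """Add penalties for actions that violate grounded expert advice."""
    r = env_reward
    for condition, violates, penalty in ADVICE_RULES:
        if condition(state) and violates(action):
            r += penalty
    return r

# Tabular learning over a toy discretisation of the manipulation task.
ACTIONS = [{"grip": g, "speed": v} for g in ("soft", "hard") for v in ("slow", "fast")]
Q = defaultdict(float)

def key(state, action):
    return (state["holding"], action["grip"], action["speed"])

def step(state, action):
    # Toy environment: a hard grasp succeeds slightly more often,
    # but the advice penalises hard grasps on fruit.
    success = random.random() < (0.9 if action["grip"] == "hard" else 0.7)
    return 1.0 if success else 0.0

alpha, epsilon = 0.1, 0.2
state = {"holding": "fruit"}
for episode in range(2000):
    action = random.choice(ACTIONS) if random.random() < epsilon else \
        max(ACTIONS, key=lambda a: Q[key(state, a)])
    r = shaped_reward(state, action, step(state, action))
    Q[key(state, action)] += alpha * (r - Q[key(state, action)])

print(max(ACTIONS, key=lambda a: Q[key(state, a)]))  # learner settles on a soft grasp
```

The point of the sketch is only that quantified, abstract advice compiles into constraints that apply across many future situations, rather than a correction of the current action alone.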

Modelling non-cooperative conversation

Supervisor: Alex Lascarides

Develop and implement a model of conversation that can handle cases where the agents' goals conflict.

Work on adversarial strategies from game theory and signalling theory lacks sophisticated models of linguistic meaning. Conversely, current models of natural language discourse typically lack models of human action and decision making that deal with situations where the agents' goals conflict. The aim of this project is to fill this gap and, in doing so, provide a model of implicature in non-cooperative contexts.

This project involves analysing a corpus of human dialogues from players of Settlers of Catan, a well-known adversarial negotiation game. The corpus will be used to inform extensions to an existing state-of-the-art dynamic semantic model of dialogue content with a logically precise model of the agents' mental states and strategies. The project will also involve implementing these ideas in a working dialogue system that extends an existing open-source agent that plays Settlers but has no linguistic capabilities.
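
By way of illustration only, the sketch below sets up a toy signalling game with diverging speaker and hearer payoffs, loosely in the spirit of a Settlers trade negotiation; the states, messages and utilities are invented, not drawn from the corpus or the existing agent.

```python
# Purely illustrative sketch with invented payoffs: a two-player signalling game
# in which the speaker's and hearer's utilities diverge, loosely modelled on a
# Settlers of Catan trade negotiation.

STATES = ["needs_wheat", "needs_ore"]        # the speaker's private need
MESSAGES = ["claim_wheat", "claim_ore"]      # what the speaker says they need
RESPONSES = ["trade_wheat", "trade_ore", "refuse"]

def speaker_utility(state, response):
    # The speaker benefits from any trade, but most from the resource they need.
    return {"trade_wheat": 2 if state == "needs_wheat" else 1,
            "trade_ore":   2 if state == "needs_ore" else 1,
            "refuse":      0}[response]

def hearer_utility(state, response):
    # The hearer would rather not hand over the resource the speaker needs most,
    # since that strengthens an adversary's position.
    return {"trade_wheat": -1 if state == "needs_wheat" else 1,
            "trade_ore":   -1 if state == "needs_ore" else 1,
            "refuse":      0}[response]

def credulous_response(message):
    # A credulous hearer simply complies with the literal content of the request.
    return "trade_wheat" if message == "claim_wheat" else "trade_ore"

# Because the utilities conflict, the hearer's best response diverges from literal
# compliance: interpreting the offer requires reasoning about the speaker's strategy.
for state in STATES:
    best_msg = max(MESSAGES, key=lambda m: speaker_utility(state, credulous_response(m)))
    strategic = max(RESPONSES, key=lambda r: hearer_utility(state, r))
    print(f"{state}: speaker's preferred message = {best_msg}, "
          f"credulous hearer's payoff = {hearer_utility(state, credulous_response(best_msg))}, "
          f"strategic hearer's best response = {strategic}")
```

The sketch shows only the strategic side of the problem; the project's contribution is to couple this kind of reasoning with a rich dynamic semantics of what the utterances actually mean and implicate.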

Interpreting hand gestures in face-to-face conversation

Supervisor: Alex Lascarides

Map hand shapes and movements into a representation of their form and meaning.

The technology for mapping an acoustic signal into a sequence of words, and for estimating the position of pitch accents, is very well established. But estimating which hand movements are communicative and which are not, and identifying which part of a communicative hand movement is the stroke or the post-stroke hold (i.e., the parts of the movement that convey meaning), is much less well understood. Furthermore, to build a semantic representation of the multimodal action one must, at least for depicting gestures (that is, gestures whose form resembles their meaning), capture qualitative properties of their shape, position and movement (e.g., that the trajectory of the hand was a circle, or a straight line moving vertically upwards). Deictic gestures, on the other hand, must be represented using quantitative values in 4D Euclidean space. Mapping hand movements to these symbolic and quantitative representations of form is also an unsolved problem.

The aim of this project is to create and exploit a corpus to learn mappings from communicative multimodal signals to representations of their form, as required by an existing online grammar of multimodal action, which in turn is designed to yield (underspecified) representations of the meaning of the multimodal action. We plan to use state-of-the-art models of visual processing with Kinect cameras to estimate hand positions and hand shapes, and to design Hidden Markov Models that exploit the visual signal, language models and gesture models to estimate the qualitative (and quantitative) properties of the gesture.
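
To make the modelling idea concrete, the following sketch hand-codes a tiny hidden Markov model over gesture phases and Viterbi-decodes a quantised hand-speed stream into phases such as preparation, stroke and post-stroke hold; all probabilities and the feature discretisation are invented placeholders for parameters the project would estimate from the corpus.

```python
# Toy HMM over gesture phases with Viterbi decoding. All parameters are invented
# for illustration; in practice they would be learned from annotated Kinect data.
import math

PHASES = ["rest", "preparation", "stroke", "hold", "retraction"]

START = {"rest": 0.8, "preparation": 0.2, "stroke": 0.0, "hold": 0.0, "retraction": 0.0}
TRANS = {
    "rest":        {"rest": 0.7, "preparation": 0.3, "stroke": 0.0, "hold": 0.0, "retraction": 0.0},
    "preparation": {"rest": 0.0, "preparation": 0.5, "stroke": 0.5, "hold": 0.0, "retraction": 0.0},
    "stroke":      {"rest": 0.0, "preparation": 0.0, "stroke": 0.5, "hold": 0.4, "retraction": 0.1},
    "hold":        {"rest": 0.0, "preparation": 0.0, "stroke": 0.0, "hold": 0.6, "retraction": 0.4},
    "retraction":  {"rest": 0.6, "preparation": 0.1, "stroke": 0.0, "hold": 0.0, "retraction": 0.3},
}
# Observations are quantised hand speeds from the visual tracker.
EMIT = {
    "rest":        {"still": 0.80, "slow": 0.15, "fast": 0.05},
    "preparation": {"still": 0.10, "slow": 0.60, "fast": 0.30},
    "stroke":      {"still": 0.05, "slow": 0.25, "fast": 0.70},
    "hold":        {"still": 0.70, "slow": 0.25, "fast": 0.05},
    "retraction":  {"still": 0.10, "slow": 0.60, "fast": 0.30},
}

def viterbi(observations):
    """Return the most likely phase sequence for a sequence of quantised speeds."""
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    trellis = [{s: (log(START[s]) + log(EMIT[s][observations[0]]), [s]) for s in PHASES}]
    for obs in observations[1:]:
        column = {}
        for s in PHASES:
            score, path = max(
                (trellis[-1][prev][0] + log(TRANS[prev][s]) + log(EMIT[s][obs]),
                 trellis[-1][prev][1] + [s])
                for prev in PHASES)
            column[s] = (score, path)
        trellis.append(column)
    return max(trellis[-1].values())[1]

print(viterbi(["still", "slow", "fast", "fast", "still", "still", "slow", "still"]))
```

In the project, the emission model would be driven by much richer visual features (hand shape, position, trajectory) and coupled with language and gesture models, but the segmentation task has this general shape.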

The content of multimodal interaction

Supervisor: Alex Lascarides

Design, implement and evaluate a semantic model of conversation that takes place in a dynamic environment.

It is widely attested in descriptive linguistics that non-linguistic events dramatically affect the interpretation of linguistic moves and, conversely, that linguistic moves affect how people perceive or conceptualise their environment. For instance, suppose I look upset and so you ask me "What's wrong?" I look over my shoulder towards a scribble on the living room wall, and then utter "Charlotte's been sent to her room". An adequate interpretation of my response can be paraphrased as: Charlotte has drawn on the wall, and as a consequence she has been sent to her room. In other words, you need to conceptualise the scribble on the wall as the result of Charlotte's actions; moreover, this non-linguistic event, under this description, is part of my response to your question. Traditional semantic models of dialogue don't allow for this type of interaction between linguistic and non-linguistic contexts.

The aim of this project is to fix this, by extending and refining an existing formal model of discourse structure to support the semantic role of non-linguistic events in context in the messages that speakers convey. The project will draw on data from an existing corpus of people playing Settlers of Catan, which contains many examples of complex semantic relationships between the players' utterances and the non-linguistic moves in the board game. The project involves formally defining a model of discourse structure that supports the interpretation of these multimodal moves, and developing a discourse parser through machine learning on the Settlers corpus.
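
As a purely illustrative sketch of the kind of representation at stake, the code below encodes the "Charlotte" example as a discourse graph in which utterances and non-linguistic events are both discourse units linked by rhetorical relations; the data structures and relation labels are invented for the example, not the project's formal model.

```python
# Illustrative sketch: a discourse graph whose units include both utterances and
# non-linguistic events, linked by rhetorical relations. Labels are invented.
from dataclasses import dataclass, field

@dataclass
class DiscourseUnit:
    uid: str
    kind: str          # "utterance" or "event"
    content: str

@dataclass
class MultimodalDiscourse:
    units: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)   # (relation, from_uid, to_uid)

    def add(self, unit):
        self.units[unit.uid] = unit

    def relate(self, relation, src, dst):
        self.relations.append((relation, src, dst))

d = MultimodalDiscourse()
d.add(DiscourseUnit("u1", "utterance", "What's wrong?"))
d.add(DiscourseUnit("e1", "event", "Charlotte draws a scribble on the living-room wall"))
d.add(DiscourseUnit("u2", "utterance", "Charlotte's been sent to her room"))

# The gaze towards the scribble makes the non-linguistic event part of the answer:
d.relate("QAP", "u1", "e1")        # the event helps answer the question
d.relate("Result", "e1", "u2")     # being sent to her room is a result of the drawing
d.relate("QAP", "u1", "u2")

for rel, src, dst in d.relations:
    print(f"{rel}({src}, {dst})")
```

The research question is how to define and automatically infer structures of roughly this kind, with a precise semantics, from the Settlers corpus's mix of chat and board moves.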