Turing AI Postdocs Workshop #1
A 23 August 2022 workshop for ELIAI postdoctoral researchers to present and discuss their research
This Turing AI Postdocs Workshop, the first in a planned series, was hosted by ELIAI and organized by its Director, Mirella Lapata. The workshop gave the ELIAI postdoctoral researchers Xue Li (project title "Causal Knowledge Graphs for Counterfactual Claims Reasoning"), Davide Moltisanti (project title "Grounding Actions and Action Modifiers in Instructional Videos"), Victor Prokhorov (project title "Multimodal Interpretability from Partial Sight"), and Cheng Wang (project title "Asking your Self-Driving Car to Explain its Decisions") an opportunity to present and discuss their work on their respective projects. These workshops will be held periodically as the researchers progress towards their project objectives.
Title: Misinformation Detection in Counterfactual Claims based on Probabilistic Reasoning over Knowledge Graphs
Presenter: Xue Li
Abstract: Detecting misinformation in counterfactual claims is crucial, e.g. for identifying rumours on social media, yet no single AI approach suffices, especially for previously unseen claims. In this project, we aim to develop a system that applies natural language processing (NLP) techniques to parse a given counterfactual claim and then queries knowledge graphs (KGs) with the entities recognised in the claim. The answers to these queries will be analysed to conclude whether the original claim is supported or refuted. In addition, a probability will be computed to represent how much the conclusion can be trusted, and an explanation of the conclusion will be provided as part of the output.
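To make the intended pipeline concrete, here is a toy, self-contained Python sketch. The in-memory triple store, string-match "entity recognition", counting-based probability, and the `verify` helper are all illustrative stand-ins for the NLP parsing, KG querying, and probabilistic reasoning the abstract describes, not the project's actual components.

```python
# Toy sketch of the planned pipeline: parse a claim, look up entities in a
# knowledge graph, and score how strongly the evidence supports the claim.
from dataclasses import dataclass

# A tiny stand-in knowledge graph: (subject, relation, object) triples.
KG = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "is_a", "drug"),
    ("vitamin_c", "prevents", "scurvy"),
}

@dataclass
class Verdict:
    supported: bool
    probability: float   # confidence in the verdict
    explanation: str     # human-readable justification

def recognise_entities(claim: str) -> list[str]:
    """Placeholder NER: return KG entities mentioned verbatim in the claim."""
    entities = {s for s, _, _ in KG} | {o for _, _, o in KG}
    return [e for e in entities if e.replace("_", " ") in claim.lower()]

def verify(claim: str) -> Verdict:
    entities = recognise_entities(claim)
    # "Query" the KG: collect every triple touching a recognised entity.
    evidence = [t for t in KG if t[0] in entities or t[2] in entities]
    if not evidence:
        return Verdict(False, 0.0, "No relevant facts found in the KG.")
    # Toy scoring: a claim is supported if some triple links two recognised
    # entities; the probability is the fraction of relevant facts that do so.
    linking = [t for t in evidence if t[0] in entities and t[2] in entities]
    prob = len(linking) / len(evidence)
    explanation = (f"{len(linking)} of {len(evidence)} relevant fact(s) "
                   f"directly link the claim's entities: {linking}")
    return Verdict(bool(linking), prob, explanation)

if __name__ == "__main__":
    print(verify("If aspirin did not exist, a headache would be harder to treat."))
```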
Title: Understanding Adverbs in Videos
Presenter: Davide Moltisanti
Abstract: Given a video showing a person performing an action, we are interested in understanding how the action is performed (e.g. chopping something quickly or finely). Current methods for this under-explored task model adverbs as invertible action modifiers in a joint visual-text embedding space. However, these methods do not explicitly guide the model to look for salient visual cues in the video to learn how actions are performed. We believe a model would understand adverbs better if it were given specific guidance about what to look for when tasked with recognising an adverb in a video. We thus plan to design a mixture-of-experts method that is trained to look for specific visual cues: for example, the model should attend to temporal dynamics for speed adverbs (e.g. quickly/slowly) and to spatial regions for completeness adverbs (e.g. fully/partially). Another challenge of adverb understanding is the lack of good-quality datasets. Current datasets for this task are mostly re-purposed from other domains and as such are noisy, i.e. videos are loosely trimmed and there is no guarantee that the action or adverb is actually visible. We address this issue by collecting a new high-quality dataset, HowTo100M/Recipes, in which cooking videos are annotated with action and adverb labels. Most importantly, videos are trimmed more tightly and are guaranteed to show the corresponding action and adverb, making the dataset a good resource for current and future research on adverb understanding.
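As a rough illustration of the mixture-of-experts idea, the PyTorch sketch below pairs a temporal expert (for speed adverbs) with a spatial expert (for completeness adverbs) and mixes their outputs with a learned gate. All shapes, layer choices, and names (`TemporalExpert`, `SpatialExpert`, `AdverbMoE`) are our own simplifying assumptions, not the project's actual architecture.

```python
# Sketch of a mixture-of-experts over video features: one expert looks at
# temporal dynamics, one at spatial regions; a gate mixes them per clip.
import torch
import torch.nn as nn

class TemporalExpert(nn.Module):
    """Summarises frame-level dynamics with a temporal convolution."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

    def forward(self, x):                    # x: (B, T, R, D) frame/region feats
        frames = x.mean(dim=2)               # pool regions -> (B, T, D)
        h = self.conv(frames.transpose(1, 2))
        return h.mean(dim=2)                 # temporal pooling -> (B, D)

class SpatialExpert(nn.Module):
    """Attends over spatial regions, ignoring temporal order."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                    # x: (B, T, R, D)
        regions = x.mean(dim=1)              # pool time -> (B, R, D)
        attn = torch.softmax(self.score(regions), dim=1)   # (B, R, 1)
        return (attn * regions).sum(dim=1)   # (B, D)

class AdverbMoE(nn.Module):
    def __init__(self, dim, n_adverbs):
        super().__init__()
        self.experts = nn.ModuleList([TemporalExpert(dim), SpatialExpert(dim)])
        self.gate = nn.Linear(dim, len(self.experts))
        self.classifier = nn.Linear(dim, n_adverbs)

    def forward(self, x):
        clip = x.mean(dim=(1, 2))                            # (B, D) clip summary
        weights = torch.softmax(self.gate(clip), dim=-1)     # (B, 2) expert mix
        outs = torch.stack([e(x) for e in self.experts], 1)  # (B, 2, D)
        mixed = (weights.unsqueeze(-1) * outs).sum(dim=1)
        return self.classifier(mixed)                        # adverb logits

# Smoke test on random features: 4 clips, 16 frames, 9 regions, feature dim 256.
logits = AdverbMoE(256, n_adverbs=10)(torch.randn(4, 16, 9, 256))
print(logits.shape)   # torch.Size([4, 10])
```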
Title: Multimodal Interpretability from Partial Sight
Presenter: Victor Prokhorov
Abstract: We seek to build deep generative models (DGMs) that capture the joint distribution over co-observed visual and language data (e.g. abstract scenes, COCO, VQA), while faithfully capturing the conceptual mapping between the observations in an interpretable manner. This relies on two key observations: (a) perceptual domains (e.g. images) are inherently interpretable, and (b) a key characteristic of useful abstractions is that they are lower-dimensional than the data and correspond to some conceptually meaningful component of the observation. We will seek to leverage recent work on conditional neural processes (Garnelo et al., 2018) to develop partial-image representations that mediate effectively, and in an interpretable manner, between vision and language data. Evaluation of this framework will involve both the ability to generate multimodal data, compared against state-of-the-art approaches, and human-measured interpretability of the learnt representations. Our project image represents multi-modal data (images, text) as a "partial specification" that allows effective encoding and reconstruction of data.
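For readers unfamiliar with conditional neural processes, the minimal PyTorch sketch below shows the core mechanism in the partial-image setting: encode the observed (coordinate, value) pairs, mean-aggregate them into a single permutation-invariant representation, and decode predictions at queried coordinates. The network sizes and the Gaussian output head are illustrative choices under our own assumptions, not the project's design.

```python
# Minimal conditional neural process (after Garnelo et al., 2018): context
# pixels condition predictions for target pixels via one aggregated vector.
import torch
import torch.nn as nn

class CNP(nn.Module):
    def __init__(self, x_dim=2, y_dim=1, r_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(            # (x_i, y_i) -> r_i
            nn.Linear(x_dim + y_dim, r_dim), nn.ReLU(), nn.Linear(r_dim, r_dim))
        self.decoder = nn.Sequential(            # (r, x_target) -> (mu, sigma)
            nn.Linear(r_dim + x_dim, r_dim), nn.ReLU(), nn.Linear(r_dim, 2 * y_dim))

    def forward(self, x_ctx, y_ctx, x_tgt):
        r_i = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1))  # (B, Nc, r)
        r = r_i.mean(dim=1, keepdim=True)                      # mean-aggregate
        r = r.expand(-1, x_tgt.size(1), -1)                    # broadcast to targets
        mu, raw_sigma = self.decoder(torch.cat([r, x_tgt], -1)).chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * nn.functional.softplus(raw_sigma)  # keep sigma positive
        return mu, sigma

# Toy usage: observe 50 random pixels of a 28x28 grayscale image, query all 784.
B, Nc, Nt = 8, 50, 784
model = CNP()
mu, sigma = model(torch.rand(B, Nc, 2), torch.rand(B, Nc, 1), torch.rand(B, Nt, 2))
loss = -torch.distributions.Normal(mu, sigma).log_prob(torch.rand(B, Nt, 1)).mean()
print(mu.shape, loss.item())   # torch.Size([8, 784, 1]) ...
```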
Title: Explainable AI for Trustworthy and Transparent Autonomous Vehicles
Presenter: Cheng Wang
Abstract: Artificial Intelligence (AI) approaches have brought considerable improvements to autonomous vehicle (AV) technologies; AI-based vision, in particular, dominates perception in deployed AVs. However, such highly complex AI systems are usually difficult to understand because of their black-box nature. As a result, public scepticism about AVs grows and their acceptance is reduced. It is therefore essential to build transparent and explainable AI systems. In this project, we will develop approaches that allow an AV to explain its decisions in intuitive terms in response to a human passenger's questions. To realize this goal, we will focus on generating explanations automatically, extracting reference points from natural language queries, and running human-subject experiments to evaluate the explanations.
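To illustrate one of the focuses named above, the Python sketch below shows extracting a reference point from a passenger's question and grounding it in the vehicle's perception output to template an explanation. The keyword matcher, `Detection` record, mock scene, and explanation template are all hypothetical stand-ins for the NLP and explanation-generation methods the project will actually develop.

```python
# Toy sketch: ground a passenger query in detected objects, then explain.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str        # object class from the perception stack
    position: str     # coarse location relative to the ego vehicle
    influence: str    # how the object affected the driving decision

# Mock perception/decision log for one moment in time.
SCENE = [
    Detection("pedestrian", "ahead at the crossing", "caused braking"),
    Detection("cyclist", "to the right", "blocked a lane change"),
]

def extract_reference_point(query: str) -> Optional[str]:
    """Placeholder for NLP-based reference extraction: match known labels."""
    for det in SCENE:
        if det.label in query.lower():
            return det.label
    return None

def explain(query: str) -> str:
    ref = extract_reference_point(query)
    if ref is None:
        return "I could not link your question to anything I detected."
    det = next(d for d in SCENE if d.label == ref)
    return f"I saw a {det.label} {det.position}; it {det.influence}."

print(explain("Why did you slow down near the pedestrian?"))
# -> I saw a pedestrian ahead at the crossing; it caused braking.
```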