Title: Physical scene understanding through active learning with neural network based simulators
Abstract: Reasoning about the physical structure of objects and the interactions between them is a core capacity that humans develop early in life. This knowledge is essential not just for reasoning about how to manipulate objects in the surrounding world, but also for explaining the course of actions that yielded specific observations. Recent advances using physics-based simulators to replicate observations represent an interesting approach to modelling the world. We present a method based on convolutional neural networks that emulates 2D classical mechanical systems, going from egocentric views of objects to predictions of future dynamics. With a view to data efficiency, we adopt an active learning approach, wherein we sample synthetic world configurations that drive the objects into regions of state space where predictions have been poor. The parameters of such an emulator can be adapted to track the dynamics of real-world objects, thus enabling more accurate inferences about future trajectories.
We will report on work in progress towards achieving these goals.
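To make the active-learning idea concrete, here is a minimal sketch of an error-driven sampling loop. Everything in it is illustrative: the "simulator" and "emulator" are stand-in functions (a trigonometric map and a linear model, not the CNN or physics engine from the talk), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(configs):
    # Stand-in for a 2D physics simulator: maps world configurations
    # to a scalar summary of the resulting dynamics (hypothetical).
    return np.sin(configs).sum(axis=-1)

def emulator_predict(weights, configs):
    # Stand-in for the CNN emulator: a simple linear model here.
    return configs @ weights

def active_learning_round(weights, n_candidates=256, n_pick=16):
    # Sample candidate configurations, then keep the ones where the
    # emulator disagrees most with the simulator -- i.e. drive data
    # collection into regions of state space with poor predictions.
    candidates = rng.uniform(-np.pi, np.pi, size=(n_candidates, 4))
    errors = np.abs(emulator_predict(weights, candidates) - simulate(candidates))
    picked = candidates[np.argsort(errors)[-n_pick:]]  # hardest configurations
    # Refit the emulator on the newly labelled hard examples (least squares).
    weights, *_ = np.linalg.lstsq(picked, simulate(picked), rcond=None)
    return weights, errors.mean()

w = np.zeros(4)
for _ in range(5):
    w, mean_err = active_learning_round(w)
```

In this toy version the refit is a least-squares solve; in the talk's setting that step would be gradient training of the network emulator on the newly simulated trajectories.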
Speaker: Yordan Hristov
Title: Grounding Symbols in Multi-Modal Instructions
Abstract: As robots begin to cohabit with us in semi-structured environments in our daily lives, the need arises to make sense of instructions involving rich variability, for instance in how linguistic terms are associated with their instantiation in the physical world. Realistically, this process of grounding must cope with small datasets that capture the specificity of particular users’ contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input (including linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations) to produce a segmentation of objects with a corresponding association to high-level concepts. To test our framework, we present experiments involving data from a table-top object manipulation scenario. Our results show its usefulness in applications requiring online learning of the meaning of symbols used in typical instructions in human-robot interaction.
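As a rough illustration of the grounding idea described above, here is a small sketch that associates words with object features weighted by gaze fixations. All data, feature vectors and names are invented for illustration; they are not the talk's dataset or model.

```python
import numpy as np

# Toy multi-modal episodes: each pairs a spoken word with per-object
# feature vectors (e.g. colour descriptors) and the relative gaze
# fixation time spent on each object while the word was uttered.
episodes = [
    ("red",  np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([0.9, 0.1])),
    ("red",  np.array([[0.8, 0.2], [0.1, 0.9]]), np.array([0.8, 0.2])),
    ("blue", np.array([[0.1, 0.9], [0.9, 0.1]]), np.array([0.7, 0.3])),
]

def ground_symbols(episodes):
    # For each word, average object features weighted by how long the
    # user fixated each object -- gaze disambiguates the referent.
    sums, counts = {}, {}
    for word, feats, fixations in episodes:
        w = fixations / fixations.sum()
        sums[word] = sums.get(word, 0.0) + w @ feats
        counts[word] = counts.get(word, 0) + 1
    return {word: sums[word] / counts[word] for word in sums}

def classify(groundings, feat):
    # Assign the word whose grounded prototype is nearest in feature space.
    return min(groundings, key=lambda w: np.linalg.norm(groundings[w] - feat))

g = ground_symbols(episodes)
```

With these toy episodes, `classify(g, np.array([0.9, 0.1]))` returns "red": the fixation-weighted prototype for "red" sits near that corner of the feature space.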
IPAB workshop - 01/06/17