Multimodal Interpretability from Partial Sight
We seek to build deep generative models (DGMs) that capture the joint distribution over co-observed visual and language data (e.g. abstract scenes, COCO, VQA), while faithfully capturing the conceptual mapping between the modalities in an interpretable manner. This relies on two key observations: (a) perceptual domains (e.g. images) are inherently interpretable, and (b) a key characteristic of useful abstractions is that they are lower-dimensional than the data and correspond to some conceptually meaningful component of the observation. We will leverage recent work on conditional neural processes (Garnelo et al., 2018) to develop partial-image representations that mediate effectively, and interpretably, between vision and language data. This framework will be evaluated both on its ability to generate multimodal data, against state-of-the-art approaches, and on the human-measured interpretability of the learnt representations. Our project image represents multimodal data (images, text) as a "partial specification" that allows effective encoding and reconstruction of the data.
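To make the "partial specification" idea concrete, the sketch below shows the basic shape of a conditional neural process (Garnelo et al., 2018) applied to partial image observations: each observed (coordinate, value) context pair is encoded independently, the encodings are aggregated by a permutation-invariant mean into a single representation, and a decoder conditioned on that representation predicts a distribution over values at arbitrary query coordinates. All dimensions, layer sizes, and function names here are illustrative assumptions, and the untrained random-weight MLPs stand in for learned networks; this is a minimal structural sketch, not the project's actual architecture.

```python
import numpy as np

def mlp(params, x):
    # Forward pass through a list of (W, b) layers with ReLU hidden units.
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

def init_mlp(sizes, rng):
    # Small random weights; a real model would train these.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)

# Hypothetical dimensions: 2-D pixel coordinates, 3-channel RGB values,
# and a 32-dimensional aggregated representation r.
encoder = init_mlp([2 + 3, 64, 32], rng)      # (coord, value) -> r_i
decoder = init_mlp([32 + 2, 64, 3 + 3], rng)  # (r, coord) -> mean, log-variance

def cnp_predict(context_xy, context_rgb, target_xy):
    # Encode each context point independently, then aggregate with a
    # mean -- the permutation-invariant step at the core of a CNP.
    r_i = mlp(encoder, np.concatenate([context_xy, context_rgb], axis=-1))
    r = r_i.mean(axis=0, keepdims=True)
    # Condition the decoder on (r, target coordinate) for every query pixel.
    r_rep = np.repeat(r, target_xy.shape[0], axis=0)
    out = mlp(decoder, np.concatenate([r_rep, target_xy], axis=-1))
    mean, log_var = out[:, :3], out[:, 3:]
    return mean, log_var

# A partial observation: 50 observed pixels queried at 200 target locations.
ctx_xy = rng.uniform(0.0, 1.0, size=(50, 2))
ctx_rgb = rng.uniform(0.0, 1.0, size=(50, 3))
tgt_xy = rng.uniform(0.0, 1.0, size=(200, 2))
mean, log_var = cnp_predict(ctx_xy, ctx_rgb, tgt_xy)
print(mean.shape, log_var.shape)  # (200, 3) (200, 3)
```

Because predictions are conditioned only on whatever context points happen to be observed, the same model handles arbitrarily sparse partial images, which is what makes this family of models a natural fit for partial specifications mediating between modalities.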
- Yuge Shi, Siddharth N, Brooks Paige, Philip Torr, Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models, In: Advances in Neural Information Processing Systems (NeurIPS), pp. 15692–15703, December (2019).
- Marta Garnelo, Dan Rosenbaum, Christopher Maddison, Tiago Ramalho, David Saxton, Murray Shanahan, Yee Whye Teh, Danilo Rezende, S. M. Ali Eslami, Conditional Neural Processes, In: International Conference on Machine Learning (ICML), PMLR, pp. 1704–1713, (2018).
- Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh, Attentive Neural Processes, In: International Conference on Learning Representations (ICLR), (2019).
- Adrian Baddeley, Spatial Point Processes and Their Applications, In: Stochastic Geometry, Lecture Notes in Mathematics, Berlin, Heidelberg: Springer, pp. 1–75, (2007).
- S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey E. Hinton, Attend, Infer, Repeat: Fast Scene Understanding with Generative Models, In: Advances in Neural Information Processing Systems, Curran Associates, Inc., vol. 29, (2016).