Multimodal Interpretability from Partial Sight

We seek to build deep generative models (DGMs) that capture the joint distribution over co-observed visual and language data.