29th April 2021 - 10am - Yonatan Belinkov: Seminar

TITLE: Causal Mediation Analysis for Interpreting NLP Models: The Case of Gender Bias

ABSTRACT:

The success of neural network models in various tasks, coupled with their opaque nature, has led to much interest in interpreting and analyzing such models. Common analysis methods for interpreting neural models in natural language processing typically examine either their structure (for example, probing classifiers) or their behavior (challenge sets, saliency methods), but not both. In this talk, I will propose a new methodology grounded in the theory of causal mediation analysis for identifying which parts of a model are causally implicated in its behavior. This methodology enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. I will demonstrate an application of this methodology to analyzing gender bias in pre-trained Transformer language models. In particular, we study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators. If time permits, I will also present initial results from applying mediation analysis to a different phenomenon, subject-verb number agreement in language models.
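
The decomposition in point (iii) can be illustrated with the standard mediation-analysis quantities: the total effect of changing the input, the natural direct effect (input changed, mediator held at its original value), and the natural indirect effect (input held fixed, mediator set to the value it would take under the changed input). The sketch below is a toy example only: the functions `mediator` and `outcome` are hypothetical stand-ins and are not the models, datasets, or effect measures used in the talk. In the language-model setting the abstract describes, one might think of x as an intervention on the input (for instance, changing a gendered word, an illustrative assumption) and m as the activation of a single neuron or attention head.

```python
# Toy illustration of causal mediation analysis.
# Assumption: `mediator` and `outcome` are hypothetical, hand-written functions,
# not the Transformer models or bias measures discussed in the talk.


def mediator(x: float) -> float:
    """Hypothetical mediator: an intermediate 'neuron' computed from the input."""
    return 2.0 * x


def outcome(x: float, m: float) -> float:
    """Hypothetical output depending on the input directly and via the mediator."""
    return 0.5 * x + 1.5 * m


def total_effect(x0: float, x1: float) -> float:
    # Change the input and let the mediator respond naturally.
    return outcome(x1, mediator(x1)) - outcome(x0, mediator(x0))


def natural_direct_effect(x0: float, x1: float) -> float:
    # Change the input, but hold the mediator at the value it takes under x0.
    return outcome(x1, mediator(x0)) - outcome(x0, mediator(x0))


def natural_indirect_effect(x0: float, x1: float) -> float:
    # Keep the input at x0, but set the mediator to the value it takes under x1.
    return outcome(x0, mediator(x1)) - outcome(x0, mediator(x0))


if __name__ == "__main__":
    te = total_effect(0.0, 1.0)
    nde = natural_direct_effect(0.0, 1.0)
    nie = natural_indirect_effect(0.0, 1.0)
    print(f"TE={te}, NDE={nde}, NIE={nie}")  # TE=3.5, NDE=0.5, NIE=3.0
```

Because this toy outcome is additive in x and m, the total effect equals the sum of the direct and indirect effects (3.5 = 0.5 + 3.0). For non-linear models such as neural networks this sum need not hold exactly, which is one reason interactions between components (the "synergistic" effects mentioned above) can show up in such an analysis.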

BIOGRAPHY:

Yonatan Belinkov is an assistant professor at the Henry and Merilyn Taub Faculty of Computer Science at the Technion. He was previously a Postdoctoral Fellow at the Harvard School of Engineering and Applied Sciences and the MIT Computer Science and Artificial Intelligence Laboratory. His current research focuses on the interpretability and robustness of neural network models of human language. His research has been published at various NLP/ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models, with applications to machine translation and speech recognition. He has been awarded the Harvard Mind, Brain, and Behavior Postdoctoral Fellowship and is currently an Azrieli Early Career Faculty Fellow.

Apr 29 2021 10.00 - 11.00

This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk

Zoom invitation