29 May 2019 - Haim Dubossarsky: Seminar
Challenges in NLP research in the age of all-too-easy modeling
Recent years have seen the rise of machine learning models in NLP research, including research on questions motivated by linguistic theory. Indeed, it has now become relatively easy to model and to test research problems. The ease with which models can be deployed comes at the risk of careless use, which may lead to unreliable findings and ultimately even hinder our ability to extend our knowledge. Such misuse may stem, for example, from unfamiliarity with hypotheses that are implicit in the models, or from inherent confounds that demand experimental controls.
In this talk, I will focus on such problems as they arise both in linguistically motivated questions (e.g., semantic change) and in more general NLP problems (e.g., polysemy resolution and representation), where word embeddings are the prominent ML models. Major problems include biases induced by word frequency, similarity estimation over noisy word vector representations, and evaluating model performance in the absence of proper evaluation tasks. I will suggest ways to mitigate some of these problems, and share some ideas about performing valid scientific research in the age of all-too-easy modeling.
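One of the problems mentioned above, similarity estimation over noisy word vectors, can be illustrated with a minimal sketch (not taken from the talk): an embedding learned from few occurrences of a word is a noisy estimate of the "true" vector, and cosine similarity between a vector and its noisy counterpart systematically falls below 1 as noise grows. Measured similarity (or measured "change") is therefore partly an artifact of estimation noise, which correlates with word frequency. All names and noise scales here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

dim = 300  # typical embedding dimensionality (assumption)
true_vec = rng.standard_normal(dim)

# Simulate embeddings estimated from different amounts of data:
# fewer occurrences of a word -> noisier estimate of its vector.
sims = []
for noise_scale in (0.1, 0.5, 1.0):
    noisy = true_vec + noise_scale * rng.standard_normal(dim)
    sims.append(cosine(true_vec, noisy))
    print(f"noise={noise_scale}: self-similarity={sims[-1]:.3f}")
```

Even though the word's "meaning" (the true vector) never changed, its measured self-similarity drops with noise, which is one reason frequency must be controlled for when comparing similarity scores across words.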
Dr. Dubossarsky completed his PhD at the Hebrew University of Jerusalem under the supervision of Prof. Daphna Weinshall (CS department) and Dr. Eitan Grossman (Linguistics department). Though he obtained training in psycholinguistics and computational neuroscience, Dr. Dubossarsky devoted his doctoral training to the study of computational linguistics, and particularly the field of semantic change. Building on his multidisciplinary skills, his work made both scientific and methodological contributions to the field, which were published in top-tier venues. Dr. Dubossarsky is currently supported by the Blavatnik Foundation to carry out his post-doctoral research at the Language & Technology Lab in Cambridge, headed by Prof. Anna Korhonen, where he studies the interplay between linguistic typology and NLP models, and its potential both to improve model quality and to broaden our understanding of linguistic typology. In addition, he is still actively researching what can be labeled "methodological aspects of machine learning use in NLP research".