15 February 2019 - Aline Villavicencio: Seminar
Identifying Idiomatic Language with Distributional Semantic Models
Precise natural language understanding requires adequate treatment of both single words and larger units. However, expressions such as compound nouns may be idiomatic: while a police car is a car used by the police, a loan shark is not a fish that can be borrowed. It is therefore important to identify which expressions are idiomatic and which are not, as the latter can be interpreted from a combination of the meanings of their component words while the former cannot. In this talk I discuss the ability of distributional semantic models (DSMs) to capture idiomaticity in compounds, by means of a large-scale multilingual evaluation of DSMs in French, Portuguese and English. The results show a high correlation with human judgments of compound idiomaticity (Spearman’s ρ=.82 in one dataset), indicating that these models can successfully detect idiomaticity.
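As a rough illustration of the kind of evaluation the abstract describes, the sketch below scores each compound by the cosine similarity between its own vector and the sum of its component word vectors, then compares those scores with human ratings using Spearman's ρ. The additive composition, the three-dimensional vectors, the third compound and the ratings are all toy assumptions made up for this example; they are not taken from the talk or its datasets, and the models evaluated in the talk may compose and score compounds differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compositionality(compound_vec, head_vec, modifier_vec):
    """Similarity between the compound's vector and the sum of its parts.

    Additive composition is an assumption of this sketch; a low score
    suggests the compound is more idiomatic.
    """
    composed = [h + m for h, m in zip(head_vec, modifier_vec)]
    return cosine(compound_vec, composed)


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy vectors (made up): compound, head word, modifier word.
toy_vectors = {
    "police car": ([0.9, 0.8, 0.1], [1.0, 0.1, 0.0], [0.1, 1.0, 0.0]),
    "night owl": ([0.3, 0.3, 0.8], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
    "loan shark": ([0.0, 0.1, 1.0], [1.0, 0.0, 0.1], [0.0, 1.0, 0.1]),
}

# Toy human compositionality ratings (made up; higher = more literal).
toy_human_scores = {"police car": 4.5, "night owl": 2.0, "loan shark": 1.0}

compounds = list(toy_vectors)
predicted = [compositionality(*toy_vectors[c]) for c in compounds]
human = [toy_human_scores[c] for c in compounds]
rho = spearman(predicted, human)

for compound, score in zip(compounds, predicted):
    print(f"{compound}: predicted compositionality {score:.3f}")
print(f"Spearman's rho against toy human ratings: {rho:.2f}")
```

In practice the vectors would come from a DSM trained on a corpus, and a library routine such as scipy.stats.spearmanr could replace the hand-written rank correlation.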
Aline Villavicencio is a Reader in Computer Science affiliated with the Federal University of Rio Grande do Sul (Brazil) and the University of Essex (UK). Her research interests include lexical semantics, multilinguality, and cognitively motivated NLP. She received her PhD from the University of Cambridge (UK) in 2001, and held postdoctoral positions at the University of Cambridge and the University of Essex (UK). During 2011-2012 and 2014-2015, she was on sabbatical at the Massachusetts Institute of Technology (USA). She is a member of the editorial boards of the Journal of Natural Language Engineering and the Transactions of the Association for Computational Linguistics, among others, and is Area Chair for NAACL 2018, COLING 2018 and IBERAMIA 2018, as well as Chair of the International Conference on Computational Processing of Portuguese (PROPOR 2018). She is also a regular member of the program committees of the various ACL conferences, and has co-chaired numerous *ACL workshops on Cognitive Aspects of Computational Language Acquisition and on Multiword Expressions. She has co-edited special issues and books dedicated to these topics.