07 Oct 2016 - Dong Nguyen: Seminar
A Kernel Independence Test for Geographical Language Variation
Quantifying the degree of spatial dependence for linguistic variables is a key task for analyzing dialectal variation. However, existing approaches have important drawbacks. First, they are based on parametric models of dependence, which limits their power when the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others only to Boolean indicators of whether a linguistic variable is present.
In this talk, I will present a new approach for measuring geographical language variation that addresses both of these problems. The approach builds on reproducing kernel Hilbert space (RKHS) representations for nonparametric statistics. I will discuss experiments on both synthetic data and a diverse set of empirical datasets.
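For readers unfamiliar with RKHS-based independence testing, here is a minimal sketch of the general idea. The abstract does not name a specific statistic, so this example assumes the Hilbert-Schmidt Independence Criterion (HSIC), one widely used kernel measure of dependence, with a permutation test for significance. The function names, the Gaussian kernels, the median-distance bandwidth, and the synthetic data are all illustrative assumptions, not the speaker's implementation.

```python
import numpy as np


def rbf_gram(x):
    """Gaussian (RBF) Gram matrix; bandwidth = median pairwise distance (an assumption)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    positive = sq_dists[sq_dists > 0]
    bandwidth_sq = np.median(positive) if positive.size > 0 else 1.0
    return np.exp(-sq_dists / (2.0 * bandwidth_sq))


def hsic_permutation_test(locations, values, n_perm=200, seed=0):
    """Biased HSIC estimate plus a permutation p-value (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    K = rbf_gram(locations)   # kernel on geographic coordinates
    L = rbf_gram(values)      # kernel on the linguistic variable
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n   # centering matrix I - (1/n) 11^T
    Kc = H @ K @ H

    # HSIC = trace(K H L H) / n^2, computed as an elementwise sum.
    observed = np.sum(Kc * L) / n**2

    # Null distribution: shuffle which value belongs to which location.
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if np.sum(Kc * L[np.ix_(p, p)]) / n**2 >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic illustration: 200 random locations in the unit square.
rng = np.random.default_rng(42)
locs = rng.uniform(size=(200, 2))
values_dep = locs[:, 0] + 0.05 * rng.normal(size=200)  # varies with longitude
values_ind = rng.normal(size=200)                      # unrelated to location

stat_dep, p_dep = hsic_permutation_test(locs, values_dep)
stat_ind, p_ind = hsic_permutation_test(locs, values_ind)
print(f"dependent:   HSIC={stat_dep:.4f}, p={p_dep:.3f}")
print(f"independent: HSIC={stat_ind:.4f}, p={p_ind:.3f}")
```

Because the test only needs a kernel on each side, the same procedure could in principle be applied to frequencies, Boolean indicators, or other data types by swapping the kernel on the linguistic variable, and it makes no parametric assumption about the form of the dependence.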
Dong Nguyen is a PhD student at the University of Twente and is also affiliated with the Meertens Institute. She holds a Master's degree from Carnegie Mellon University. She is interested in developing text mining methods that can help answer questions from the social sciences and the humanities, and she especially enjoys working with social media data. Her work has been featured by various news outlets, including the New York Times and Time Magazine. In January 2017 she will join the Alan Turing Institute as a Research Fellow, with the University of Edinburgh as her host university.