Friday, 12th April - 11am Gina-Anne Levow : Seminar

 

Title: Leveraging Models from High-Resource Languages in Speech Processing for Endangered Language Data

 

Abstract:

Recent years have seen dramatic strides in automatic speech and language processing, ranging from automatic speech recognition to machine translation. While these advances have benefited from improvements in machine learning algorithms, they are also crucially dependent upon increases in processing power and especially on huge corpora of language data for training and tuning of models. As a result, these language processing systems are readily available only for the few hundred most-resourced languages and may remain largely out of reach for the more than six thousand other languages of the world, most of which are low-resource or endangered.

There are not enough linguist-hours to document the many languages which are in danger of disappearing by the end of the century. Based on discussions among field linguists, experts in language documentation, and speech processing researchers, we focus on spoken language processing tasks and tools to alleviate key bottlenecks in endangered language documentation and to enrich existing archives, including speaker diarization, speaker identification, and speech alignment. We investigate approaches and strategies leveraging high-resource languages and pre-trained models for these tasks on diverse endangered language data. We highlight the factors which affect performance on these challenging datasets.

Bio:

Gina-Anne Levow is an Associate Professor in the Linguistics Department at the University of Washington. She received her Ph.D. and M.S. in Computer Science from the Massachusetts Institute of Technology and Bachelor's degrees in Computer Science and Oriental Studies from the University of Pennsylvania. Prior to joining UW, she served as a Research Fellow at the University of Manchester and on the faculty of the Department of Computer Science at the University of Chicago.

Levow's research centers on speech and language processing, with a focus on multi-lingual and minimally supervised approaches to language understanding. Her research emphasizes the role of prosody and linguistic structure in computational modeling, as well as low-resource and endangered languages. Her work has been funded by NSF and DARPA on spoken and multimodal language processing, including intonation recognition, conversation dynamics, stance, and machine translation.

 

Apr 12 2024

This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk.

IF G.03 and Teams invite