28 April 2017 - Chloé Braud: Seminar
Transfer learning for discourse parsing
Discourse structures describe the organization of documents in terms of discourse or rhetorical relations (such as "Explanation" or "Contrast") linking clauses and sentences. Discourse analysis has proven useful for various downstream applications, such as automatic summarization, question answering, and sentiment analysis. However, both the range of applications and their performance remain limited by the low accuracy of existing discourse parsers and by their focus on English.
Discourse parsing is known to be a hard task: it involves several complex and interacting factors, touching upon all layers of linguistic analysis, from syntax and semantics up to pragmatics. Consequently, annotation is also complex and time-consuming, and the available annotated corpora are therefore sparse and limited in size.
In this presentation, I will describe my work on tackling these issues using transfer learning strategies. First, I will describe experiments on identifying implicit discourse relations (i.e., those lacking a discourse connective such as "but" or "because") in the Penn Discourse Treebank: I proposed strategies for transferring knowledge from the explicit examples to the implicit ones, either by augmenting the size of the training set or by building a task-tailored representation of the words.
I will then present two RST discourse parsers. The first relies on multi-task learning to transfer information among several discourse-related tasks. The second combines the RST corpora annotated for different languages, leading to improvements on English and to the first systems for Basque and Dutch developed without any training data in those languages.
Chloé is a post-doc at the University of Copenhagen (CoAStaL team), working on Natural Language Processing.
She defended her PhD on implicit discourse relation identification in 2015 at Université Paris 7.
She mainly works on discourse parsing, and her research focuses on machine learning for NLP, more specifically for low-resource languages and domains.