Friday, 17th May - 12.00, Selma Tekir: Seminar

Title: Learning Citation-Aware Representations for Scientific Papers

Abstract:

Considering the vital role of citations in understanding scientific documents, the proposed work aims to enrich pre-trained language models with citation information. Distinctively, we propose language model pre-training that specifically masks citations. Concretely, we further pre-train a base model with a citation-masking strategy so that the model learns to represent citations properly. Beforehand, we add citation tokens in the parenthetical author-date citation style to the model vocabulary. Since a citation context may not refer to a unique reference but rather relates to a set of candidate references, the next step is to extend the context with the ground-truth reference's global information, such as its title and abstract, using Retrieval-Augmented Language Model Pre-Training. Another possibility is to use the citation-token hidden representations learned in this way as document-level embeddings, as an alternative to existing representations for scientific documents such as SciBERT and SPECTER; in other words, to test whether they can capture global properties such as the closeness between scientific papers.
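To illustrate the citation-masking idea described above, the following is a minimal sketch (not the speaker's implementation): it detects parenthetical author-date citations with a simplified regular expression (the pattern and the `[MASK]` token choice are assumptions for illustration) and replaces each with a mask token, which is the kind of input a masked-language-model objective would then be trained to reconstruct.

```python
import re

# Simplified pattern for parenthetical author-date citations such as
# "(Smith et al., 2020)" or "(Devlin and Lee, 2019)"; illustrative only,
# real citation detection would need a more robust approach.
CITATION_RE = re.compile(
    r"\([A-Z][A-Za-z]+(?: et al\.| and [A-Z][A-Za-z]+)?, \d{4}\)"
)

def mask_citations(text: str, mask_token: str = "[MASK]") -> str:
    """Replace every parenthetical citation with a mask token, so that a
    masked-language-model objective must predict the citation from context."""
    return CITATION_RE.sub(mask_token, text)

sentence = ("Transformer pre-training (Devlin et al., 2019) has been adapted "
            "to the scientific domain (Beltagy et al., 2019).")
print(mask_citations(sentence))
```

In the proposed setup, each citation would map to a dedicated citation token added to the model vocabulary rather than a generic mask, so that the model accumulates a reusable hidden representation per cited paper.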

Bio:

Selma Tekir studied Computer Engineering at Ege University in Turkey. She received her M.S. in Computer Engineering from Izmir Institute of Technology. In 2009, she worked as a visiting researcher at the Chair for Databases, Data Analysis, and Visualization, Faculty of Computer Science, University of Konstanz, Germany. Tekir received a Ph.D. in Computer Engineering from Ege University in 2010. Her research interests include machine learning for NLP, NLP applications, and the combination of NLP with symbolic reasoning. She is currently an associate professor in the Department of Computer Engineering at Izmir Institute of Technology.

May 17, 2024

This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk.

IF G.03 and Teams invite