Information Extraction, Retrieval and Presentation

A list of potential topics for PhD students in the area of Information Extraction, Retrieval and Presentation.

Knowledge Enhanced Language Models for Information Retrieval

Supervisors: Jeff Pan and Mirella Lapata

Pre-trained language models have proven useful for improving information retrieval performance, in particular for query understanding. However, such models still lack knowledge and inference capabilities. This project seeks to investigate how to enrich pre-trained language models with structured knowledge for information retrieval. Projects in this area could involve extending, combining or transforming language models with, for example, common-sense or domain-specific schemas and/or theories for retrieval tasks such as document ranking and compositional reasoning over complex queries.
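One simple way knowledge can enter a retrieval pipeline is query expansion from a knowledge graph before ranking. The sketch below is purely illustrative (the toy knowledge graph, the term-overlap scorer, and the example documents are all assumptions, not part of the project): terms related to a recognised entity are added to the query so that documents mentioning related concepts can be ranked higher.

```python
# Toy knowledge graph: entity -> related terms (an illustrative assumption).
KG = {
    "aspirin": ["acetylsalicylic", "analgesic", "nsaid"],
}

def expand_query(query: str) -> list[str]:
    """Add knowledge-graph neighbours of any recognised entity to the query terms."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(KG.get(t, []))
    return expanded

def score(query_terms: list[str], doc: str) -> int:
    """Toy lexical scorer: count query terms that appear in the document."""
    doc_terms = set(doc.lower().split())
    return sum(1 for t in query_terms if t in doc_terms)

docs = ["aspirin dosage guidance",
        "nsaid analgesic side effects"]
q = expand_query("aspirin risks")
ranked = sorted(docs, key=lambda d: score(q, d), reverse=True)
```

Without expansion, the second document scores zero against the query "aspirin risks"; with the injected knowledge it becomes the top result. A real project would replace both the graph lookup and the scorer with learned components.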

Knowledge Aware Open Domain Conversational Search

Supervisors: Jeff Pan and Mirella Lapata

In search, there has been a recent trend towards more conversational queries, which allow users to ask more specific, context-dependent questions about the products and services they are interested in. This project will address issues around knowledge-aware open-domain conversational search, such as conversational semantic parsing, multi-hop reasoning and assumption reasoning, as well as constrained decoding for response generation.
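A core difficulty with context-dependent queries is that a follow-up turn like "how big is it" cannot be retrieved against directly. The toy sketch below (the heuristic and the example dialogue are assumptions; real systems use learned query-rewriting or semantic-parsing models) resolves a pronoun in the follow-up query using the dialogue history:

```python
PRONOUNS = {"it", "they", "them", "its", "their"}

def rewrite(history: list[str], query: str) -> str:
    """Naive heuristic: substitute pronouns with the most recent
    capitalised, non-pronoun token seen in the dialogue history."""
    entity = None
    for turn in history:
        for token in turn.split():
            if token[0].isupper() and token.lower() not in PRONOUNS:
                entity = token
    if entity is None:
        return query
    return " ".join(entity if w.lower() in PRONOUNS else w
                    for w in query.split())

history = ["Tell me about Edinburgh"]
print(rewrite(history, "how big is it"))  # → "how big is Edinburgh"
```

The heuristic obviously fails on anything non-trivial; the point is only to make concrete what "conversational semantic parsing" must solve before retrieval can happen.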

Retrieval-Enhanced Machine Learning

Supervisor: Jeff Dalton

The aim of this project is to study new methods for deeply integrating retrieval into core machine learning pipelines and learning approaches. It will develop new methods for algorithmic users of search, including machine-generated queries for inputs with latent (vector) representations, modeling the utility of retrieval in learning tasks, and new feedback mechanisms to improve quality.
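One established form of retrieval-enhanced prediction interpolates a parametric model's output with evidence retrieved from a nearest-neighbour datastore, in the spirit of kNN-augmented models. The sketch below is a minimal illustration under stated assumptions (the stand-in classifier, the tiny datastore, and the fixed interpolation weight are all placeholders):

```python
import math

def knn_prob(x, datastore, k=3):
    """P(label=1) estimated from the k nearest stored (vector, label) pairs."""
    nearest = sorted(datastore, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / k

def parametric_prob(x):
    """Stand-in for a trained model's P(label=1) (an assumption)."""
    return 1.0 if x[0] > 0 else 0.0

def combined_prob(x, datastore, lam=0.5):
    """Interpolate the parametric and retrieval-based distributions."""
    return lam * parametric_prob(x) + (1 - lam) * knn_prob(x, datastore)

store = [((1.0, 0.0), 1), ((2.0, 0.0), 1), ((3.0, 0.0), 1), ((-1.0, 0.0), 0)]
p = combined_prob((1.5, 0.0), store)
```

Research questions from the description map onto this skeleton directly: how to generate the retrieval query from a latent input representation, and how to predict when (and how much) retrieval actually helps, i.e. learning `lam` rather than fixing it.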

Neural Retrieval for Entity-Centric Search

Supervisor: Jeff Dalton

The aim of this project is to explore new methods for learning effective entity representations with neural retrieval. It addresses fundamental problems with existing approaches, which do not represent structured entities effectively. It will develop new methods for learning dense or sparse vector representations that encode entities, relationships, and their interactions in complex events.
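The basic shape of dense entity retrieval can be illustrated with a toy example: entity representations are built from the entity name together with its relations, and retrieval is a dot product over a shared vector space. In the sketch below the "encoder" is a bag-of-words count over a fixed vocabulary, which is an assumption standing in for a learned neural encoder; the entities and descriptions are likewise invented for illustration.

```python
# Entity "descriptions" combine the name with relation text (toy data).
ENTITY_TEXT = {
    "Edinburgh": "Edinburgh capital of Scotland hosts festival",
    "Glasgow": "Glasgow largest city in Scotland",
}
VOCAB = sorted({w for t in ENTITY_TEXT.values() for w in t.lower().split()})

def embed(text: str) -> list[float]:
    """Bag-of-words vector over VOCAB (stand-in for a neural encoder)."""
    toks = text.lower().split()
    return [float(toks.count(w)) for w in VOCAB]

ENTITY_VECS = {e: embed(t) for e, t in ENTITY_TEXT.items()}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str) -> str:
    """Return the entity whose vector best matches the query vector."""
    qv = embed(query)
    return max(ENTITY_VECS, key=lambda e: dot(qv, ENTITY_VECS[e]))
```

The research problem is precisely what this sketch glosses over: learning `embed` so that structure — relations and event participation, not just surface text — is preserved in the vector space.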

Language Model Composition for Knowledge-Intensive Search Tasks

Supervisor: Jeff Dalton

This project aims to study using LLMs with in-context learning to create dynamic retrieval models for complex tasks such as conversational search, multi-hop question answering, and generative knowledge summarization. It builds on and extends methods for retrieval-augmented generation in specialized domains, drawing on elements of program synthesis, LLM-based tool use, and neural retrieval methods.
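The retrieve-then-generate composition at the heart of this area can be sketched schematically. Everything below is a placeholder under stated assumptions: the corpus is invented, the retriever is term overlap, and `generate` is a stub where a real system would call an LLM with the composed in-context prompt.

```python
CORPUS = {
    "doc1": "Edinburgh is the capital of Scotland.",
    "doc2": "The Fringe is a large arts festival held in Edinburgh.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by term overlap with the query."""
    def overlap(text):
        return len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(CORPUS.values(), key=overlap, reverse=True)[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Compose retrieved evidence into an in-context prompt."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; here it just echoes the top passage."""
    return prompt.splitlines()[1]

answer = generate(build_prompt("capital of Scotland",
                               retrieve("capital of Scotland")))
```

The project's questions sit in how these stages are composed dynamically — e.g. letting the model decide when to retrieve, synthesize sub-queries, or invoke tools — rather than in a fixed retrieve-then-generate pipeline like this one.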

Large Language Models for Large-Scale Generative Test Collection Synthesis

Supervisor: Jeff Dalton

This project studies using state-of-the-art LLMs to generate multimodal data for all aspects of multimodal retrieval test collections. This includes automatically synthesizing queries, complex documents (with tables, graphs, and images), and simulated user information needs and relevance judgments. The goal is to create large-scale collections of interconnected information (like Wikipedia) to develop and evaluate next-generation retrieval methods.
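The overall shape of collection synthesis — deriving queries from documents and recording which document each query came from as a pseudo-relevance judgment — can be sketched as below. The template-based "generator" and the toy documents are assumptions; in the project an LLM would produce the queries, documents, and judgments.

```python
DOCS = {
    "d1": "Edinburgh hosts the Fringe festival every August.",
    "d2": "Glasgow is the largest city in Scotland.",
}

def synthesize(docs: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (queries, qrels): one query per document, judged relevant to it."""
    queries, qrels = {}, {}
    for i, (doc_id, text) in enumerate(docs.items()):
        qid = f"q{i + 1}"
        # Template "generation": the first three words as a keyword query
        # (an LLM would produce a natural-language information need here).
        queries[qid] = " ".join(text.split()[:3]).lower()
        qrels[qid] = doc_id  # the source document is the relevant one
    return queries, qrels

queries, qrels = synthesize(DOCS)
```

A central evaluation question the project would need to address, which this sketch sidesteps, is whether such synthetic queries and judgments predict system rankings on real user data.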