LFCS Seminar: 5 October 2018 - Lucian Popa

Title: Human-in-the-Loop Entity Resolution for Knowledge Curation


Entity resolution is a key form of reasoning that makes it possible to establish explicit connections among entities across heterogeneous datasets. Such connections can represent "same-as" links between different representations of the same real-world entity or, more generally, can represent various types of relationships among entities. Along with other ubiquitous operations such as information extraction, data transformation and fusion, entity resolution is a crucial step for building high-value, domain-specific knowledge bases from raw data. In this talk, I will describe our work at IBM Research - Almaden towards better abstractions and tools for entity resolution. First, I will describe a declarative approach that uses constraints and provides a logical foundation for reasoning about various types of entity linking specifications and their expressive power. This also forms the theoretical underpinning for a concrete high-level language that is used in production by IBM. I will then talk about human-in-the-loop, active learning techniques that further lower the human effort needed to reach high-accuracy entity resolution algorithms in concrete application scenarios.

Bio: Lucian Popa is a Principal Research Staff Member and Manager at IBM Research - Almaden, which he joined in 2000 after receiving his PhD in Computer Science from the University of Pennsylvania. He is known for his work on data exchange, schema mapping and, more recently, entity resolution. At IBM, he has contributed to several products, and leads a research team focused on human-in-the-loop systems for structured knowledge creation and learning.

Date: 5 October 2018

Speaker: Lucian Popa

MF2 level 4