Information Extraction, Retrieval and Presentation

A list of potential topics for PhD students in the area of Information Extraction, Retrieval and Presentation.

Language Resources Extraction from Social Media

Supervisor: Walid Magdy

Social media data contains a large amount of information, knowledge, and resources generated by users every second. Although it may look noisy at first glance, applying novel data mining techniques makes it possible to extract valuable language resources that can be used as training data for different NLP tasks. The typical research question in this field is how to automatically extract data from social media that can be used to train a machine learning model in a distant-supervision manner for different NLP applications, such as machine translation, classification, paraphrasing, NER, multilingual word/sentence embeddings, sentiment analysis, and others.
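
As a rough illustration of distant supervision, the sketch below uses emoticons in posts as noisy sentiment labels, so that no manual annotation is needed. This is only one possible labelling heuristic, not a method prescribed by this topic; the emoticon sets, the function names `distant_label` and `build_training_set`, and the example posts are all hypothetical.

```python
# Hypothetical emoticon lexicons used as noisy label sources (illustrative assumption).
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}


def distant_label(post):
    """Return (cleaned_text, label), or None if the post has no unambiguous signal."""
    tokens = post.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    # Skip posts with no emoticon, or with conflicting emoticons.
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    # Remove the emoticons so a model cannot simply memorise the label source.
    cleaned = " ".join(t for t in tokens if t not in POSITIVE and t not in NEGATIVE)
    return cleaned, label


def build_training_set(posts):
    """Keep only the posts that received a distant label."""
    return [example for example in map(distant_label, posts) if example is not None]


posts = [
    "great match today :)",
    "missed my train :(",
    "mixed feelings :) :(",
    "no signal here",
]
training_set = build_training_set(posts)
```

The resulting (text, label) pairs could then be passed to any standard classifier. Labels obtained this way are noisy, so such a pipeline would normally be checked against a small manually annotated sample.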

User Behaviour Analysis on Social Media

Supervisor: Walid Magdy

Current trends in computational social science show that data science, in the form of machine learning, NLP, and data mining, can be used to learn about human behaviour from people's activities on social media networks. Example research questions include: Why do some people use hate speech on social media? Why do voters decide to vote for a given party or candidate? How can the voting decisions of individuals be predicted from their social media posts? How can people suffering from depression or other mental health problems be detected on social media? How can fake accounts and bots be detected? How can child grooming be detected from social media communications before it happens? Some current techniques in NLP, ML, and network analysis can be applied directly to answer such research questions. However, new techniques sometimes need to be developed for an accurate and representative analysis. Working in this area requires good knowledge of data science and social/political science.