Wednesday, 14th February - 11am Max Ryabinin: Seminar

Title: The Impact of Prompt Robustness on LLM Evaluation  

Abstract:

Large language models (LLMs) have become a subject of frequent study due to their rapidly improving capabilities. These models are usually evaluated on a set of benchmarks by prompting them with task-specific instructions or in-context demonstrations. Naturally, the content of the prompt influences the quality of the predictions; however, far less attention is paid to the details that determine the prompt format. As several recent works have shown, these seemingly minor details can have an outsized impact on model performance, yet LLM studies continue to use inconsistent formats. In this talk, I will give an overview of our latest results on prompt robustness for in-context learning: we analyze hundreds of format templates across 4 datasets and 19 models, showing that even advanced prompting methods do not guarantee reduced prompt sensitivity and that the best templates do not transfer across evaluation setups. Our findings suggest that future LLM evaluation frameworks need to take the prompt sensitivity of models into account and report the exact prompt templates used.
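To make the format-sensitivity issue concrete, below is a minimal, illustrative sketch (not code from the talk) of how one might enumerate in-context learning prompts that differ only in surface formatting. The separator, casing, and delimiter choices, the toy sentiment examples, and the commented-out evaluate/model call are all hypothetical placeholders; a real study would score each rendered template on a full benchmark with an actual model.

```python
from itertools import product

# Seemingly minor formatting choices: the separator between a field name and its
# value, the casing of field names, and the delimiter between demonstrations.
SEPARATORS = [": ", " - ", "\n"]
CASINGS = [str.lower, str.title, str.upper]
EXAMPLE_DELIMS = ["\n\n", "\n---\n"]

def render_prompt(demos, query, sep, case, delim):
    """Render in-context demonstrations plus the query under one format template."""
    def render(example, with_label=True):
        text = f"{case('input')}{sep}{example['text']}\n{case('label')}{sep}"
        return text + (example["label"] if with_label else "")
    return delim.join([render(d) for d in demos] + [render(query, with_label=False)])

# Toy sentiment-classification examples (placeholders, not from the paper).
demos = [
    {"text": "The movie was wonderful.", "label": "positive"},
    {"text": "I wasted two hours of my life.", "label": "negative"},
]
query = {"text": "A surprisingly touching story.", "label": "positive"}

# Enumerate the templates; each prompt differs only in formatting, not in content.
for sep, case, delim in product(SEPARATORS, CASINGS, EXAMPLE_DELIMS):
    prompt = render_prompt(demos, query, sep, case, delim)
    print(f"--- template ({sep!r}, {case.__name__}, {delim!r}) ---")
    print(prompt)
    # accuracy = evaluate(model, template=(sep, case, delim))  # hypothetical scoring step
```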


Bio:

Max Ryabinin is a Distinguished Research Scientist at Together AI, working on large-scale and efficient deep learning. Previously, he was a Senior Research Scientist at Yandex, studying a variety of topics in NLP and machine learning systems. In 2021-2022, Max served as a working group chair for the BigScience Research Workshop, helping build BLOOM, the largest multilingual language model at the time. Max received his PhD on decentralized deep learning from HSE University: in a series of publications, he proposed methods for training large neural networks over slow and unstable networks.

Feb 14 2024


This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk.

IF G.03 and online