Monday, 18th March, 11am - Edoardo Ponti: Seminar

Title: Efficiency as an Inductive Bias for Language Models

Abstract: Efficiency in Natural Language Processing is often hailed as a way to democratise access to AI technology and to make it more environmentally sustainable. In this talk, I emphasise an additional and sometimes neglected advantage of efficiency: it can provide an inductive bias that brings how models use and acquire language closer to how humans do, since in human language information-theoretic trade-offs shape its very structure. In particular, I will explore how efficient designs in language models (a) may also act as inductive biases that improve their usefulness (b). For instance:

(1a) Jointly learning to model and tokenise language allows spans of tokens to be merged in the intermediate layers of Transformers or in their key-value cache, which reduces time and memory requirements. (1b) In addition, this process discovers potentially reusable, hierarchical abstractions (such as linguistic units) from raw data. Moreover, it yields tokenisation-free models that can integrate multiple modalities with different input spaces.
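
As a rough, generic illustration of the memory saving in (1a), and not the method presented in the talk, the sketch below assumes a learned boundary predictor has already marked where each segment ends, and mean-pools the hidden vectors inside each segment so that later layers (or the key-value cache) hold fewer positions. The function name merge_spans and the boolean boundary format are hypothetical choices made for this example.

```python
from typing import List

Vector = List[float]


def merge_spans(hidden: List[Vector], boundaries: List[bool]) -> List[Vector]:
    """Mean-pool consecutive hidden vectors into one vector per segment.

    Hypothetical interface: boundaries[i] is True when position i closes a
    segment (e.g. the output of a learned boundary predictor). The final
    position always closes the last segment.
    """
    if len(hidden) != len(boundaries):
        raise ValueError("hidden and boundaries must have the same length")
    merged: List[Vector] = []
    segment: List[Vector] = []
    for i, (vec, is_end) in enumerate(zip(hidden, boundaries)):
        segment.append(vec)
        if is_end or i == len(hidden) - 1:
            # Average each dimension over the vectors in the current segment.
            dim = len(segment[0])
            merged.append([sum(v[d] for v in segment) / len(segment) for d in range(dim)])
            segment = []
    return merged


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [0.0, 1.0]]
    boundaries = [False, True, False, False]
    # Four positions shrink to two merged vectors.
    print(merge_spans(hidden, boundaries))  # [[2.0, 1.0], [2.5, 2.5]]
```

Mean pooling is only the simplest way to compress a span; a real model would learn both where boundaries fall and how each span is summarised.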

(2a) Learning parameter-efficient modules makes it possible to fine-tune LLMs within limited memory budgets. (2b) In addition, mixing these specialised modules through appropriate routing leads to better generalisation. In particular, I will show how modules can be implemented as highly composable sparse adapters and how routing through modules can be learned automatically.
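
To make "composable sparse adapters" and "routing" concrete, here is a minimal sketch under simplifying assumptions, not the speaker's implementation: each adapter is stored as a sparse set of weight changes, and a softmax over routing scores decides how much each adapter contributes to the base weights. The names compose and SparseDelta are hypothetical, and the scores are passed in directly, whereas a learned router would produce them.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse adapter format: (row, col) -> change to that weight.
SparseDelta = Dict[Tuple[int, int], float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over routing scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(base: List[List[float]], adapters: List[SparseDelta],
            scores: List[float]) -> List[List[float]]:
    """Return base + sum_k softmax(scores)[k] * adapters[k]."""
    if len(adapters) != len(scores):
        raise ValueError("need one routing score per adapter")
    weights = softmax(scores)
    out = [row[:] for row in base]  # copy so the base weights stay untouched
    for w, delta in zip(weights, adapters):
        # Only the entries an adapter actually changes are visited.
        for (r, c), value in delta.items():
            out[r][c] += w * value
    return out


if __name__ == "__main__":
    base = [[0.0, 0.0], [0.0, 0.0]]
    adapters = [{(0, 0): 1.0}, {(1, 1): 2.0}]
    # Equal scores route half of each adapter into the base weights.
    print(compose(base, adapters, [0.0, 0.0]))  # [[0.5, 0.0], [0.0, 1.0]]
```

Because each adapter touches only a few entries, several of them can be added to the same base weights cheaply, which is one reason sparse adapters compose well.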

(3a) Model merging (or fusion) compresses multiple LLMs into one without interference, which removes the need to call several models. (3b) In addition, it allows for better control of LLMs, by incentivising positive behaviours (e.g. creativity) while discouraging negative ones (e.g. hallucinations). Moreover, model merging can integrate knowledge developed asynchronously and independently from disparate sources.
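
One widely used merging recipe, task arithmetic, gives a feel for how merging can both combine models and steer behaviour. It is shown here only as a generic example and is not necessarily the technique presented in the talk. Each fine-tuned model contributes its difference from a shared base, scaled by a coefficient. A positive coefficient adds a behaviour and a negative one subtracts it. Parameters are plain lists of floats keyed by name, and the function name task_arithmetic is a hypothetical choice for this sketch.

```python
from typing import Dict, List

# Hypothetical parameter format: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_arithmetic(base: Params, finetuned: List[Params],
                    coeffs: List[float]) -> Params:
    """Merge as base + sum_k coeffs[k] * (finetuned[k] - base).

    All models are assumed to share the base model's parameter names
    and shapes.
    """
    if len(finetuned) != len(coeffs):
        raise ValueError("need one coefficient per fine-tuned model")
    merged: Params = {}
    for name, base_vals in base.items():
        merged[name] = [
            # Add each model's scaled "task vector" entry to the base weight.
            b + sum(c * (model[name][i] - b) for c, model in zip(coeffs, finetuned))
            for i, b in enumerate(base_vals)
        ]
    return merged


if __name__ == "__main__":
    base = {"w": [1.0, 1.0]}
    model_a = {"w": [2.0, 1.0]}  # behaviour to encourage
    model_b = {"w": [1.0, 3.0]}  # behaviour to discourage
    print(task_arithmetic(base, [model_a, model_b], [1.0, -0.5]))  # {'w': [2.0, 0.0]}
```

The result is a single set of weights, so serving it needs one model call rather than one per source model.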

In conclusion, efficient designs of LLMs yield unexpected benefits, such as the ability to learn abstractions, adapt fast, and integrate disparate sources of knowledge.

18 March 2024

This event is co-organised by ILCC and by the UKRI Centre for Doctoral Training in Natural Language Processing, https://nlp-cdt.ac.uk.

IF G.03