ANC Workshop - David Sterratt, Antreas Antoniou

Tuesday, 6th February 2024

Actionable visualisation principles and guidance for a foundational data science course - David Sterratt

Abstract: 

A learning outcome of the second-year University of Edinburgh course "Informatics 2 - Foundations of Data Science" is that students can describe and apply good practices for visualising data. The literature and online sources offer several sets of guidelines on what constitutes good visualisation practice. However, each set focuses on different aspects of visualisation, and there is no one-to-one mapping between the sets. The guidelines also range from very general (e.g. "Show the data", Tufte, 1982) to very specific ("Avoid spaghetti charts", Schwabish, 2021). We could not find a single set of guidelines that met two needs. First, it had to be appropriate to the level of the course and to the static visualisations we expected students to produce using Matplotlib and Seaborn. Second, it had to be actionable, in the sense that students and markers could assess visualisations against the criteria. We therefore constructed our own set of five visualisation principles, each with a number of subsidiary guidelines. The principles and guidance are concise enough to fit on a single A4 sheet, which students can use in group workshops to assess visualisations.

In this talk we will outline the principles and guidance, and describe how we use them in the course for instruction, formative feedback and assessment. We will focus on how we address many students' difficulty with creating legible text in plots, a difficulty that leads many of them to score poorly on the principle of making the data accessible. We will also informally evaluate the efficacy of the guidelines.

   Tufte, E. 1982. The Visual Display of Quantitative Information. Cheshire, Connecticut: Graphics Press.

   Schwabish, J. 2021. Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks. New York: Columbia University Press.

   This is a practice run of a talk I hope to give at the UK Conference on Teaching Statistics (UKCOTS) in June.

Building for the LMM Era: Constructing Scalable, Rich, and License-Compliant Multimodal Datasets - Antreas Antoniou

Abstract: 

As we progress through the era of Large Language Models (LLMs) and begin to enter the era of Large Multimodal Models (LMMs), the need for diverse and rich datasets becomes increasingly apparent. Unimodal and bimodal datasets have spurred significant advances in deep learning. Even so, the rising demand for comprehensive multimodal understanding calls for an expansion in the dimensions of our datasets. In this talk, we present TALI, a novel quadra-modal dataset that marks a key shift in multimodal research. TALI aligns text, video, images, and audio, and serves as a fertile platform for pioneering self-supervised learning tasks. It opens new avenues for researchers to explore questions such as how different modalities, and the scaling of data and models, influence downstream performance. As we discuss the construction of scalable, rich, and license-compliant multimodal datasets, we envision TALI inspiring diverse research ideas and contributing to a deeper understanding of model capabilities and robustness in deep learning.

Event type: Workshop

Date: Tuesday, 6th February

Time: 11:00

Location: G.03

Speaker(s): David Sterratt, Antreas Antoniou

Chair/Host: Thomas Lee