IPAB Workshop - 30/04/2020

 

Henry Gouk
 
Title: Distance-Based Regularisation of Deep Networks for Fine-Tuning
 
Abstract:
In this talk I will discuss approaches to regularisation during fine-tuning of deep neural networks. First, I present a neural network generalisation bound based on Rademacher complexity that depends on the distance the weights have moved from their initial values. This bound has no direct dependence on the number of weights and compares favourably to other bounds when applied to convolutional networks. The bound is highly relevant for fine-tuning: providing a network with a good initialisation via transfer learning means that learning needs to modify the weights less, and hence yields tighter generalisation guarantees. Inspired by this, I develop a simple yet effective fine-tuning algorithm that constrains the hypothesis class to a small sphere centred on the initial pre-trained weights, thus obtaining provably better generalisation performance than conventional transfer learning. Empirical evaluation shows that the algorithm works well, corroborating the theoretical results. It outperforms both state-of-the-art fine-tuning competitors and penalty-based alternatives, which we show do not directly constrain the radius of the search space.
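The constraint described above can be sketched as projected gradient descent: after each update, the weights are projected back onto an L2 ball of fixed radius centred on the pre-trained weights. This is a minimal illustration, not the talk's actual implementation; the function names, flat-vector weight representation, and learning-rate/radius values are assumptions for the sake of the example.

```python
import numpy as np

def project_to_ball(w, w0, radius):
    """Project weights w onto the L2 ball of the given radius centred at w0.

    If w is already within the ball it is returned unchanged; otherwise
    the offset from w0 is rescaled to lie exactly on the sphere.
    """
    delta = w - w0
    norm = np.linalg.norm(delta)
    if norm <= radius:
        return w
    return w0 + delta * (radius / norm)

def finetune_step(w, w0, grad, lr=0.1, radius=1.0):
    """One fine-tuning step: gradient descent followed by projection.

    Hypothetical helper: w0 holds the frozen pre-trained weights that
    define the centre of the search space.
    """
    w = w - lr * grad                      # ordinary gradient step
    return project_to_ball(w, w0, radius)  # enforce distance constraint
```

In contrast to a penalty term added to the loss, this projection step enforces a hard bound on how far the weights can move from their initialisation, which is what makes the search-space radius directly controllable.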
 
Chenyang Zhao
 
Title: Robust Domain Randomisation by Policy Distillation
 
Abstract:
While deep reinforcement learning has demonstrated great success recently, learning agents that generalise under domain shift remains a significant challenge. One method to overcome this problem is domain randomisation, which randomises properties of the environment during training. However, learning can become unstable as the range of randomisation widens. In practice, this requires a large amount of manual tuning and a tight iteration loop between randomisation design and validation. In this work, instead of merging experience from randomised domains and training jointly, we propose to train multiple agents locally in randomised domains and distil a main agent that minimises the Kullback–Leibler divergence to the local agents over their predicted action distributions. We conduct experiments in simulated tasks and demonstrate that our method learns robustly and achieves better generalisation performance.
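The distillation objective described above can be sketched as an average KL divergence between the main agent's action distribution and those of the local agents. This is a hypothetical illustration for discrete action spaces; the function names, the direction of the KL term (local agent as teacher, main agent as student), and the uniform averaging over local agents are assumptions, not details confirmed by the abstract.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two discrete action distributions.

    A small eps avoids log(0) for zero-probability actions.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))

def distillation_loss(main_probs, local_probs_list):
    """Average KL from each local (teacher) policy to the main (student) policy.

    Minimising this over the main agent's parameters pulls its action
    distribution towards those of the locally trained agents.
    """
    return sum(kl(p_local, main_probs) for p_local in local_probs_list) / len(local_probs_list)
```

In a training loop, each local agent would be optimised in its own randomised domain, and this loss would be evaluated on shared states to update the main agent.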
Venue: Blackboard Collaborate