IPAB Seminar-26/05/2023
Title: Fully Decentralized RL in Complex Multi-Agent Settings
Abstract: Single-agent RL has enjoyed marquee achievements in recent years. However, acting optimally in multi-agent settings is fundamentally more challenging. The multi-agent setting shifts the focus from interacting with stationary environments to interacting with non-stationary agents. There will be allies and adversaries sharing the environment. Cooperating with allies and competing with adversaries (co-opetition) is now key to optimality. Furthermore, other agents may leave or join the system, which exacerbates the non-stationarity. Therefore, modeling others becomes critical in a multi-agent setting. In this context, we present a novel decentralized RL method called the interactive advantage actor-critic (IA2C) that utilizes a belief filter to track other agents' actions. We investigate scaling IA2C to the many-agent setting through permutation invariance of joint actions, which is a property of several many-agent domains. A quadratic-time approach of using Dirichlet-multinomial distributions to model agent populations under partial observability allows IA2C++ to make accurate and efficient predictions for agent populations. We explore replacing the belief filter of IA2C with an encoder-decoder network to model the underlying hidden state and the predicted actions of the agent population, and demonstrate improved performance. We also introduce Org as our primary evaluation domain, which models co-opetition in a typical organization.
The presentation is based on work published in the AAMAS and UAI conference proceedings jointly with doctoral candidate Keyang He at UGA and Prof. Bikramjit Banerjee at USM.
Speaker Bio: Prashant Doshi is a professor of computer science at the University of Georgia (UGA) in Athens, GA. He received his Ph.D. in CS from the University of Illinois at Chicago in 2005, where his doctoral dissertation was supervised by Piotr Gmytrasiewicz. His research interests broadly fall in AI and robotics. In AI, he is an expert on autonomous decision making, with specific interests in decision making under uncertainty in multiagent settings. In collaboration with Piotr Gmytrasiewicz, Prof. Doshi co-developed the Interactive POMDP (I-POMDP) framework, which takes an individual agent’s perspective on decision making in multiagent settings and complements the predominant focus of previous multiagent research on team decision making. In robotics, Prof. Doshi investigates ways to make learning-by-observing pragmatic for robots and is an expert on inverse reinforcement learning. He established and directs the THINC Lab at UGA, which conducts sponsored research in multiagent systems and robotics. His publications and other research accomplishments can be accessed from the THINC Lab's website.
IF, G.03