Invited Speakers
2023 Winter Series
Nathan Grinsztajn & Tristan Kalloniatis (InstaDeep)
Winner takes it all: a new perspective on multi-trials RL for combinatorial optimization.
Applications and Scaling of RL for Industry.
Winner takes it all: a new perspective on multi-trials RL for combinatorial optimization.
Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement multi-trials strategies, from stochastic sampling and beam-search to explicit fine-tuning.
We argue for the benefits of anticipating these multi-trials strategies at train time using discrete or continuous populations of diverse and complementary policies, based on two recent papers:
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization (Grinsztajn et al., NeurIPS 2023)
Combinatorial Optimization with Policy Adaptation using Latent Space Search (Chalumeau et al., NeurIPS 2023)
Instead of relying on a predefined or hand-crafted notion of diversity, these two approaches use specific training schemes that induce an unsupervised specialization aimed solely at maximizing the performance of the population, leading to state-of-the-art RL results on four popular NP-hard problems: traveling salesman, capacitated vehicle routing, 0-1 knapsack, and job-shop scheduling. More generally, these frameworks can be applied to any reinforcement learning problem that can be attempted multiple times.
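To make the population idea concrete, here is a minimal, illustrative sketch (my own toy example, not the authors' implementation): a small population of linear-softmax policies is trained on random toy instances, and for each instance only the member that produced the best solution receives a REINFORCE update, so specialisation emerges purely from the shared objective.

```python
# Toy "winner-takes-all" population training (illustrative only; not the
# authors' implementation). Only the best member per instance is updated.
import numpy as np

rng = np.random.default_rng(0)
K, n_solutions, n_features = 4, 16, 8                    # population size, action space, instance size
W = rng.normal(scale=0.1, size=(K, n_features, n_solutions))   # one linear policy per member
obj_matrix = rng.normal(size=(n_features, n_solutions))  # defines a toy per-instance objective

def rollout(w, instance):
    """Sample a discrete 'solution' from a linear-softmax policy conditioned on the instance."""
    logits = instance @ w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    action = rng.choice(n_solutions, p=probs)
    return action, probs

for step in range(2000):
    instance = rng.normal(size=n_features)
    scores = instance @ obj_matrix                       # stands in for e.g. negative tour length
    results = [rollout(W[k], instance) for k in range(K)]
    returns = np.array([scores[a] for a, _ in results])
    best = int(np.argmax(returns))                       # winner takes it all: only the best member learns
    a, probs = results[best]
    grad_logits = -probs
    grad_logits[a] += 1.0                                # gradient of log pi(a) w.r.t. the logits
    W[best] += 0.05 * returns[best] * np.outer(instance, grad_logits)
```

At inference, all members would be rolled out on a given instance and the best solution kept, mirroring the multi-trials strategies described above.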
About the Speaker - Nathan Grinsztajn
Nathan holds a PhD in Reinforcement Learning for Combinatorial Optimization from the University of Lille and Inria in France. He joined InstaDeep a year ago as a research scientist in RL, working primarily on combinatorial optimization and sequence-modelling problems.
Applications and Scaling of RL for Industry.
This talk will outline some of the ways in which InstaDeep tries to apply AI/RL to solve industry-scale problems, particularly those with a Combinatorial Optimisation flavour. I will also give some examples of the collaboration between research and engineering to scale up compute and run larger experiments while making efficient use of hardware.
About the Speaker - Tristan Kalloniatis
Following a PhD in Algebraic Number Theory, I spent 4 years working as a quant before joining InstaDeep as a Research Engineer at the end of 2020. Since then, I have been working primarily on applying Reinforcement Learning to solve problems in Electronic Circuit Design. In my spare time I enjoy gym, music, and expanding my already excessive Rubik's Cube collection.
1st November 2023
Ilija Bogunovic (UCL)
From Data to Confident Decisions: Robust and Efficient Algorithmic Decision Making
Whether in biological design, causal discovery, material production, or physical sciences, one often faces decisions regarding which new data to collect or experiments to perform. There is thus a pressing need for adaptive algorithms that make confident decisions about data collection processes and enable efficient and robust learning. In this talk, I will delve into the fundamental questions related to these requirements. How can we quantify uncertainty and efficiently learn to discover robust solutions? How can we design learning-based decision-making methods that are resistant to outliers, data shifts, and attacks? How can we utilize the inherent problem structure to achieve efficient learning? In light of the previous questions, I will examine the core statistical and robustness aspects through the perspective of Bayesian optimization and reinforcement learning. I will highlight the shortcomings of standard methods and present novel algorithms that come with strong theoretical guarantees. I will also showcase their robust performance in various applications by utilizing real-world data and popular benchmarks and finally map the main avenues for future research.
About the Speaker
Ilija Bogunovic is a Lecturer (assistant professor) in the Electrical Engineering Department at University College London. Before that, he was a postdoctoral researcher in the Machine Learning Institute and the Learning and Adaptive Systems group at ETH Zurich. He received a Ph.D. in Computer and Communication Sciences from EPFL and an MSc in Computer Science from ETH Zurich. His work has been recognized through a Google Research Scholar Program Award and an EPSRC New Investigator Award.
His research interests are centered around data-efficient interactive machine learning, reinforcement learning, and reliability and robustness considerations in data-driven algorithms, and are motivated by a range of emerging real-world applications. He co-founded the recurring ReALML workshop at ICML on "Adaptive Experimental Design and Active Learning in the Real World".
11th October 2023
Razvan Pascanu (Google DeepMind)
Resurrecting Recurrent Neural Networks for Long Sequences
In this talk I will focus on State Space Models (SSMs), a recently introduced family of sequence models, and specifically discuss the relationship between SSMs and recurrent neural networks. I will start with a short history of architecture design for language modelling, which I will use as a motivating task. This will allow me to provide some insights into the evolution of RNN architectures, and why some choices behind the SSM architecture seemed counter-intuitive. Most of the talk will focus on introducing the Linear Recurrent Unit architecture, explaining the role of the various modifications relative to traditional non-linear recurrent models.
The talk will conclude with some open questions about the role recurrent architectures could or should play, and the less well-understood relationship between these SSM models and transformer-like architectures.
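As a rough illustration of the kind of recurrence involved (a simplified sketch, not the exact Linear Recurrent Unit parameterisation), the core of such a layer is a diagonal, complex-valued linear state update with eigenvalues constrained inside the unit disk:

```python
# Simplified diagonal linear recurrence in the spirit of SSM-style layers
# (illustrative; not the exact LRU parameterisation).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_state, T = 4, 8, 32

nu = rng.uniform(0.0, 1.0, d_state)       # controls eigenvalue magnitude
theta = rng.uniform(0.0, np.pi, d_state)  # controls eigenvalue phase
lam = np.exp(-nu + 1j * theta)            # |lam| < 1, so the recurrence is stable
B = rng.normal(size=(d_state, d_in)) + 1j * rng.normal(size=(d_state, d_in))
C = rng.normal(size=(1, d_state)) + 1j * rng.normal(size=(1, d_state))

x = rng.normal(size=(T, d_in))            # an input sequence
h = np.zeros(d_state, dtype=complex)
outputs = []
for t in range(T):
    h = lam * h + B @ x[t]                # element-wise (diagonal) state update
    outputs.append((C @ h).real.item())   # real-valued readout y_t = Re(C h_t)
print(outputs[:5])
```

Because the state matrix is diagonal and linear, the same recurrence can also be computed in parallel over the sequence, which is one of the practical attractions of these models.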
About the Speaker
Razvan Pascanu has been a research scientist at Google DeepMind since 2014. Before this, he did his PhD in Montréal with Prof. Yoshua Bengio, working on understanding deep networks, recurrent models and optimization. Since joining DeepMind he has also made significant contributions to deep reinforcement learning, continual learning, meta-learning and graph neural networks, while continuing his research agenda on understanding deep learning, recurrent models and optimization. Please see his scholar page for specific contributions. He also actively promotes AI research and education as a main organizer of the Conference on Lifelong Learning Agents (CoLLAs, lifelong-ml.cc), the Eastern European Machine Learning Summer School (EEML, www.eeml.eu and www.workshops.eeml.eu), and various workshops at NeurIPS, ICML and ICLR.
2023 Spring Series
4th July 2023
Daniel Mankowitz & Andrea Michi (DeepMind)
AlphaDev: Faster sorting algorithms discovered using deep reinforcement learning
Fundamental algorithms such as sorting or hashing are used trillions of times on any given day. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been achieved in the past, making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks. These algorithms have been integrated into the LLVM standard C++ sort library. This change to this part of the sort library represents the replacement of a component with an algorithm that has been automatically discovered using reinforcement learning. We also present results in extra domains, showcasing the generality of the approach.
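As a much-simplified illustration of how "find a sorting routine" can be framed as a game (AlphaDev actually assembles low-level instructions; the sketch below uses compare-and-swap steps and helper names I made up), a candidate routine is a sequence of moves whose correctness can be verified exhaustively and used as a reward signal:

```python
# Illustrative stand-in for the sorting game, not AlphaDev's assembly-level setup.
# A candidate fixed-size sorter is a sequence of compare-and-swap steps; verifying
# it on all inputs is the kind of correctness signal such a game can reward.
from itertools import permutations

def apply_network(network, values):
    v = list(values)
    for i, j in network:          # each step: compare positions i and j, swap if out of order
        if v[i] > v[j]:
            v[i], v[j] = v[j], v[i]
    return v

def is_correct_sorter(network, n):
    return all(apply_network(network, p) == sorted(p) for p in permutations(range(n)))

# A known 3-element sorting network using 3 compare-and-swap steps.
sort3 = [(0, 1), (1, 2), (0, 1)]
print(is_correct_sorter(sort3, 3))      # True
print(is_correct_sorter(sort3[:2], 3))  # False: too few steps
```

In this framing, shorter (or faster) correct programs receive higher reward, which is the incentive that drives the search towards more efficient routines.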
About the Speakers
Daniel Mankowitz is a Staff Research Scientist at Google DeepMind, working on solving the key challenges in Reinforcement Learning algorithms that unlock real-world applications at scale. This includes a focus on Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Mankowitz has worked on code optimization, code generation, video compression, recommender systems, and controlling physical systems such as Heating, Ventilation and Air-Conditioning (HVAC), with publications in Nature and Science.
Andrea Michi is a Senior Research Engineer at Google DeepMind working on Reinforcement Learning applications. Michi has worked on a range of domains including renewable forecasting, code optimization, control of physical systems such as Heating, Ventilation and Air-Conditioning (HVAC), and magnetic confinement in a tokamak. More recently, Michi has focused on Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs) with human preferences.
7th June 2023
Andrew Lampinen (DeepMind)
Passive learning of active causal strategies in agents and language models
What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. In this talk, I will show empirically that agents trained via passive imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. This is possible even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, I'll show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models. (https://arxiv.org/abs/2305.16183)
About the Speaker
Andrew Lampinen is a Senior Research Scientist at Google DeepMind. Previously, he completed his PhD at Stanford University, and his BA at UC Berkeley. His work focuses on using methods from cognitive science to analyze AI, and using insights from cognitive science to improve AI, and covers areas ranging from RL agents to language models. He is particularly interested in cognitive flexibility and generalization, and how these abilities are enabled by factors like language, memory, and embodiment.
24th May 2023
Florian Fuchs (Sony AI)
Outracing champion Gran Turismo drivers with Deep Reinforcement Learning
Gran Turismo Sophy is a revolutionary superhuman racing agent designed to compete against top Gran Turismo® Sport drivers and elevate their gaming experience.
GT Sophy was trained using novel deep reinforcement learning techniques, including state-of-the-art learning algorithms and training scenarios developed by Sony AI, using Gran Turismo Sport, a real driving simulator, and by leveraging Sony Interactive Entertainment's cloud gaming infrastructure for massive scale training.
About the Speaker
Florian Fuchs is an AI engineer at Sony AI Zurich. His work focuses on applying Reinforcement Learning to interactive and dynamic games in order to enhance the gaming experience and support game developers in unleashing their creativity. Florian holds an MSc in computer science with a focus on machine learning from the University of Zurich. After his master's thesis, in which he first achieved superhuman time-trial results in the highly realistic racing simulator "Gran Turismo SPORT" using end-to-end Deep Reinforcement Learning, he joined the Sony AI team that developed the first racing agents competitive with the world's best e-sports drivers.
19th April 2023
Lucy Cheke (University of Cambridge)
A Comparative Cognition Approach to AI Evaluation
Recording - Release waiting for final approval
Understanding and predicting behaviour has been the business of psychologists for over a century. Within human psychology we can rely to some extent on introspection to understand the underlying drivers of behaviour, but this is less straightforward with animals. The problem of peering inside the "black box" of nonhuman animals shares much with the challenge of understanding the capabilities of AI systems, which exhibit extraordinarily clever-seeming behaviour but are prone to inflexibility and shortcuts. This talk will review the comparative cognition approach to AI evaluation and the benefits of robust cognitive testing of AI, both for understanding AI itself and for exploring biological intelligence.
About the Speaker
Dr Lucy Cheke is a Lecturer at the University of Cambridge and Principal Investigator of the Cognition and Motivated Behaviour Lab there.
Jessica Hamrick (DeepMind)
Understanding and Improving Model-Based Deep Reinforcement Learning
Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this talk, I will discuss a line of research from the past few years that has aimed to better understand, and subsequently improve, model-based learning and generalization. First, planning is incredibly useful during an agent's training and supports improved data collection and a more powerful learning signal; however, it is only useful for decisions made in the moment under certain circumstances, counter to our (and many others') intuitions! Second, we can substantially improve procedural generalization of model-based agents by incorporating self-supervised learning into the agent's architecture. Finally, we can also improve transfer to novel tasks by leveraging an initial unsupervised exploration phase, which allows for learning transferable knowledge both in the policy and the world model.
About the Speaker
Dr. Jessica Hamrick is a Staff Research Scientist at DeepMind, where she studies how to build machines that can flexibly build and deploy models of the world. Her work combines insights from cognitive science with structured relational architectures, model-based deep reinforcement learning, and planning. In addition to her work in AI, Dr. Hamrick has contributed to various open-source scientific computing projects including Jupyter and psiTurk. Dr. Hamrick received her PhD in Psychology in 2017 from the University of California, Berkeley, and her BS and MEng in Computer Science in 2012 from the Massachusetts Institute of Technology.
Alessandro Lazaric (Meta AI)
Understanding Incremental Unsupervised Exploration for Goal-based RL
Recording - Release waiting for final approval
One of the key features of intelligent beings is the capacity to explore and discover an unknown environment and to progressively learn how to control it. This process is not driven by an explicit reward and may unfold in a completely unsupervised way. In this talk I will propose a formalization of unsupervised discovery and exploration as the process of incrementally learning policies that reach goals of increasing difficulty. The resulting goal-based policy then allows the agent to solve any goal-reaching task downstream with no additional learning or planning. I will illustrate algorithmic principles, theoretical guarantees, and preliminary empirical results that could lay the foundations for designing agents that can efficiently learn in open-ended environments.
References:
On unsupervised exploration:
Adaptive Multi-Goal Exploration; AISTATS 2022
Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching; ICLR 2022
A Provably Efficient Sample Collection Strategy for Reinforcement Learning; NeurIPS 2021
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs; NeurIPS 2020
On exploration for goal-based RL:
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret; NeurIPS 2021
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model; ALT 2021
About the Speaker
Alessandro is a research scientist at Meta AI (FAIR), where he has been since 2017. Prior to Meta, he completed a PhD at Politecnico di Milano and worked as a researcher at Inria Lille. His main research topic is reinforcement learning, with extensive contributions to both the theoretical and algorithmic aspects of RL. Over the last ten years he has studied the exploration-exploitation dilemma in both the multi-armed bandit and reinforcement learning frameworks, notably on the problems of regret minimization, best-arm identification, pure exploration, and hierarchical RL.
2022 Winter Series
Elise van der Pol (Microsoft Research)
Symmetries in Single and Multi-agent Learning and AI4Science
In this talk, I will discuss our work on symmetry and structure in single- and multi-agent reinforcement learning. I will first discuss MDP Homomorphic Networks (NeurIPS 2020), a class of networks that ties transformations of observations to transformations of decisions. Such symmetries are ubiquitous in deep reinforcement learning, but were often ignored in earlier approaches. Encoding this prior knowledge in policy and value networks allows us to reduce the size of the solution space, a necessity in problems with large numbers of possible observations. I will showcase the benefits of our approach on agents in virtual environments. Building on the foundations of MDP Homomorphic Networks, I will also discuss our recent multi-agent works, Multi-Agent MDP Homomorphic Networks (ICLR 2022) and Equivariant Networks for Zero-Shot Coordination (NeurIPS 2022), which consider symmetries in multi-agent systems. This forms a basis for my vision for reinforcement learning in complex virtual environments, as well as for problems with intractable search spaces. Finally, I will briefly discuss AI4Science.
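As a toy illustration of the kind of constraint such networks enforce (this is not the construction from the MDP Homomorphic Networks paper), consider a two-action policy built by weight sharing so that reflecting the observation exactly swaps the action probabilities:

```python
# Toy equivariant policy (illustrative; not the paper's construction): scoring the
# observation and its reflection with the same shared function makes the policy
# swap its "left"/"right" probabilities whenever the observation is reflected.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))   # shared feature weights
v = rng.normal(size=16)        # shared readout

def policy(obs):
    score = lambda o: np.tanh(o @ W) @ v
    logits = np.array([score(obs), score(-obs)])   # [left, right]
    p = np.exp(logits - logits.max())
    return p / p.sum()

obs = rng.normal(size=8)
print(policy(obs))    # [p_left, p_right]
print(policy(-obs))   # probabilities swapped: [p_right, p_left]
```

Because the symmetry is built into the architecture rather than learned from data, the solution space the agent has to search is correspondingly smaller.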
About the Speaker
Elise van der Pol is a Senior Researcher at Microsoft Research AI4Science Amsterdam, working on reinforcement learning and deep learning for molecular simulation. Additionally, she works on symmetry, structure, and equivariance in single and multi-agent reinforcement learning and machine learning.
Before joining MSR, she did a PhD in the Amsterdam Machine Learning Lab, working with Max Welling (UvA), Frans Oliehoek (TU Delft), and Herke van Hoof (UvA). During her PhD, she spent time in DeepMind’s multi-agent team. Elise was an invited speaker at the BeneRL 2022 workshop and the Self-Supervision for Reinforcement Learning workshop at ICLR 2021. She was also a co-organizer of the workshop on Ecological/Data-Centric Reinforcement Learning at NeurIPS 2021.
12th October 2022
Alhussein Fawzi (DeepMind)
Discovering faster matrix multiplication algorithms with reinforcement learning
Improving the efficiency of algorithms for fundamental computational tasks such as matrix multiplication can have widespread impact, as it affects the overall speed of a large amount of computations. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. In this talk I'll present AlphaTensor, our reinforcement learning agent based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor's algorithm improves on Strassen's two-level algorithm for the first time since its discovery 50 years ago. I'll present our problem formulation as a single-player game, the key ingredients that enable tackling such difficult mathematical problems using reinforcement learning, and the flexibility of the AlphaTensor framework.
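For context on the Strassen baseline mentioned above, the classic 2x2 scheme multiplies two matrices with 7 scalar multiplications instead of 8; a quick sanity check of the textbook construction (not one of AlphaTensor's discovered algorithms) looks like this:

```python
# Strassen's classic 2x2 scheme: 7 multiplications instead of 8. AlphaTensor
# searches for decompositions of this kind; this is the textbook algorithm,
# not one of its discoveries.
import numpy as np

def strassen_2x2(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
print(np.allclose(strassen_2x2(A, B), A @ B))   # True
```

Applying such a scheme recursively to blocks is what turns a saving of one multiplication at the base level into an asymptotic improvement for large matrices.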
About the Speaker
Alhussein Fawzi is a Research Scientist in the Science team at DeepMind, where he leads the algorithmic discovery efforts. He is broadly interested in using machine learning to unlock new scientific discoveries. He obtained his PhD in machine learning and computer vision from EPFL in 2016.
12th October 2022
Tim Rocktäschel (DeepMind)
The NetHack Learning Environment and Its Challenges for Open-Ended Learning
Progress in Reinforcement Learning (RL) methods goes hand-in-hand with the development of challenging environments that test the limits of current approaches. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both these things. Moreover, research in RL has predominantly focused on environments that can be approached by tabula rasa learning, i.e., without agents requiring transfer of any domain or world knowledge outside of the simulated environment. I will talk about the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for research based on the popular single-player terminal-based rogue-like game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as auto-curriculum learning, exploration, planning, skill acquisition, goal-driven learning, novelty search, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience.
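For readers who want to experiment with NLE, it is distributed as a Gym environment; a minimal usage sketch (assuming the `nle` package and the older Gym reset/step API, which may differ across versions) is:

```python
# Minimal NLE usage sketch (assumes the `nle` package; environment names and
# the Gym API may differ between versions).
import gym
import nle  # noqa: F401 -- importing registers the NetHack environments

env = gym.make("NetHackScore-v0")
obs = env.reset()
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())  # random agent
    if done:
        obs = env.reset()
env.close()
```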
About the Speaker
Tim is the Open-Endedness Team Lead at DeepMind, an Associate Professor at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to that, he was a Manager and Research Scientist at Facebook AI Research (FAIR) London, a Postdoctoral Researcher in Reinforcement Learning at the University of Oxford, a Junior Research Fellow in Computer Science at Jesus College, and a Stipendiary Lecturer in Computer Science at Hertford College. Tim obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, and he was awarded a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017.
12th October 2022
Felix Hill (DeepMind)
How knowing language can make general AI systems smarter
Having and using language makes humans as a species better learners and better able to solve hard problems. I'll present three studies that demonstrate how this is also the case for artificial models of general intelligence. In the first, I show that agents with access to visual and linguistic semantic knowledge explore their environment more effectively than non-linguistic agents, enabling them to learn more about the world around them. In the second, I demonstrate how an agent embodied in a simulated 3D world can be enhanced by learning from explanations -- answers to the question "why?" expressed in language. Agents that learn from explanations solve harder cognitive challenges than those trained from reinforcement learning alone, and can also better learn to make interventions in order to uncover the causal structure of their world. Finally, I'll present evidence that the skewed and bursty distribution of natural language may explain how large language models can be prompted to rapidly acquire new skills or behaviours. Together with other recent literature, this suggests that modelling language may make a neural network better able to acquire new cognitive capacities quickly, even when those capacities are not necessarily explicitly linguistic.
About the Speaker
Felix is a Research Scientist at DeepMind where he leads a team focusing on the relationship between natural language and general intelligence. His work combines insights from Cognitive Science, Neuroscience and Linguistics in working towards scientifically and practically useful models of human cognition and behaviour.
14th September 2022
Sam Devlin (Microsoft Research)
Towards Ad-Hoc Teamwork for Improved Player Experience
Collaborative multi-agent reinforcement learning research often makes two key assumptions: (1) we have control of all agents on the team; and (2) maximising team reward is all you need. However, to enable human-AI collaboration, we need to break both of these assumptions. In this talk I will formalise the problem of ad-hoc teamwork and present our proposed approach to meta-learn policies that are robust to a given set of possible future collaborators. I will then discuss recent work on modelling human play, showing that reward maximisation may not be sufficient when trying to entertain billions of players worldwide.
About the Speaker
Sam is a Principal Researcher in the Deep Reinforcement Learning for Games group at Microsoft Research Cambridge. He received his PhD on multi-agent reinforcement learning in 2013 from the University of York; was a postdoc from 2013 to 2015, working on game analytics; and then was on the faculty from 2016 until joining Microsoft in 2018. He has published more than 60 papers on reinforcement learning and game AI in leading academic venues and presents regularly at games industry events including Develop and the Game Developers Conference (GDC).
16th June 2022
David Abel (DeepMind)
On the Expressivity of Markov Reward
Reward is the driving force for reinforcement-learning agents. In this talk, I will present our recent NeurIPS paper that explores the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be of interest: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while Markov reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. I conclude by summarizing recent follow up work that studies alternatives for enriching the expressivity of reward.
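To give a flavour of the inexpressibility results (the following is my paraphrase of a standard illustration, not necessarily the paper's exact construction), consider the task "always take the same action", defined as the acceptable policy set

```latex
\Pi_G = \{\pi_a, \pi_b\}, \qquad \pi_a(s) = a, \quad \pi_b(s) = b \quad \text{for every state } s.
```

In a finite MDP with at least two states, if both policies are optimal for some Markov reward R(s, a), then any policy that picks an optimal action in each state is also optimal, so the "mixed" policies (action a in one state, b in another) would be acceptable too; hence no Markov reward function can make exactly this set of policies optimal.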
About the Speaker
David Abel is a Research Scientist at DeepMind in London. Before that, he completed his Ph.D. in computer science and Master's in philosophy at Brown University, advised by Michael Littman (CS) and Josh Schechter (philosophy).
26th May 2022
Roberta Raileanu (Meta AI)
From Solving Narrow Tasks to Learning General Skills
In the past few years, deep reinforcement learning (RL) has led to impressive achievements in games and robotics. However, current state-of-the-art RL methods struggle to generalize to scenarios they haven’t experienced during training. In this talk, I will show how we can leverage diverse data and a minimal set of inductive biases to generalize to new task instances. First, I will discuss how we can use data augmentation to learn policies which are invariant to task-irrelevant changes in the observations. Then, I will show how we can generalize to new task instances with unseen states and layouts by decoupling the representation of the policy and value function. And finally, I will briefly describe how we can quickly adapt to new dynamics by learning a value function for a family of behaviors and environments.
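A rough sketch of the data-augmentation idea (illustrative only; the functions and shapes below are made up, and this is not the exact method from the talk): shift image observations randomly and penalise the policy for changing its output, so it becomes invariant to that nuisance transformation.

```python
# Illustrative sketch: random-shift augmentation plus a consistency penalty that
# encourages the policy to be invariant to the shift (added to the usual RL loss).
import numpy as np

rng = np.random.default_rng(0)

def random_shift(obs, max_shift=4):
    """Pad an (H, W, C) observation and crop it at a random offset."""
    h, w, c = obs.shape
    padded = np.pad(obs, ((max_shift, max_shift), (max_shift, max_shift), (0, 0)), mode="edge")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    return padded[dy:dy + h, dx:dx + w]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def policy(obs, W):
    return softmax(obs.reshape(-1) @ W)   # stand-in for a convolutional policy network

W = rng.normal(scale=0.01, size=(64 * 64 * 3, 5))
obs = rng.random((64, 64, 3))

pi = policy(obs, W)
pi_aug = policy(random_shift(obs), W)
invariance_loss = np.sum(pi * (np.log(pi) - np.log(pi_aug)))   # KL(pi || pi_aug)
print(invariance_loss)
```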
About the Speaker
Roberta is a research scientist at Meta AI / FAIR. Previously, she did her PhD in computer science at NYU, advised by Rob Fergus. Her research focuses on designing machine learning algorithms that can make robust sequential decisions in complex environments. In particular, Roberta works in the area of deep reinforcement learning, with a focus on generalization, adaptation, continual, and open-ended learning. During her PhD, she spent time as an intern at DeepMind, Microsoft Research, and Facebook AI Research. Roberta also holds a B.A. in Astrophysics from Princeton University, where she worked on theoretical cosmology and supernovae simulations.
21st April 2022
Haitham Bou Ammar (Huawei R&D UK)
High-Dimensional Black-Box Optimisation in Small Data Regimes
Many problems in science and engineering can be viewed as instances of black-box optimisation over high-dimensional (structured) input spaces. Applications are ubiquitous, including arithmetic expression formation from formal grammars and property-guided molecule generation, to name a few. Machine learning (ML) has shown promising results in many such problems, (sometimes) leading to state-of-the-art results. Despite those successes, modern ML techniques are data-hungry, requiring hundreds of thousands if not millions of labelled data points. Unfortunately, many real-world applications do not enjoy such a luxury: it is challenging to acquire millions of wet-lab experiments when designing new molecules.
This talk will elaborate on novel techniques we developed for high-dimensional Bayesian optimisation (BO), capable of efficiently resolving such data bottlenecks. Our methods combine ideas from deep metric learning with BO to enable sample-efficient low-dimensional surrogate optimisation. We provide theoretical guarantees demonstrating vanishing regrets with respect to the true high-dimensional optimisation problem. Furthermore, in a set of experiments, we confirm the effectiveness of our techniques in reducing sample sizes, acquiring state-of-the-art logP molecule values while utilising only 1% of the labels used by previous state-of-the-art methods.
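The generic structure of such low-dimensional surrogate optimisation can be sketched as follows (an illustrative toy, not the speaker's method: a fixed random projection stands in for the learned metric-learning encoder, and a plain GP with a UCB acquisition stands in for the actual surrogate):

```python
# Generic low-dimensional surrogate BO loop (illustrative sketch only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
D, d = 200, 5                                   # ambient and latent dimensions
P = rng.normal(size=(D, d)) / np.sqrt(D)        # stand-in "encoder" (fixed random projection)
objective = lambda x: -np.sum((x - 0.5) ** 2)   # toy black-box objective

X = rng.random((8, D))                          # small initial design
y = np.array([objective(x) for x in X])

for step in range(20):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X @ P, y)   # GP in the latent space
    candidates = rng.random((512, D))
    mu, sigma = gp.predict(candidates @ P, return_std=True)
    x_next = candidates[np.argmax(mu + 2.0 * sigma)]                # UCB acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

print("best value found:", y.max())
```

The point of the latent space is that the GP only ever has to model a d-dimensional function, which is what keeps the method usable when labelled evaluations number in the hundreds rather than the millions.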
About the Speaker
Haitham leads the reinforcement learning team at Huawei Technologies Research & Development UK and is an Honorary Lecturer at UCL. Prior to Huawei, Haitham led the reinforcement learning and tuneable AI team at PROWLER.io, where he contributed to their technology in finance and logistics. Prior to joining PROWLER.io, Haitham was an Assistant Professor in the Computer Science Department at the American University of Beirut (AUB). Before joining the AUB, Haitham was a postdoctoral research associate in the Department of Operational Research and Financial Engineering (ORFE) at Princeton University. Prior to Princeton, he conducted research in lifelong machine learning as a postdoctoral researcher at the University of Pennsylvania. As a former member of the General Robotics, Automation, Sensing and Perception (GRASP) lab, he also contributed to the application of machine learning to robotics. His primary research interests lie in the field of statistical machine learning and artificial intelligence, focusing on Bayesian optimisation, probabilistic modelling and reinforcement learning. He is also interested in learning using massive amounts of data over extended time horizons, a property common to "Big-Data" problems. His research also spans different areas of control theory and nonlinear dynamical systems, as well as social networks and distributed optimisation.
17th March 2022
Angeliki Lazaridou (DeepMind)
Towards Multi-agent Emergent Communication as a Building Block of Human-centric AI
The ability to cooperate through language is a defining feature of humans. As the perceptual, motor and planning capabilities of deep artificial networks increase, researchers are studying whether they can also develop a shared language to interact. In this talk, I will highlight recent advances in this field but also common headaches (or perhaps limitations) with respect to the experimental setup and evaluation of emergent communication. Towards making multi-agent communication a building block of human-centric AI, and drawing from my own recent work, I will discuss approaches to making emergent communication relevant for human-agent communication in natural language.
About the Speaker
Angeliki Lazaridou is a Staff Research Scientist at DeepMind. She obtained her PhD from the University of Trento, where she worked on predictive grounded language learning. At DeepMind, she has worked on interactive methods for language learning that rely on multi-agent communication as a means of alleviating the use of supervised language data. More recently, she has focused on understanding (and improving) the temporal generalization of language models.
17th February 2022
Jakob Foerster (University of Oxford)
Zero-Shot Coordination and Off-Belief Learning
There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, yet this is a crucial prerequisite from a human-AI coordination and AI-safety point of view.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep OBL agents follow a policy pi_1 that is optimized assuming past actions were taken by a given, fixed policy, pi_0, but assuming that future actions will be taken by pi_1. When pi_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.
OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).
OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and Hanabi, the benchmark human-AI and zero-shot coordination problem.
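Very roughly, and omitting the details of the fictitious transition mechanism, the OBL objective can be schematised as follows (my paraphrase; see the paper for the precise definition):

```latex
V_{\pi_0 \to \pi_1}(\tau) \;=\;
\mathbb{E}_{\,s \sim B_{\pi_0}(\cdot \mid \tau),\; a \sim \pi_1(\cdot \mid \tau)}
\big[\, r(s, a) + V_{\pi_0 \to \pi_1}(\tau') \,\big],
\qquad
\pi_1 \in \arg\max_{\pi}\, V_{\pi_0 \to \pi},
```

where B_{pi_0}(. | tau) is the belief over the hidden state given the agent's action-observation history tau, computed as if all past actions had been taken by pi_0, and tau' is the successor history obtained by taking a from s under the true dynamics. Because past behaviour is always attributed to the fixed pi_0, the optimised policy cannot rely on hidden conventions encoded in its own past actions.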
About the speaker
Jakob Foerster started as an Associate Professor in the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI Chair in 2019.
His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently, he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.
His work has been cited over 5000 times, with an h-index of 29 (Google Scholar page).