Invited Speakers

2024 Spring Series

10th June 2024

Sebastian Risi (IT University of Copenhagen)

Exploring Alternative Bio-Inspired Neural Building Blocks for Fast Reinforcement Learning

Despite all their recent advances, current AI methods are still brittle and fail when confronted with unexpected situations. Biological intelligent systems, on the other hand, can rapidly adapt and display an inherent level of resilience. While being inspired by the brain, the current artificial neural network paradigm has abstracted away many of the properties that could turn out to be essential in our goal to create a more general artificial intelligence. In this talk, I'll present some of our work in exploring alternative neural building blocks. For example, it is possible to allow completely random networks to adapt to morphological damage in a robot in only a few trials through meta-learned local plasticity rules. Likewise, evolving different activation functions in random neural networks alone enables them to master different reinforcement learning tasks, challenging our understanding of which ingredients to include in our neural networks. Finally, I will present our current work on the Neural Developmental Program approach, in which we learn to grow artificial neural networks through a developmental process that mirrors key properties of embryonic development in biological organisms. The talk concludes with future research opportunities and challenges that we need to address to best capitalize on the same ideas that allowed biological intelligence to thrive.

About the Speaker

Sebastian Risi is a Full Professor at the IT University of Copenhagen where he directs the Creative AI Lab, and co-directs the Robotics, Evolution and Art Lab (REAL).  Before joining ITU, he did a postdoc at Cornell University and before that, he obtained a Ph.D. from the University of Central Florida. As one of the pioneers in the emerging field of collective intelligence for deep learning, he investigates how we can make current AI approaches more robust and adaptive. He has won several international scientific awards, including multiple best paper awards, an ERC Consolidator Grant in 2022, the Distinguished Young Investigator in Artificial Life 2018 award, a Google Faculty Research Award in 2019, and an Amazon Research Award in 2020. His interdisciplinary work has been published in major machine learning, artificial life, and human-computer interaction conferences, including AAAI, NeurIPS, ICLR, Nature Machine Intelligence, ALIFE, GECCO, and CHI.

1st May 2024 

Jake Bruce (DeepMind)

Genie: Generative Interactive Environments

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

About the Speaker

Jake has a background in deep learning for robot navigation from Queensland University of Technology in Brisbane, Australia. He has been working on reinforcement learning and large-scale deep learning at Google DeepMind since 2018, and has been involved in projects on imitation learning in NetHack, generalist agents via Gato, and most recently generative world models in Genie.

8th April 2024 

Eugene Vinitsky (NYU / Apple)

Real-world reinforcement learning in multi-agent systems

We investigate how multi-agent learning can enable safe deployment and evaluation of autonomous systems operating in safety-critical, mixed human-robot settings. Using a case study of a 100-vehicle, real-world deployment of RL-based traffic-smoothing autonomous vehicles (AVs), we discuss the challenges of estimating when a controller will successfully bridge the sim-to-real gap. We then discuss our work on building human-like, capable simulated agents using regularized self-play techniques. Finally, we discuss some of the challenges of MARL at scale and the new simulators we are designing to address them.

About the Speaker

Eugene Vinitsky is an assistant professor in Transportation Engineering at NYU, a member of the C2SMARTER consortium on congestion reduction, and a part-time research scientist at Apple. He works primarily on multi-agent learning with a focus on its potential use in transportation systems and robotics. He received his PhD in controls engineering from UC Berkeley, where he was advised by Alexandre Bayen, and an MS and BS in physics from UC Santa Barbara and Caltech respectively. During his PhD he spent time at DeepMind, Tesla Autopilot, and FAIR.

13th March 2024

Marta Garnelo (Google DeepMind)

Exploring the Space of Key-Value-Query Models with Intention

Attention-based models have been a key element of many recent breakthroughs in deep learning. Two key components of Attention are the structure of its input (which consists of keys, values and queries) and the computations by which these three are combined. In this paper we explore the space of models that share said input structure but are not restricted to the computations of Attention. We refer to this space as Keys-Values-Queries (KVQ) Space. Our goal is to determine whether there are any other stackable models in KVQ Space that Attention cannot efficiently approximate, which we can implement with our current deep learning toolbox and that solve problems that are interesting to the community. Maybe surprisingly, the solution to the standard least squares problem satisfies these properties. A neural network module that is able to compute this solution not only enriches the set of computations that a neural network can represent but is also provably a strict generalisation of Linear Attention. Even more surprisingly, the computational complexity of this module is exactly the same as that of Attention, making it a suitable drop-in replacement. With this novel connection between classical machine learning (least squares) and modern deep learning (Attention) established, we justify a variation of our model which generalises regular Attention in the same way. Both new modules are put to the test on a wide spectrum of tasks ranging from few-shot learning to policy distillation that confirm their real-world applicability.
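One way to read the link between least squares and Linear Attention is sketched below on a toy example. This is an illustration under stated assumptions, not the speaker's implementation: it uses unnormalised linear attention, two-dimensional keys, no regularisation and no nonlinearity, and every function name is hypothetical. With orthonormal keys the least-squares readout coincides with linear attention; with correlated keys the two differ.

```python
# Toy comparison of unnormalised linear attention and a least-squares readout.
# All names are illustrative; keys are 2-dimensional so a closed-form inverse suffices.

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    # Closed-form inverse of a 2x2 matrix (assumes a non-zero determinant).
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linear_attention(q, k, v):
    # Q (K^T V): each query reads out a key-weighted sum of the values.
    return matmul(q, matmul(transpose(k), v))

def least_squares_readout(q, k, v):
    # Q (K^T K)^{-1} K^T V: solve min_W ||K W - V||^2, then apply W to the queries.
    kt = transpose(k)
    w = matmul(inverse_2x2(matmul(kt, k)), matmul(kt, v))
    return matmul(q, w)

k_orth = [[1.0, 0.0], [0.0, 1.0]]  # orthonormal keys: K^T K = I
k_corr = [[1.0, 0.0], [1.0, 1.0]]  # correlated keys: K^T K != I
v = [[2.0], [3.0]]
q = [[1.0, 1.0]]

print(linear_attention(q, k_orth, v), least_squares_readout(q, k_orth, v))
print(linear_attention(q, k_corr, v), least_squares_readout(q, k_corr, v))
```

In the correlated case the least-squares readout reproduces the stored values exactly at the keys, while linear attention double-counts the overlapping key direction.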

About the Speaker

Marta is a senior research scientist at DeepMind where she has primarily worked on deep generative models and meta learning. As part of this research she has been involved in developing Generative Query Networks and led the work on Neural Processes. In addition to generative models her recent interests have expanded to multi-agent systems and game theory. Prior to DeepMind Marta obtained her PhD from Imperial College London.

29th January 2024 

Leonard Bauersfeld (University of Zurich)

Champion-level drone racing using deep reinforcement learning

First-person view (FPV) drone racing is a televised sport in which professional competitors pilot high-speed quadcopters through a 3D circuit. The pilots see the environment from the perspective of their drone by means of an onboard camera video-stream. In this talk I will explain Swift, an autonomous system that can race quadcopters at the level of the human world champions. The system combines deep reinforcement learning (RL) in simulation with data collected in the physical world. Our Swift drone competed against three human champions, including the world champions of two international leagues, in real-world head-to-head races and won several races against each of the human champions. The autonomous drone demonstrated the fastest recorded race time.

About the Speaker

Leonard Bauersfeld did his Master's at ETH Zurich in "Robotics, Systems, and Control", where he graduated with distinction in 2021. Currently he is a PhD student in Robotics at the University of Zurich working in the "Robotics and Perception Group" under the lead of roboticist Davide Scaramuzza. He works on drone modeling, agile vision-based flight and novel machine learning approaches to push the frontiers of autonomous UAV navigation. He was part of the team that impressively beat the world champions of drone racing in a fair head-to-head race with an autonomous drone. Besides working on drones, he is a photographer and enjoys taking pictures of nature as well as far-away astronomical objects, such as nebulas and galaxies.

17th January 2024 

Katja Hofmann (Microsoft Research)

Generative models for video games 

Developing agents capable of modeling complex environments and human behaviors within them is a key goal of artificial intelligence research. Progress towards this goal has exciting potential for applications in video games, from new tools that empower game developers to realize new creative visions, to enabling new kinds of immersive player experiences. This talk focuses on recent advances of my team at Microsoft Research towards scalable machine learning architectures that effectively capture human gameplay data.

In the first part of my talk, I will focus on diffusion models as generative models of human behavior. These models have previously been shown to have impressive image generation capabilities; I present insights that unlock their application to imitation learning for sequential decision making [1]. In the second part of my talk, I discuss a recent project taking ideas from language modeling to build a generative sequence model of an Xbox game [2].

[1] Imitating human behaviour with diffusion models, Pearce et al., ICLR 2023.

[2] Tales of Scaling Up Generative AI for Video Games, MSR Cambridge Game Intelligence team, under submission.

About the Speaker

Katja Hofmann is a Senior Principal Researcher at Microsoft Research. She leads a team that focuses on modern machine learning and reinforcement learning for Games, with the mission to advance the state of the art in sequential decision making, driven by current and future applications in video games. She and her team share the belief that games will drive a transformation of how people interact with AI technology. Her long-term goal is to develop systems that learn to collaborate with people, to empower their users and help solve complex real-world problems.

2023 Winter Series

6th December 2023

Nathan Grinsztajn & Tristan Kalloniatis (InstaDeep)

Winner takes it all: a new perspective on multi-trials RL for combinatorial optimization. 

Applications and Scaling of RL for Industry.

Winner takes it all: a new perspective on multi-trials RL for combinatorial optimization.

Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement multi-trials strategies, from stochastic sampling and beam-search to explicit fine-tuning.

We argue for the benefits of anticipating these multi-trials strategies at train time using discrete or continuous populations of diverse and complementary policies, based on two recent papers:

Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization (Grinsztajn et al., NeurIPS 2023)

Combinatorial Optimization with Policy Adaptation using Latent Space Search (Chalumeau et al., NeurIPS 2023)

Instead of relying on a predefined or hand-crafted notion of diversity, these two approaches make use of specific training schemes that induce an unsupervised specialization targeted solely at maximizing the performance of the population, leading to state-of-the-art RL results on four popular NP-hard problems: traveling salesman, capacitated vehicle routing, 0-1 knapsack, and job-shop scheduling. More generally, these frameworks can be applied to any reinforcement learning problem that can be attempted multiple times.

About the Speaker - Nathan Grinsztajn

Nathan holds a PhD in Reinforcement Learning for Combinatorial Optimization from Univ. Lille and Inria in France. He joined InstaDeep 1 year ago as a research scientist in RL working primarily on CO and sequence modeling problems.

Applications and Scaling of RL for Industry.

This talk will outline some of the ways in which InstaDeep tries to apply AI/RL to solve industry scale problems, particularly those with a Combinatorial Optimisation flavour. I will also give some examples of the collaboration between research and engineering to scale up the compute to run larger experiments while making efficient use of hardware.

About the Speaker - Tristan Kalloniatis

Following a PhD in Algebraic Number Theory, I spent 4 years working as a quant before joining InstaDeep as a Research Engineer at the end of 2020. Since then, I have been working primarily on applying Reinforcement Learning to solve problems in Electronic Circuit Design. In my spare time I enjoy gym, music, and expanding my already excessive Rubik's Cube collection. 

1st November 2023

Ilija Bogunovic (UCL)

From Data to Confident Decisions: Robust and Efficient Algorithmic Decision Making

Whether in biological design, causal discovery, material production, or physical sciences, one often faces decisions regarding which new data to collect or experiments to perform. There is thus a pressing need for adaptive algorithms that make confident decisions about data collection processes and enable efficient and robust learning. In this talk, I will delve into the fundamental questions related to these requirements. How can we quantify uncertainty and efficiently learn to discover robust solutions? How can we design learning-based decision-making methods that are resistant to outliers, data shifts, and attacks? How can we utilize the inherent problem structure to achieve efficient learning? In light of the previous questions, I will examine the core statistical and robustness aspects through the perspective of Bayesian optimization and reinforcement learning. I will highlight the shortcomings of standard methods and present novel algorithms that come with strong theoretical guarantees. I will also showcase their robust performance in various applications by utilizing real-world data and popular benchmarks and finally map the main avenues for future research.  

About the Speaker

Ilija Bogunovic is a Lecturer (assistant professor) in the Electrical Engineering Department at University College London. Before that, he was a postdoctoral researcher in the Machine Learning Institute and Learning and Adaptive Systems group at ETH Zurich. He received a Ph.D. in Computer and Communication Sciences from EPFL and an MSc in Computer Science from ETH Zurich. His work has been recognized through a Google Research Scholar Program Award and an EPSRC New Investigator Award.

His research interests are centered around data-efficient interactive machine learning, reinforcement learning, and reliability and robustness considerations in data-driven algorithms, and are motivated by a range of emerging real-world applications. He co-founded a recurring ReALML ICML workshop on "Adaptive Experimental Design and Active Learning in the Real World".

11th October 2023

Razvan Pascanu (Google DeepMind)

Resurrecting Recurrent Neural Networks for Long Sequences

In this talk I will focus on State Space Models (SSMs), a recently introduced family of sequential models, and specifically discuss the relationship between SSMs and recurrent neural networks. I will start with a short history of architecture design for language modelling, which I will use as a motivating task. This will allow me to provide some insights into the evolution of RNN architectures, and why some choices behind the SSM architecture seemed counter-intuitive. Most of the talk will focus on introducing the Linear Recurrent Unit architecture, explaining the role of the various modifications from traditional non-linear recurrent models.

The talk will conclude with some open questions about the role recurrent architectures could or should play, and potentially the less well understood relationship between these SSM models and transformer-like architectures.

About the Speaker

Razvan Pascanu has been a research scientist at Google DeepMind since 2014. Before this, he did his PhD in Montréal with Prof. Yoshua Bengio, working on understanding deep networks, recurrent models and optimization. Since he joined DeepMind he has also made significant contributions to deep reinforcement learning, continual learning, meta-learning, and graph neural networks, as well as continuing his research agenda of understanding deep learning, recurrent models and optimization. Please see his scholar page for specific contributions. He is also actively promoting AI research and education as a main organizer of the Conference on Lifelong Learning Agents (CoLLAs), the Eastern European Machine Learning Summer School (EEML), and different workshops at NeurIPS, ICML and ICLR.

2023 Spring Series

4th July 2023

Daniel Mankowitz & Andrea Michi  (DeepMind)

AlphaDev: Faster sorting algorithms discovered using deep reinforcement learning

Fundamental algorithms such as sorting or hashing are used trillions of times on any given day. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been achieved in the past, making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks. These algorithms have been integrated into the LLVM standard C++ sort library. This change to this part of the sort library represents the replacement of a component with an algorithm that has been automatically discovered using reinforcement learning. We also present results in extra domains, showcasing the generality of the approach.

About the Speakers

Daniel Mankowitz is a Staff Research Scientist at Google DeepMind, working on solving the key challenges in Reinforcement Learning algorithms that unlock real-world applications at scale. This includes a focus on Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Mankowitz has worked on code optimization, code generation, video compression, recommender systems, and controlling physical systems such as Heating Ventilation and Air-Conditioning (HVAC), with publications in Nature and Science.

Andrea Michi is a Senior Research Engineer at Google DeepMind working on Reinforcement Learning applications. Michi has worked on a range of domains including renewable forecasting, code optimization, control of physical systems such as Heating Ventilation and Air-Conditioning (HVAC) and the magnetic confinement in a Tokamak. More recently, Michi has focused on Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs) to human preferences. 

7th June 2023

Andrew Lampinen (DeepMind)

Passive learning of active causal strategies in agents and language models 

What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. In this talk, I will show empirically that agents trained via passive imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. This is possible even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, I'll show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.

About the Speaker

Andrew Lampinen is a Senior Research Scientist at Google DeepMind. Previously, he completed his PhD at Stanford University, and his BA at UC Berkeley. His work focuses on using methods from cognitive science to analyze AI, and using insights from cognitive science to improve AI, and covers areas ranging from RL agents to language models. He is particularly interested in cognitive flexibility and generalization, and how these abilities are enabled by factors like language, memory, and embodiment. 

24th May 2023

Florian Fuchs (Sony AI)

Outracing champion Gran Turismo drivers with Deep Reinforcement Learning 

Gran Turismo Sophy is a revolutionary superhuman racing agent designed to compete against top Gran Turismo® Sport drivers and elevate their gaming experience.

GT Sophy was trained using novel deep reinforcement learning techniques, including state-of-the-art learning algorithms and training scenarios developed by Sony AI, using Gran Turismo Sport, a real driving simulator, and by leveraging Sony Interactive Entertainment's cloud gaming infrastructure for massive scale training.

About the Speaker

Florian Fuchs is an AI engineer at Sony AI Zurich. His work focuses on applying Reinforcement Learning to interactive and dynamic games in order to enhance the gaming experience and support game developers to unleash their creativity. Florian holds an MSc in computer science with a focus on machine learning from the University of Zurich. After his master's thesis, in which he first achieved super-human time trial results in the highly realistic racing simulator "Gran Turismo SPORT" using end-to-end Deep Reinforcement Learning, he was part of the Sony AI team that developed the first racing agents competitive with the world's best e-sports drivers.

19th April 2023

Lucy Cheke (University of Cambridge)

A Comparative Cognition Approach to AI Evaluation

Recording - Release waiting for final approval

Understanding and predicting behaviour has been the business of psychologists for over a century. Within human psychology we can rely to some extent on introspection to understand the underlying drivers of behaviour, but this is less straightforward with animals. The problem of peering inside the "black box" of nonhuman animals shares much with the challenge of understanding the capabilities of AI systems, which exhibit extraordinarily clever-seeming behaviour but are prone to inflexibility and shortcuts. This talk will review the comparative cognition approach to AI evaluation and the benefits of robust cognitive testing of AI, both for understanding AI itself and for exploring biological intelligence.

About the Speaker

Dr Lucy Cheke is a Lecturer at the University of Cambridge and Principal Investigator in the Cognition and Motivated Behaviour Lab at the same university.

Jessica Hamrick (DeepMind)

Understanding and Improving Model-Based Deep Reinforcement Learning

Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this talk, I will discuss a line of research from the past few years that has aimed to better understand, and subsequently improve, model-based learning and generalization. First, planning is incredibly useful during an agent's training and supports improved data collection and a more powerful learning signal. However, it is only useful for decisions made in the moment under certain circumstances, counter to our (and many others') intuitions! Second, we can substantially improve procedural generalization of model-based agents by incorporating self-supervised learning into the agent's architecture. Finally, we can also improve transfer to novel tasks by leveraging an initial unsupervised exploration phase, which allows for learning transferable knowledge both in the policy and the world model.

About the Speaker

Dr. Jessica Hamrick is a Staff Research Scientist at DeepMind, where she studies how to build machines that can flexibly build and deploy models of the world. Her work combines insights from cognitive science with structured relational architectures, model-based deep reinforcement learning, and planning. In addition to her work in AI, Dr. Hamrick has contributed to various open-source scientific computing projects including Jupyter and psiTurk. Dr. Hamrick received her PhD in Psychology in 2017 from the University of California, Berkeley, and her BS and MEng in Computer Science in 2012 from the Massachusetts Institute of Technology.

Alessandro Lazaric (Meta AI)

Understanding Incremental Unsupervised Exploration for Goal-based RL

Recording - Release waiting for final approval

One of the key features of intelligent beings is the capacity to explore and discover an unknown environment and to progressively learn how to control it. This process is not driven by an explicit reward and it may unfold in a completely unsupervised way. In this talk I will propose a formalization of unsupervised discovery and exploration as the process of incrementally learning policies that reach goals of increasing difficulty. The resulting goal-based policy then allows the agent to solve any goal-reaching task at downstream time with no additional learning or planning. I will illustrate algorithmic principles, theoretical guarantees, and preliminary empirical results that could lay the foundations for designing agents that can efficiently learn in open-ended environments.


On unsupervised exploration:

Adaptive Multi-Goal Exploration; AISTATS 2022

Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching; ICLR 2022

A Provably Efficient Sample Collection Strategy for Reinforcement Learning; NeurIPS 2021

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs; NeurIPS 2020

On exploration for goal-based RL:

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret; NeurIPS 2021

Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model; A 

About the Speaker

Alessandro is a research scientist at Meta AI (FAIR), where he has been since 2017. Prior to working at Meta, he completed a PhD at Politecnico di Milano and worked as a researcher at INRIA Lille. His main research topic is reinforcement learning, with extensive contributions on both the theoretical and algorithmic aspects of RL. In the last ten years he has studied the exploration-exploitation dilemma in both the multi-armed bandit and reinforcement learning frameworks, notably on the problems of regret minimization, best-arm identification, pure exploration, and hierarchical RL.

2022 Winter Series

Elise van der Pol (Microsoft Research)

Symmetries in Single and Multi-agent Learning and AI4Science

In this talk, I will discuss our work on symmetry and structure in single and multi-agent reinforcement learning. I will first discuss MDP Homomorphic Networks (NeurIPS 2020), a class of networks that ties transformations of observations to transformations of decisions. Such symmetries are ubiquitous in deep reinforcement learning, but often ignored in earlier approaches. Building this prior knowledge into policy and value networks allows us to reduce the size of the solution space, a necessity in problems with large numbers of possible observations. I will showcase the benefits of our approach on agents in virtual environments. Building on the foundations of MDP Homomorphic Networks, I will also discuss our recent multi-agent works, Multi-Agent MDP Homomorphic Networks (ICLR 2022) and Equivariant Networks for Zero-Shot Coordination (NeurIPS 2022), which consider symmetries in multi-agent systems. This forms a basis for my vision for reinforcement learning for complex virtual environments, as well as for problems with intractable search spaces. Finally, I will briefly discuss AI4Science.

About the Speaker

Elise van der Pol is a Senior Researcher at Microsoft Research AI4Science Amsterdam, working on reinforcement learning and deep learning for molecular simulation. Additionally, she works on symmetry, structure, and equivariance in single and multi-agent reinforcement learning and machine learning.

Before joining MSR, she did a PhD in the Amsterdam Machine Learning Lab, working with Max Welling (UvA), Frans Oliehoek (TU Delft), and Herke van Hoof (UvA). During her PhD, she spent time in DeepMind’s multi-agent team. Elise was an invited speaker at the BeneRL 2022 workshop and the Self-Supervision for Reinforcement Learning workshop at ICLR 2021. She was also a co-organizer of the workshop on Ecological/Data-Centric Reinforcement Learning at NeurIPS 2021.

12th October 2022

Alhussein Fawzi (DeepMind)

Discovering faster matrix multiplication algorithms with reinforcement learning

Improving the efficiency of algorithms for fundamental computational tasks such as matrix multiplication can have widespread impact, as it affects the overall speed of a large number of computations. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. In this talk I'll present AlphaTensor, our reinforcement learning agent based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor's algorithm improves on Strassen's two-level algorithm for the first time since its discovery 50 years ago. I'll present our problem formulation as a single-player game, the key ingredients that enable tackling such difficult mathematical problems using reinforcement learning, and the flexibility of the AlphaTensor framework.
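For context on the baseline mentioned in the abstract: Strassen's classic scheme multiplies two 2 × 2 matrices with 7 scalar multiplications instead of 8, and applying it recursively to a 4 × 4 matrix split into 2 × 2 blocks gives 7 × 7 = 49 multiplications (the "two-level" algorithm). The sketch below shows only that well-known baseline, not AlphaTensor's discovered algorithm; the function names are illustrative.

```python
# Strassen's 2x2 scheme: 7 multiplications (m1..m7) instead of the naive 8.

def naive_2x2(a, b):
    # Textbook product, 8 scalar multiplications.
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]

def strassen_2x2(a, b):
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # The seven products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine using additions and subtractions only.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(strassen_2x2(a, b) == naive_2x2(a, b))
```

Because the recombination step uses only additions and subtractions, the same formulas apply when the entries are themselves matrix blocks, which is what makes the recursive 4 × 4 version possible.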

About the Speaker

Alhussein Fawzi is a Research Scientist in the Science team at DeepMind, where he leads the algorithmic discovery efforts. He is broadly interested in using machine learning to unlock new scientific discoveries. He obtained his PhD in machine learning and computer vision from EPFL in 2016.

12th October 2022

Tim Rocktäschel (DeepMind)

The NetHack Learning Environment and Its Challenges for Open-Ended Learning

Progress in Reinforcement Learning (RL) methods goes hand-in-hand with the development of challenging environments that test the limits of current approaches. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both these things. Moreover, research in RL has predominantly focused on environments that can be approached by tabula rasa learning, i.e., without agents requiring transfer of any domain or world knowledge outside of the simulated environment. I will talk about the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for research based on the popular single-player terminal-based rogue-like game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as auto-curriculum learning, exploration, planning, skill acquisition, goal-driven learning, novelty search, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience.

About the Speaker

Tim is the Open-Endedness Team Lead at DeepMind, an Associate Professor at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to that, he was a Manager and Research Scientist at Facebook AI Research (FAIR) London, a Postdoctoral Researcher in Reinforcement Learning at the University of Oxford, a Junior Research Fellow in Computer Science at Jesus College, and a Stipendiary Lecturer in Computer Science at Hertford College. Tim obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, and he was awarded a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017. 

12th October 2022

Felix Hill (DeepMind)

How knowing language can make general AI systems smarter

Having and using language makes humans as a species better learners and better able to solve hard problems. I'll present three studies that demonstrate how this is also the case for artificial models of general intelligence. In the first, I show that agents with access to visual and linguistic semantic knowledge explore their environment more effectively than non-linguistic agents, enabling them to learn more about the world around them. In the second, I demonstrate how an agent embodied in a simulated 3D world can be enhanced by learning from explanations -- answers to the question "why?" expressed in language. Agents that learn from explanations solve harder cognitive challenges than those trained from reinforcement learning alone, and can also better learn to make interventions in order to uncover the causal structure of their world. Finally, I'll present evidence that the skewed and bursty distribution of natural language may explain how large language models can be prompted to rapidly acquire new skills or behaviours. Together with other recent literature, this suggests that modelling language may make a neural network better able to acquire new cognitive capacities quickly, even when those capacities are not necessarily explicitly linguistic. 

About the Speaker

Felix is a Research Scientist at DeepMind where he leads a team focusing on the relationship between natural language and general intelligence. His work combines insights from Cognitive Science, Neuroscience and Linguistics in working towards scientifically and practically useful models of human cognition and behaviour. 

14th September 2022

Sam Devlin (Microsoft Research)

Towards Ad-Hoc Teamwork for Improved Player Experience

Collaborative multi-agent reinforcement learning research often makes two key assumptions: (1) we have control of all agents on the team; and (2) maximising team reward is all you need. However, to enable human-AI collaboration, we need to break both of these assumptions. In this talk I will formalise the problem of ad-hoc teamwork and present our proposed approach to meta-learn policies robust to a given set of possible future collaborators. I will then talk about recent work on modelling human play, showing that reward maximisation may not be sufficient when trying to entertain billions of players worldwide.

About the Speaker

Sam is a Principal Researcher in the Deep Reinforcement Learning for Games group at Microsoft Research Cambridge. He received his PhD on multi-agent reinforcement learning in 2013 from the University of York; was a postdoc from 2013 to 2015, working on game analytics; and then was on the faculty from 2016 until joining Microsoft in 2018. He has published more than 60 papers on reinforcement learning and game AI in leading academic venues and presents regularly at games industry events including Develop and the Game Developers Conference (GDC). 

16th June 2022

David Abel (DeepMind)

On the Expressivity of Markov Reward

Reward is the driving force for reinforcement-learning agents. In this talk, I will present our recent NeurIPS paper that explores the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be of interest: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while Markov reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. I conclude by summarizing recent follow-up work that studies alternatives for enriching the expressivity of reward.

About the Speaker

David Abel is a Research Scientist at DeepMind in London. Before that, he completed his Ph.D. in computer science and Master's in philosophy at Brown University, advised by Michael Littman (CS) and Josh Schechter (philosophy).

26th May 2022

Roberta Raileanu (Meta AI Research)

From Solving Narrow Tasks to Learning General Skills

In the past few years, deep reinforcement learning (RL) has led to impressive achievements in games and robotics. However, current state-of-the-art RL methods struggle to generalize to scenarios they haven’t experienced during training. In this talk, I will show how we can leverage diverse data and a minimal set of inductive biases to generalize to new task instances. First, I will discuss how we can use data augmentation to learn policies which are invariant to task-irrelevant changes in the observations. Then, I will show how we can generalize to new task instances with unseen states and layouts by decoupling the representation of the policy and value function. And finally, I will briefly describe how we can quickly adapt to new dynamics by learning a value function for a family of behaviors and environments.

About the Speaker

Roberta is a research scientist at Meta AI / FAIR. Previously, she did her PhD in computer science at NYU, advised by Rob Fergus. Her research focuses on designing machine learning algorithms that can make robust sequential decisions in complex environments. In particular, Roberta works in the area of deep reinforcement learning, with a focus on generalization, adaptation, continual, and open-ended learning. During her PhD, she spent time as an intern at DeepMind, Microsoft Research, and Facebook AI Research. Roberta also holds a B.A. in Astrophysics from Princeton University, where she worked on theoretical cosmology and supernovae simulations. 

21st April 2022

Haitham Ammar (Huawei / UCL) 

High-Dimensional Black-Box Optimisation in Small Data Regimes

Many problems in science and engineering can be viewed as instances of black-box optimisation over high-dimensional (structured) input spaces. Applications are ubiquitous, including arithmetic expression formation from formal grammars and property-guided molecule generation, to name a few. Machine learning (ML) has shown promising results in many such problems, sometimes leading to state-of-the-art results. Despite those successes, modern ML techniques are data-hungry, requiring hundreds of thousands if not millions of labelled data points. Unfortunately, many real-world applications do not enjoy such a luxury -- it is challenging to acquire millions of wet-lab experiments when designing new molecules.

This talk will elaborate on novel techniques we developed for high-dimensional Bayesian optimisation (BO), capable of efficiently resolving such data bottlenecks. Our methods combine ideas from deep metric learning with BO to enable sample-efficient low-dimensional surrogate optimisation. We provide theoretical guarantees demonstrating vanishing regret with respect to the true high-dimensional optimisation problem. Furthermore, in a set of experiments, we confirm the effectiveness of our techniques in reducing sample sizes by acquiring state-of-the-art logP molecule values using only 1% of the labels compared to the previous SOTA.

About the Speaker

Haitham leads the reinforcement learning team at Huawei Technologies Research & Development UK and is an Honorary Lecturer at UCL. Prior to Huawei, Haitham led the reinforcement learning and tuneable AI team at, where he contributed to their technology in finance and logistics. Prior to joining, Haitham was an Assistant Professor in the Computer Science Department at the American University of Beirut (AUB). Before joining the AUB, Haitham was a postdoctoral research associate in the Department of Operational Research and Financial Engineering (ORFE) at Princeton University. Prior to Princeton, he conducted research in lifelong machine learning while employed as a postdoctoral researcher at the University of Pennsylvania. As a former member of the General Robotics Automation Sensing and Perception (GRASP) lab, he also contributed to the application of machine learning to robotics. His primary research interests lie in the field of statistical machine learning and artificial intelligence, focusing on Bayesian optimisation, probabilistic modelling and reinforcement learning. He is also interested in learning using massive amounts of data over extended time horizons – a property common to "Big-Data" problems. His research also spans different areas of control theory and nonlinear dynamical systems, as well as social networks and distributed optimization.

17th March 2022

Angeliki Lazaridou (DeepMind) 

Towards Multi-agent Emergent Communication as a Building Block of Human-centric AI 

The ability to cooperate through language is a defining feature of humans. As the perceptual, motor and planning capabilities of deep artificial networks increase, researchers are studying whether they can also develop a shared language to interact. In this talk, I will highlight recent advances in this field but also common headaches (or perhaps limitations) with respect to experimental setup and evaluation of emergent communication. Towards making multi-agent communication a building block of human-centric AI, and by drawing from my own recent work, I will discuss approaches to making emergent communication relevant for human-agent communication in natural language.

About the Speaker

Angeliki Lazaridou is a Staff Research Scientist at DeepMind. She obtained her PhD from the University of Trento, where she worked on predictive grounded language learning. At DeepMind, she has worked on interactive methods for language learning that rely on multi-agent communication as a means of alleviating the use of supervised language data. More recently, she has focused on understanding (and improving) the temporal generalization of language models.

17th February 2022

Jakob Foerster (University of Oxford) 

Zero-Shot Coordination and Off-Belief Learning

There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, yet this is a crucial prerequisite from a human-AI coordination and AI-safety point of view.

The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep OBL agents follow a policy π_1 that is optimized assuming past actions were taken by a given, fixed policy, π_0, but assuming that future actions will be taken by π_1. When π_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.

OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).

OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and Hanabi, the benchmark problem for human-AI coordination and ZSC.

About the Speaker

Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI Chair in 2019.

His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently, he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.

His work has been cited over 5000 times, with an h-index of 29 (Google Scholar page). 

2022 Spring Series