This talk discusses the evolution of considerations around context in LLMs, including selected highlights from the speaker's work. It traces the journey from early question answering systems to advanced retrieval-augmented deep learning and modern LLMs. Highlights include the challenges with human feedback, procedural knowledge in pretraining, and novel methods for improving model robustness, including prompt sensitivity, the detection of under-trained tokens, the concept of implicit self-correction in-context, and RL for reverse engineering model context.
Max is a researcher at Google DeepMind and co-chair of the DMLR working group at MLCommons, shaping best practices for large-scale model training. His research focuses on language model robustness, complex reasoning, and innovations in dynamic adversarial data generation and benchmarking. He previously led the Command (post-training) modelling team at Cohere, worked at Facebook AI Research and Bloomsbury AI, and was also an adjunct teaching fellow at University College London. His research has been featured in leading global publications, including Wired, Fortune, and MIT Technology Review, has been recognised as one of TIME's best inventions of 2024, and has earned multiple awards at top-tier conferences.
Foundation models are transforming R&D in biology. This talk introduces recent key developments and highlights how applied teams at InstaDeep and BioNTech build on this research to address real-world biomedical challenges.
Alberto Bégué is a senior research engineer at InstaDeep. He works with a team of research engineers and applied research scientists from InstaDeep and with teams from BioNTech's labs to accelerate discoveries in biology with artificial intelligence. He holds master’s degrees in artificial intelligence from Imperial College London and in engineering from Télécom Paris.
In this talk I will share my experience of creating reasoning models with reinforcement learning, from the perspective of an industry laboratory. I will highlight the dos and don’ts, the infrastructure challenges, and a post-mortem analysis of the effort.
Albert Jiang is an AI scientist, leading the reasoning team at Mistral AI. He completed his PhD at the University of Cambridge on AI for formal mathematics. He holds an MSc from Oxford and a BA from Cambridge.
Of all the experiences we have in life, face-to-face interaction fills many of our most meaningful moments. How do we simulate a face from core principles, and what drives the vast expressivity it can so effectively communicate? Will we be able to interact with intelligent machines in a face-to-face way that feels natural?
Double Academy Award winner Dr. Mark Sagar was co-founder and former Chief Science Officer of Soul Machines and Director of the Laboratory for Animate Technologies at the Auckland Bioengineering Institute, pioneering the use of biologically inspired models of cognition and learning in interactive virtual humans, including BabyX, a virtual infant capable of lifelike face-to-face interaction.
He also previously worked as the Special Projects Supervisor at Weta Digital and Sony Pictures Imageworks and developed technology for the digital characters in blockbusters such as “Avatar,” “King Kong,” and “Spider-Man 2.”
His pioneering work in computer-generated faces was recognized with two consecutive Scientific and Engineering Oscars in 2010 and 2011.
Mark has a Ph.D. in engineering from the University of Auckland and was a postdoctoral fellow at MIT. Mark was elected as a fellow of the Royal Society of New Zealand in 2019 and was named New Zealand Innovator of the Year in 2022.
He was a keynote speaker at SIGGRAPH 2024 and currently consults on the intersecting worlds of AI, interactive human simulation, embodied cognition, and facial animation.
In the past year we have seen a leap in the capabilities of video generation models. It is no surprise that this could be considered the next frontier — videos encompass much of the world we live in and learning from this data could get us ever closer to more generalist agents. However, video generation only scratches the surface of what we can learn from such data. In this talk, I will discuss a few different works that further investigate how we can infer actions, rewards, policies, and even environments from videos alone.
Ashley is a member of technical staff at Runway working on video generation. Her main interests involve developing models that can infer latent actions, rewards, and policies from videos alone. Prior to joining Runway, she was a senior research scientist at Google DeepMind working on reinforcement learning and foundation world models. And before that, she was a research scientist at Uber AI working on training agents from observations. She received a PhD in 2019 from Georgia Tech.
This talk explores the evolving interaction between humans and AI, drawing on experience with AI in robotics and AI in finance. We will examine AI as a collaborator in problem solving, address the challenges and opportunities of using AI, give examples, and discuss the future of this symbiotic partnership.
Manuela Veloso is Head of AI Research at JPMorganChase and Herbert A. Simon University Professor Emerita in Computer Science at Carnegie Mellon University. Veloso pursues research in core AI, including robot and digital agents with perception, cognition, and action. She further aims at seamless human-AI symbiotic interaction. Veloso is a member of the National Academy of Engineering for “her contributions to artificial intelligence and its applications to robotics and to the financial domain.” She is a past president of AAAI and a co-founder of RoboCup. Veloso is a fellow of AAAI, IEEE, ACM, and AAAS. She has a BSc and MSc in Electrical and Computer Engineering from IST, an MA in Computer Science from BU, and a Ph.D. in Computer Science from CMU. Veloso has honorary doctorates from the Catholic University of Portugal, University of Bordeaux, ISCTE, and University of Orebro. See www.cs.cmu.edu/~mmv for detailed information.
AI is changing the financial industry, and indeed the world, at an ever-accelerating pace. The current generation of AI models is different from the previous state of the art in important ways. Large transformer models are broadly capable, exhibiting state-of-the-art and even human-level performance on many tasks without problem-specific training, and uniquely accessible, allowing non-experts to interact with them using natural language. Nonetheless, they also have important limitations. In this talk, we will explore the current state of the art in financial AI applications, discuss the challenges of AI adoption in the enterprise, and talk about the risks and future directions of AI development. We will then look at our recent paper (https://arxiv.org/abs/2409.10470) about non-convex online bilevel optimization, and discuss applications of the new method to financial time series analysis and hyperparameter optimization.
The talk will conclude with a Q&A session.
Gary is the Head of Quant Technology Strategy in the Office of the CTO at Bloomberg. Prior to taking on this role, he created and headed the company’s Machine Learning Engineering group, leading projects at the intersection of computational linguistics, machine learning and finance, such as sentiment analysis of financial news, market impact indicators, statistical text classification, social media analytics, question answering, and predictive modeling of financial markets.
Prior to joining Bloomberg in 2007, Gary earned degrees with distinction in physics, mathematics, and computer science from Boston University.
He is engaged in advisory roles with FinTech and Machine Learning startups and has worked at a variety of technology and academic organizations over the last 20 years. In addition to speaking regularly at industry and academic events around the globe, he is a member of the KDD Data Science + Journalism workshop program committee and the advisory board for the AI & Data Science in Trading conference series. He is also an adjunct professor at Columbia University, and a co-organizer of the annual Machine Learning in Finance conference at Columbia University.
Despite great successes, current deep learning methods cannot learn effectively during normal operation, which makes them ill-suited for reinforcement learning or, really, for any general intelligence. In particular, conventional artificial neural networks fail catastrophically in classic supervised learning testbeds, such as ImageNet, when those testbeds are extended to require ongoing learning. In this talk, I argue that this failure is not inherent in neural networks, but only in the algorithms currently used. For example, a simple modification of the standard backpropagation algorithm, known as continual backpropagation, greatly improves performance in continual learning settings. Such results suggest exploring network learning algorithms explicitly designed for continual and reversible change, such as dynamic deep learning networks, which continually adapt at multiple levels including 1) their weights, 2) their step-size parameters, and 3) their interconnection structure.
Rich Sutton is a research scientist at Keen Technologies, a professor in the Department of Computing Science at the University of Alberta, chief scientific advisor of the Alberta Machine Intelligence Institute (Amii), and a fellow of the Royal Society of London, the Royal Society of Canada, the Association for the Advancement of Artificial Intelligence, Amii, and CIFAR. He received a PhD in computer science from the University of Massachusetts in 1984 and a BA in psychology from Stanford University in 1978. Prior to joining the University of Alberta in 2003, he worked in industry at AT&T Labs and GTE Labs, and in academia at the University of Massachusetts. He helped found DeepMind Alberta in 2017 and worked there until its dissolution in 2023. At the University of Alberta, Sutton founded the Reinforcement Learning and Artificial Intelligence Lab, which now consists of ten principal investigators and about 100 people altogether. Sutton is co-author of the textbook Reinforcement Learning: An Introduction, and his scientific publications have been cited more than 140,000 times. He is also a libertarian, a chess player, and a cancer survivor.
The impressive developments in language and image models have both opened new creative possibilities for artists and surfaced ethical concerns about the impact of generative AI upon the arts. In this talk, I will cover the subject of AI as a creativity support tool, focusing on theatre and live performance with generative AI. I will illustrate the talk with the example of my theatre company, Improbotics, which has used AI for improvised comedy since 2016 and has recently engaged the wider public at the Edinburgh Festival Fringe. I will also cover socio-technical research at Google DeepMind on participatory AI with artists, including the evaluation of generative AI tools for visual artists and for co-writing screenplays, theatre plays and comedy.
Dr Piotr Mirowski is a Staff Research Scientist at Google DeepMind, a Visiting Researcher and Knowledge Exchange Scholar at Goldsmiths, University of London, and the co-founder and Director of Improbotics. His research on artificial intelligence covers the subjects of reinforcement learning, navigation, weather and climate forecasting, as well as a socio-technical systems approach to human-machine collaboration and to computational creativity. He is the author of over 75 papers that have been published in prestigious journals including Nature. Piotr studied computer science in France at ENSEEIHT Toulouse and obtained his PhD in computer science in 2011 at New York University, with a thesis supervised by Prof. Yann LeCun (Outstanding Dissertation Award, 2011). A trained actor himself (London School of Dramatic Art), Piotr created Improbotics, a theatre company where human actors and robots improvise live comedy performances and investigate the use of AI for artistic human and machine-based co-creation, aiming at bridging the arts and sciences.
Combinatorial Optimization is crucial to numerous real-world applications, yet still presents challenges due to its (NP-)hard nature. Amongst existing approaches, heuristics often offer the best trade-off between quality and scalability, making them suitable for industrial use. While Reinforcement Learning (RL) offers a flexible framework for designing heuristics, its adoption over handcrafted heuristics remains incomplete within industrial solvers. Existing learned methods still lack the ability to adapt to specific instances, failing to fully utilise newly available information within the constraints of the budget. In response, we present MEMENTO, an RL approach that leverages memory to improve the adaptation of neural solvers at inference time.
Felix Chalumeau is a Research Engineer at InstaDeep, where he works on reinforcement learning for combinatorial optimisation, with a focus on diversity and population-based approaches. He has also worked on other diversity-related topics, including neuroevolution, quality diversity, skill discovery and collaborative skills for multi-agent learning, with publications in international ML conferences, including NeurIPS and ICLR. Prior to InstaDeep, he studied at Imperial College London and École Polytechnique.
Despite all their recent advances, current AI methods are still brittle and fail when confronted with unexpected situations. Biological intelligent systems, on the other hand, can rapidly adapt and display an inherent level of resilience. While inspired by the brain, the current artificial neural network paradigm has abstracted away many of the properties that could turn out to be essential in our goal to create a more general artificial intelligence. In this talk, I'll present some of our work in exploring alternative neural building blocks. For example, it is possible to allow completely random networks to adapt to morphological damage in a robot in only a few trials through meta-learned local plasticity rules. Likewise, evolving different activation functions in random neural networks alone enables them to master different reinforcement learning tasks, challenging our understanding of which ingredients to include in our neural networks. Finally, I will present our current work on the Neural Developmental Program approach, in which we learn to grow artificial neural networks through a developmental process that mirrors key properties of embryonic development in biological organisms. The talk concludes with future research opportunities and challenges that we need to address to best capitalize on the same ideas that allowed biological intelligence to thrive.
Sebastian Risi is a Full Professor at the IT University of Copenhagen where he directs the Creative AI Lab, and co-directs the Robotics, Evolution and Art Lab (REAL). Before joining ITU, he did a postdoc at Cornell University and before that, he obtained a Ph.D. from the University of Central Florida. As one of the pioneers in the emerging field of collective intelligence for deep learning, he investigates how we can make current AI approaches more robust and adaptive. He has won several international scientific awards, including multiple best paper awards, an ERC Consolidator Grant in 2022, the Distinguished Young Investigator in Artificial Life 2018 award, a Google Faculty Research Award in 2019, and an Amazon Research Award in 2020. His interdisciplinary work has been published in major machine learning, artificial life, and human-computer interaction conferences and journals, including AAAI, NeurIPS, ICLR, Nature Machine Intelligence, ALIFE, GECCO, and CHI.
We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.
Jake has a background in deep learning for robot navigation from Queensland University of Technology in Brisbane, Australia. He has been working on reinforcement learning and large-scale deep learning at Google DeepMind since 2018, and has been involved in projects on imitation learning in NetHack, generalist agents via Gato, and most recently generative world models in Genie.
We investigate how multi-agent learning can enable safe deployment and evaluation of autonomous systems operating in safety-critical, mixed human-robot settings. Using a case study of a 100-vehicle, real-world deployment of RL-based traffic-smoothing autonomous vehicles (AVs), we discuss the challenges of estimating when a controller will successfully bridge the sim-to-real gap. We then discuss our work on building human-like, capable simulated agents using regularized self-play techniques. Finally, we discuss some of the challenges of MARL at scale and the new simulators we are designing to address them.
Eugene Vinitsky is an assistant professor in Transportation Engineering at NYU, a member of the C2SMARTER consortium on congestion reduction, and a part-time research scientist at Apple. He works primarily on multi-agent learning with a focus on its potential use in transportation systems and robotics. He received his PhD in controls engineering from UC Berkeley, where he was advised by Alexandre Bayen, and an MS and BS in physics from UC Santa Barbara and Caltech, respectively. During his PhD he spent time at DeepMind, Tesla Autopilot, and FAIR.
Attention-based models have been a key element of many recent breakthroughs in deep learning. Two key components of Attention are the structure of its input (which consists of keys, values and queries) and the computations by which these three are combined. In this paper we explore the space of models that share said input structure but are not restricted to the computations of Attention. We refer to this space as Keys-Values-Queries (KVQ) Space. Our goal is to determine whether there are any other stackable models in KVQ Space that Attention cannot efficiently approximate, which we can implement with our current deep learning toolbox and that solve problems that are interesting to the community. Perhaps surprisingly, the solution to the standard least squares problem satisfies these properties. A neural network module that is able to compute this solution not only enriches the set of computations that a neural network can represent but is also provably a strict generalisation of Linear Attention. Even more surprisingly, the computational complexity of this module is exactly the same as that of Attention, making it a suitable drop-in replacement. With this novel connection between classical machine learning (least squares) and modern deep learning (Attention) established, we justify a variation of our model which generalises regular Attention in the same way. Both new modules are put to the test on a wide spectrum of tasks, ranging from few-shot learning to policy distillation, that confirm their real-world applicability.
Marta is a senior research scientist at DeepMind where she has primarily worked on deep generative models and meta learning. As part of this research she has been involved in developing Generative Query Networks and led the work on Neural Processes. In addition to generative models her recent interests have expanded to multi-agent systems and game theory. Prior to DeepMind Marta obtained her PhD from Imperial College London.
First-person view (FPV) drone racing is a televised sport in which professional competitors pilot high-speed quadcopters through a 3D circuit. The pilots see the environment from the perspective of their drone by means of an onboard camera video-stream. In this talk I will explain Swift, an autonomous system that can race quadcopters at the level of the human world champions. The system combines deep reinforcement learning (RL) in simulation with data collected in the physical world. Our Swift drone competed against three human champions, including the world champions of two international leagues, in real-world head-to-head races and won several races against each of the human champions. The autonomous drone demonstrated the fastest recorded race time.
Leonard Bauersfeld completed his master's degree in "Robotics, Systems, and Control" at ETH Zurich, where he graduated with distinction in 2021. He is currently a PhD student in robotics at the University of Zurich, working in the "Robotics and Perception Group" led by roboticist Davide Scaramuzza. He works on drone modeling, agile vision-based flight and novel machine learning approaches to push the frontiers of autonomous UAV navigation. He was part of the team that beat the world champions of drone racing in a fair head-to-head race with an autonomous drone. Besides working on drones, he is a photographer and enjoys taking pictures of nature as well as far-away astronomical objects, such as nebulas and galaxies.
Developing agents capable of modeling complex environments and human behaviors within them is a key goal of artificial intelligence research. Progress towards this goal has exciting potential for applications in video games, from new tools that empower game developers to realize new creative visions, to enabling new kinds of immersive player experiences. This talk focuses on recent advances of my team at Microsoft Research towards scalable machine learning architectures that effectively capture human gameplay data.
In the first part of my talk, I will focus on diffusion models as generative models of human behavior. These models have previously been shown to have impressive image generation capabilities; I present insights that unlock their application to imitation learning for sequential decision making [1]. In the second part of my talk, I discuss a recent project taking ideas from language modeling to build a generative sequence model of an Xbox game [2].
[1] Imitating human behaviour with diffusion models, Pearce et al., ICLR 2023, https://aka.ms/bc-diffusion
[2] Tales of Scaling Up Generative AI for Video Games, MSR Cambridge Game Intelligence team, under submission.
Katja Hofmann is a Senior Principal Researcher at Microsoft Research. She leads a team that focuses on modern machine learning and reinforcement learning for Games, with the mission to advance the state of the art in sequential decision making, driven by current and future applications in video games. She and her team share the belief that games will drive a transformation of how people interact with AI technology. Her long-term goal is to develop systems that learn to collaborate with people, to empower their users and help solve complex real-world problems.
Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement multi-trial strategies, from stochastic sampling and beam search to explicit fine-tuning.
We argue for the benefits of anticipating these multi-trial strategies at train time using discrete or continuous populations of diverse and complementary policies, based on two recent papers:
Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization (Grinsztajn et al., NeurIPS 2023)
Combinatorial Optimization with Policy Adaptation using Latent Space Search (Chalumeau et al., NeurIPS 2023)
Instead of relying on a predefined or hand-crafted notion of diversity, these two approaches make use of specific training schemes that induce an unsupervised specialization targeted solely at maximizing the performance of the population, leading to state-of-the-art RL results on four popular NP-hard problems: traveling salesman, capacitated vehicle routing, 0-1 knapsack, and job-shop scheduling. More generally, these frameworks can be applied to any reinforcement learning problem that can be attempted multiple times.
Nathan holds a PhD in Reinforcement Learning for Combinatorial Optimization from Univ. Lille and Inria in France. He joined InstaDeep one year ago as a research scientist in RL, working primarily on CO and sequence modeling problems.
This talk will outline some of the ways in which InstaDeep tries to apply AI/RL to solve industry-scale problems, particularly those with a Combinatorial Optimisation flavour. I will also give some examples of the collaboration between research and engineering to scale up the compute to run larger experiments while making efficient use of hardware.
Following a PhD in Algebraic Number Theory, I spent 4 years working as a quant before joining InstaDeep as a Research Engineer at the end of 2020. Since then, I have been working primarily on applying Reinforcement Learning to solve problems in Electronic Circuit Design. In my spare time I enjoy the gym, music, and expanding my already excessive Rubik's Cube collection.
Whether in biological design, causal discovery, material production, or physical sciences, one often faces decisions regarding which new data to collect or experiments to perform. There is thus a pressing need for adaptive algorithms that make confident decisions about data collection processes and enable efficient and robust learning. In this talk, I will delve into the fundamental questions related to these requirements. How can we quantify uncertainty and efficiently learn to discover robust solutions? How can we design learning-based decision-making methods that are resistant to outliers, data shifts, and attacks? How can we utilize the inherent problem structure to achieve efficient learning? In light of the previous questions, I will examine the core statistical and robustness aspects through the perspective of Bayesian optimization and reinforcement learning. I will highlight the shortcomings of standard methods and present novel algorithms that come with strong theoretical guarantees. I will also showcase their robust performance in various applications by utilizing real-world data and popular benchmarks and finally map the main avenues for future research.
Ilija Bogunovic is a Lecturer (assistant professor) in the Electrical Engineering Department at University College London. Before that, he was a postdoctoral researcher in the Machine Learning Institute and Learning and Adaptive Systems group at ETH Zurich. He received a Ph.D. in Computer and Communication Sciences from EPFL and an MSc in Computer Science from ETH Zurich. His work has been recognized through a Google Research Scholar Program Award and an EPSRC New Investigator Award.
His research interests are centered around data-efficient interactive machine learning, reinforcement learning, and reliability and robustness considerations in data-driven algorithms, and are motivated by a range of emerging real-world applications. He co-founded a recurring ReALML ICML workshop on “Adaptive Experimental Design and Active Learning in the Real World”.
In this talk I will focus on State Space Models (SSMs), a recently introduced family of sequential models, and specifically discuss the relationship between SSMs and recurrent neural networks. I will start with a short history of architecture design for language modelling, which I will use as a motivating task. This will allow me to provide some insights into the evolution of RNN architectures, and why some choices behind the SSM architecture seemed counter-intuitive. Most of the talk will focus on introducing the Linear Recurrent Unit architecture, explaining the role of the various modifications from traditional non-linear recurrent models.
The talk will conclude with some open questions about the role recurrent architectures could or should play, and potentially the less well understood relationship between SSMs and transformer-like architectures.
Razvan Pascanu has been a research scientist at Google DeepMind since 2014. Before this, he did his PhD in Montréal with Prof. Yoshua Bengio, working on understanding deep networks, recurrent models and optimization. Since joining DeepMind he has also made significant contributions to deep reinforcement learning, continual learning, meta-learning, and graph neural networks, as well as continuing his research agenda of understanding deep learning, recurrent models and optimization. Please see his scholar page for specific contributions. He is also actively promoting AI research and education as a main organizer of the Conference on Life-long Learning Agents (CoLLAs, lifelong-ml.cc), the Eastern European Machine Learning Summer School (EEML, www.eeml.eu and www.workshops.eeml.eu), as well as different workshops at NeurIPS, ICML and ICLR.
Fundamental algorithms such as sorting or hashing are used trillions of times on any given day. As demand for computation grows, it has become critical for these algorithms to be as performant as possible. Whereas remarkable progress has been achieved in the past, making further improvements on the efficiency of these routines has proved challenging for both human scientists and computational approaches. Here we show how artificial intelligence can go beyond the current state of the art by discovering hitherto unknown routines. To realize this, we formulated the task of finding a better sorting routine as a single-player game. We then trained a new deep reinforcement learning agent, AlphaDev, to play this game. AlphaDev discovered small sorting algorithms from scratch that outperformed previously known human benchmarks. These algorithms have been integrated into the LLVM standard C++ sort library. This change to this part of the sort library represents the replacement of a component with an algorithm that has been automatically discovered using reinforcement learning. We also present results in extra domains, showcasing the generality of the approach.
Daniel Mankowitz is a Staff Research Scientist at Google DeepMind, working on solving the key challenges in Reinforcement Learning algorithms that unlock real-world applications at scale. This includes a focus on Reinforcement Learning from Human Feedback (RLHF) in the context of Large Language Models (LLMs). Mankowitz has worked on code optimization, code generation, video compression, recommender systems, and controlling physical systems such as Heating, Ventilation, and Air Conditioning (HVAC), with publications in Nature and Science.
Andrea Michi is a Senior Research Engineer at Google DeepMind working on Reinforcement Learning applications. Michi has worked on a range of domains including renewable forecasting, code optimization, and control of physical systems such as Heating, Ventilation, and Air Conditioning (HVAC) and the magnetic confinement in a Tokamak. More recently, Michi has focused on Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs) to human preferences.
What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. In this talk, I will show empirically that agents trained via passive imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. This is possible even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, I'll show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models. (https://arxiv.org/abs/2305.16183)
Andrew Lampinen is a Senior Research Scientist at Google DeepMind. Previously, he completed his PhD at Stanford University, and his BA at UC Berkeley. His work focuses on using methods from cognitive science to analyze AI, and using insights from cognitive science to improve AI, and covers areas ranging from RL agents to language models. He is particularly interested in cognitive flexibility and generalization, and how these abilities are enabled by factors like language, memory, and embodiment.
Gran Turismo Sophy is a revolutionary superhuman racing agent designed to compete against top Gran Turismo® Sport drivers and elevate their gaming experience.
GT Sophy was trained using novel deep reinforcement learning techniques, including state-of-the-art learning algorithms and training scenarios developed by Sony AI, using Gran Turismo Sport, a real driving simulator, and by leveraging Sony Interactive Entertainment's cloud gaming infrastructure for massive scale training.
Florian Fuchs is an AI engineer at Sony AI Zurich. His work focuses on applying Reinforcement Learning to interactive and dynamic games in order to enhance the gaming experience and help game developers unleash their creativity. Florian holds an MSc in computer science with a focus on machine learning from the University of Zurich. In his master's thesis, he first achieved super-human time trial results in the highly realistic racing simulator "Gran Turismo SPORT" using end-to-end Deep Reinforcement Learning. He was then part of the Sony AI team that developed the first racing agents competitive with the world’s best e-sports drivers.
Recording - Release waiting for final approval
Understanding and predicting behaviour has been the business of psychologists for over a century. Within human psychology we can rely to some extent on introspection to understand the underlying drivers of behaviour, but this is less straightforward with animals. The problem of peering inside the "black box" of nonhuman animals shares much with the challenge of understanding the capabilities of AI systems, which exhibit extraordinarily clever-seeming behaviour but are prone to inflexibility and shortcuts. This talk will review the comparative cognition approach to AI evaluation and the benefits of robust cognitive testing of AI, both for understanding AI itself and for exploring biological intelligence.
Dr Lucy Cheke is a Lecturer at the University of Cambridge and Principal Investigator of the Cognition and Motivated Behaviour Lab at the same university.
Model-based planning is often thought to be necessary for deep, careful reasoning and generalization in artificial agents. While recent successes of model-based reinforcement learning (MBRL) with deep function approximation have strengthened this hypothesis, the resulting diversity of model-based methods has also made it difficult to track which components drive success and why. In this talk, I will discuss a line of research from the past few years that has aimed to better understand, and subsequently improve, model-based learning and generalization. First, planning is incredibly useful during an agent's training and supports improved data collection and a more powerful learning signal. However, it is only useful for decisions made in the moment under certain circumstances, counter to our (and many others') intuitions! Second, we can substantially improve procedural generalization of model-based agents by incorporating self-supervised learning into the agent's architecture. Finally, we can also improve transfer to novel tasks by leveraging an initial unsupervised exploration phase, which allows for learning transferable knowledge both in the policy and the world model.
Dr. Jessica Hamrick is a Staff Research Scientist at DeepMind, where she studies how to build machines that can flexibly build and deploy models of the world. Her work combines insights from cognitive science with structured relational architectures, model-based deep reinforcement learning, and planning. In addition to her work in AI, Dr. Hamrick has contributed to various open-source scientific computing projects including Jupyter and psiTurk. Dr. Hamrick received her PhD in Psychology in 2017 from the University of California, Berkeley, and her BS and MEng in Computer Science in 2012 from the Massachusetts Institute of Technology.
Recording - Release waiting for final approval
One of the key features of intelligent beings is the capacity to explore and discover an unknown environment and to progressively learn how to control it. This process is not driven by an explicit reward and it may unfold in a completely unsupervised way. In this talk I will propose a formalization of unsupervised discovery and exploration as the process of incrementally learning policies that reach goals of increasing difficulty. The resulting goal-based policy then allows the agent to solve any goal-reaching task downstream with no additional learning or planning. I will illustrate algorithmic principles, theoretical guarantees, and preliminary empirical results that could lay the foundations for designing agents that can efficiently learn in open-ended environments.
References:
On unsupervised exploration:
Adaptive Multi-Goal Exploration; AISTATS 2022
Direct then Diffuse: Incremental Unsupervised Skill Discovery for State Covering and Goal Reaching; ICLR 2022
A Provably Efficient Sample Collection Strategy for Reinforcement Learning; NeurIPS 2021
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs; NeurIPS 2020
On exploration for goal-based RL:
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret; NeurIPS 2021
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model; A
Alessandro is a research scientist at Meta AI (/FAIR), where he has been since 2017. Prior to working at Meta, he completed a PhD at Politecnico di Milano and worked as a researcher at INRIA Lille. His main research topic is reinforcement learning, with extensive contributions on both the theoretical and algorithmic aspects of RL. In the last ten years he has studied the exploration-exploitation dilemma both in the multi-armed bandit and reinforcement learning framework, notably on the problems of regret minimization, best-arm identification, pure exploration, and hierarchical RL.
In this talk, I will discuss our work on symmetry and structure in single- and multi-agent reinforcement learning. I will first discuss MDP Homomorphic Networks (NeurIPS 2020), a class of networks that ties transformations of observations to transformations of decisions. Such symmetries are ubiquitous in deep reinforcement learning, but often ignored in earlier approaches. Building this prior knowledge into policy and value networks allows us to reduce the size of the solution space, a necessity in problems with large numbers of possible observations. I will showcase the benefits of our approach on agents in virtual environments. Building on the foundations of MDP Homomorphic Networks, I will also discuss our recent multi-agent works, Multi-Agent MDP Homomorphic Networks (ICLR 2022) and Equivariant Networks for Zero-Shot Coordination (NeurIPS 2022), which consider symmetries in multi-agent systems. This forms a basis for my vision for reinforcement learning for complex virtual environments, as well as for problems with intractable search spaces. Finally, I will briefly discuss AI4Science.
Elise van der Pol is a Senior Researcher at Microsoft Research AI4Science Amsterdam, working on reinforcement learning and deep learning for molecular simulation. Additionally, she works on symmetry, structure, and equivariance in single and multi-agent reinforcement learning and machine learning.
Before joining MSR, she did a PhD in the Amsterdam Machine Learning Lab, working with Max Welling (UvA), Frans Oliehoek (TU Delft), and Herke van Hoof (UvA). During her PhD, she spent time in DeepMind’s multi-agent team. Elise was an invited speaker at the BeneRL 2022 workshop and the Self-Supervision for Reinforcement Learning workshop at ICLR 2021. She was also a co-organizer of the workshop on Ecological/Data-Centric Reinforcement Learning at NeurIPS 2021.
Improving the efficiency of algorithms for fundamental computational tasks such as matrix multiplication can have widespread impact, as it affects the overall speed of a large number of computations. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. In this talk I'll present AlphaTensor, our reinforcement learning agent based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor's algorithm improves on Strassen's two-level algorithm for the first time since its discovery 50 years ago. I'll present our problem formulation as a single-player game, the key ingredients that enable tackling such difficult mathematical problems using reinforcement learning, and the flexibility of the AlphaTensor framework.
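For readers unfamiliar with the baseline mentioned in this abstract, the sketch below shows Strassen's classical scheme for 2 × 2 matrices, which uses 7 scalar multiplications where the schoolbook method uses 8. It is only an illustration of what "fewer multiplications" means here; it is not AlphaTensor's algorithm, and the function names strassen_2x2 and naive_2x2 are made up for this example.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using Strassen's 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # Seven products instead of the eight used by the schoolbook method.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Recombine the products using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(a, b):
    """Schoolbook product for comparison: 8 scalar multiplications."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    y = [[5, 6], [7, 8]]
    print(strassen_2x2(x, y) == naive_2x2(x, y))
```

Because the scheme never relies on commutativity, the entries can themselves be 2 × 2 blocks. Applying it at two levels to a 4 × 4 matrix therefore costs 7 × 7 = 49 multiplications, which is the "two-level" baseline the abstract refers to.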
Alhussein Fawzi is a Research Scientist in the Science team at DeepMind, where he leads the algorithmic discovery efforts. He is broadly interested in using machine learning to unlock new scientific discoveries. He obtained his PhD in machine learning and computer vision from EPFL in 2016.
Progress in Reinforcement Learning (RL) methods goes hand-in-hand with the development of challenging environments that test the limits of current approaches. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both these things. Moreover, research in RL has predominantly focused on environments that can be approached by tabula rasa learning, i.e., without agents requiring transfer of any domain or world knowledge outside of the simulated environment. I will talk about the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for research based on the popular single-player terminal-based rogue-like game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as auto-curriculum learning, exploration, planning, skill acquisition, goal-driven learning, novelty search, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience.
Tim is the Open-Endedness Team Lead at DeepMind, an Associate Professor at the Centre for Artificial Intelligence in the Department of Computer Science at University College London (UCL), and a Scholar of the European Laboratory for Learning and Intelligent Systems (ELLIS). Prior to that, he was a Manager and Research Scientist at Facebook AI Research (FAIR) London, a Postdoctoral Researcher in Reinforcement Learning at the University of Oxford, a Junior Research Fellow in Computer Science at Jesus College, and a Stipendiary Lecturer in Computer Science at Hertford College. Tim obtained his Ph.D. from UCL under the supervision of Sebastian Riedel, and he was awarded a Microsoft Research Ph.D. Scholarship in 2013 and a Google Ph.D. Fellowship in 2017.
Having and using language makes humans as a species better learners and better able to solve hard problems. I'll present three studies that demonstrate how this is also the case for artificial models of general intelligence. In the first, I show that agents with access to visual and linguistic semantic knowledge explore their environment more effectively than non-linguistic agents, enabling them to learn more about the world around them. In the second, I demonstrate how an agent embodied in a simulated 3D world can be enhanced by learning from explanations -- answers to the question "why?" expressed in language. Agents that learn from explanations solve harder cognitive challenges than those trained from reinforcement learning alone, and can also better learn to make interventions in order to uncover the causal structure of their world. Finally, I'll present evidence that the skewed and bursty distribution of natural language may explain how large language models can be prompted to rapidly acquire new skills or behaviours. Together with other recent literature, this suggests that modelling language may make a neural network better able to acquire new cognitive capacities quickly, even when those capacities are not necessarily explicitly linguistic.
Reward is the driving force for reinforcement-learning agents. In this talk, I will present our recent NeurIPS paper that explores the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task” that might be of interest: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while Markov reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. I conclude by summarizing recent follow-up work that studies alternatives for enriching the expressivity of reward.
In the past few years, deep reinforcement learning (RL) has led to impressive achievements in games and robotics. However, current state-of-the-art RL methods struggle to generalize to scenarios they haven’t experienced during training. In this talk, I will show how we can leverage diverse data and a minimal set of inductive biases to generalize to new task instances. First, I will discuss how we can use data augmentation to learn policies which are invariant to task-irrelevant changes in the observations. Then, I will show how we can generalize to new task instances with unseen states and layouts by decoupling the representation of the policy and value function. And finally, I will briefly describe how we can quickly adapt to new dynamics by learning a value function for a family of behaviors and environments.
Many problems in science and engineering can be viewed as instances of black-box optimisation over high-dimensional (structured) input spaces. Applications are ubiquitous, including arithmetic expression formation from formal grammars and property-guided molecule generation, to name a few. Machine learning (ML) has shown promising results in many such problems, sometimes leading to state-of-the-art results. Despite those successes, modern ML techniques are data-hungry, requiring hundreds of thousands if not millions of labelled data points. Unfortunately, many real-world applications do not enjoy such a luxury -- it is challenging to acquire millions of wet-lab experiments when designing new molecules.
This talk will elaborate on novel techniques we developed for high-dimensional Bayesian optimisation (BO), capable of efficiently resolving such data bottlenecks. Our methods combine ideas from deep metric learning with BO to enable sample-efficient low-dimensional surrogate optimisation. We provide theoretical guarantees demonstrating vanishing regret with respect to the true high-dimensional optimisation problem. Furthermore, in a set of experiments, we confirm the effectiveness of our techniques in reducing sample sizes by acquiring state-of-the-art logP molecule values utilising only 1% of the labels compared to previous SOTA.
Haitham leads the reinforcement learning team at Huawei Technologies Research & Development UK and is an Honorary Lecturer at UCL. Prior to Huawei, Haitham led the reinforcement learning and tuneable AI team at PROWLER.io, where he contributed to their technology in finance and logistics. Prior to joining PROWLER.io, Haitham was an Assistant Professor in the Computer Science Department at the American University of Beirut (AUB). Before joining the AUB, Haitham was a postdoctoral research associate in the Department of Operational Research and Financial Engineering (ORFE) at Princeton University. Prior to Princeton, he conducted research in lifelong machine learning as a postdoctoral researcher at the University of Pennsylvania. As a former member of the General Robotics Automation Sensing and Perception (GRASP) lab, he also contributed to the application of machine learning to robotics. His primary research interests lie in the field of statistical machine learning and artificial intelligence, focusing on Bayesian optimisation, probabilistic modelling and reinforcement learning. He is also interested in learning using massive amounts of data over extended time horizons – a property common to "Big-Data" problems. His research also spans different areas of control theory and nonlinear dynamical systems, as well as social networks and distributed optimization.
The ability to cooperate through language is a defining feature of humans. As the perceptual, motor and planning capabilities of deep artificial networks increase, researchers are studying whether they can also develop a shared language to interact. In this talk, I will highlight recent advances in this field but also common headaches (or perhaps limitations) with respect to experimental setup and evaluation of emergent communication. Towards making multi-agent communication a building block of human-centric AI, and by drawing from my own recent work, I will discuss approaches to making emergent communication relevant for human-agent communication in natural language.
Angeliki Lazaridou is a Staff Research Scientist at DeepMind. She obtained her PhD from the University of Trento, where she worked on predictive grounded language learning. At DeepMind, she has worked on interactive methods for language learning that rely on multi-agent communication as a means of alleviating the use of supervised language data. More recently, she has focused on understanding (and improving) the temporal generalization of language models.
There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, yet this is a crucial prerequisite from a human-AI coordination and AI-safety point of view.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep OBL agents follow a policy pi_1 that is optimized assuming past actions were taken by a given, fixed policy, pi_0, but assuming that future actions will be taken by pi_1. When pi_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.
OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).
OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance in both a toy setting and Hanabi, the benchmark problem for human-AI coordination and ZSC.
Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI Chair in 2019.
His past work addresses how AI agents can learn to cooperate and communicate with other agents. Most recently, he has been developing and addressing the zero-shot coordination problem setting, a crucial step towards human-AI coordination.
His work has been cited over 5000 times, with an h-index of 29 (Google Scholar page).