University of Texas at Austin

Past Event: Oden Institute Seminar

Reinforcement learning: offline reliability and online safety

Thiago D. Simão, Radboud University, The Netherlands

10:45 AM – 12:15 PM
Monday Apr 18, 2022

POB 6.304

Abstract

Safety is a paramount issue when applying reinforcement learning (RL) algorithms in the real world. We investigate safety from two perspectives: one where reasonable performance must be ensured, and another where safety constraints cannot be violated. In the first, the RL agent only has access to a fixed dataset of past trajectories and does not interact directly with the environment. Assuming the behavior policy that collected the data is available, the challenge is to compute a policy that outperforms it. In this setting, we develop sample-efficient algorithms by exploiting the structure of the problem. In the second, the RL agent is trained by interacting directly with the environment, but it is subject to safety constraints and therefore cannot perform the random exploration typical of online RL algorithms. Assuming the agent has access to an abstraction of the safety dynamics, we develop algorithms that can safely explore the environment and eventually converge to an optimal policy.

Biography

Thiago is a postdoctoral researcher at Radboud University Nijmegen, advised by Dr. Nils Jansen. Previously, he was a Ph.D. candidate in the Algorithmics Group at Delft University of Technology, advised by Dr. Matthijs Spaan. His research interests lie primarily in the automation of sequential decision-making, focusing on reinforcement learning and its safety aspects. He obtained his M.Sc. degree in artificial intelligence from the Instituto de Matemática e Estatística at Universidade de São Paulo under the supervision of Dr. Leliane N. de Barros, and a bachelor's degree in computer science from Universidade Federal de Lavras.

More information at https://tdsimao.github.io

Event information

Hosted by Ufuk Topcu