Date of Award
Spring 2022
Project Type
Dissertation
Program or Major
Computer Science
Degree Name
Doctor of Philosophy
First Advisor
Marek M Petrik
Second Advisor
Mohammad M Ghavamzadeh
Third Advisor
Wheeler W Ruml
Abstract
Applying reinforcement learning (RL) to domains that involve risky decisions, such as medicine or robotics, requires high confidence in a policy's performance before its deployment. Markov Decision Processes (MDPs) are the well-established model underlying RL. An MDP assumes that the exact transition probabilities and rewards are available; in most cases, however, these parameters are unknown and must be estimated from data, a process inherently prone to error. Because of these statistical errors, the computed policy's actual performance often differs from the designer's expectation. Faced with this, practitioners can either ignore parameter uncertainty during decision-making or plan pessimistically against the worst-case scenario. This dissertation focuses on a moderate approach that strikes a balance between these two opposing points of view. This objective is also known as the percentile criterion and can be modeled as risk-aversion to epistemic uncertainty. We propose several RL algorithms that efficiently compute reliable policies from limited data, notably improving policy performance and reducing computational complexity compared with standard risk-averse RL algorithms. Furthermore, we present a fast and robust feature selection method for linear value function approximation, a standard approach to solving reinforcement learning problems with large state spaces. Our experiments show that our technique is faster and more stable than alternative methods.
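As a minimal sketch of the percentile criterion named above, in assumed standard notation rather than notation taken from the dissertation itself: let rho(pi, P) denote the return of policy pi under transition model P, let f be a data-driven posterior distribution over models, and let delta be a chosen risk level. The criterion can then be written as

\[
\max_{\pi,\; y \in \mathbb{R}} \; y
\quad \text{subject to} \quad
\Pr_{P \sim f}\big[\, \rho(\pi, P) \ge y \,\big] \ \ge\ 1 - \delta .
\]

Taking delta to zero recovers the pessimistic worst-case plan over all plausible models, while dropping the constraint and planning with a point estimate of P corresponds to ignoring parameter uncertainty; intermediate values of delta give the balance between the two that the abstract describes.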
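The linear value function approximation mentioned above also admits a short illustration. The sketch below is hypothetical (the synthetic feature matrix, targets, and plain least-squares fit are illustrative assumptions, not the dissertation's method): each state's value is approximated as a dot product between its feature vector and a learned weight vector.

import numpy as np

# Hypothetical sketch: approximate the value of each state s as
# v(s) ~= phi(s) . w for a feature map phi and a weight vector w.
rng = np.random.default_rng(0)
n_states, n_features = 100, 8

Phi = rng.normal(size=(n_states, n_features))   # one feature row per state
returns = rng.normal(size=n_states)             # stand-in for observed returns

# Least-squares fit of the weights: w = argmin_w ||Phi w - returns||^2
w, *_ = np.linalg.lstsq(Phi, returns, rcond=None)

v_hat = Phi @ w                                 # approximate value of every state
print(float(np.linalg.norm(v_hat - returns)))   # fit error on this synthetic data

Feature selection in this setting corresponds to keeping only a small, informative subset of the columns of Phi before the fit, which is what makes the approach tractable for large state spaces.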
Recommended Citation
Behzadian, Bahram, "Efficient Data-Driven Robust Policies for Reinforcement Learning" (2022). Doctoral Dissertations. 2661.
https://scholars.unh.edu/dissertation/2661