Date of Award

Spring 2025

Project Type

Dissertation

Program or Major

Computer Science

Degree Name

Doctor of Philosophy

First Advisor

Marek Petrik

Second Advisor

Marek Petrik

Third Advisor

Samuel Carton

Abstract

Reinforcement Learning (RL) is a core area of artificial intelligence (AI) that enables systems to make decisions in complex environments. RL algorithms have shown promise in various domains, including resource management, robotics, and games. However, most existing RL approaches fail to account for the risks associated with decision-making. This oversight becomes particularly critical in high-stakes settings, such as healthcare, finance, criminal justice, and autonomous driving, where poor decisions can have significant consequences.

To address risk in decision-making, a separate line of research studies the properties (axioms) that a "desirable" risk measure should satisfy, where desirability is understood as the ability to quantify the capital requirement for a random loss. Monetary risk measures, characterized by monotonicity and translation invariance, are widely accepted as the basic requirement in the study of risk measures. In this work, we focus on four prominent monetary risk measures: Value at Risk (VaR), Conditional Value at Risk (CVaR), Entropic Value at Risk (EVaR), and the Entropic Risk Measure (ERM).
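For reference, these four measures have standard definitions, stated here in the reward (higher-is-better) convention for a random return X, risk level α ∈ (0, 1], and risk-aversion parameter β > 0; the exact sign and level conventions used in the thesis may differ:

\[
\mathrm{VaR}_\alpha[X] = \inf\{\, x \in \mathbb{R} : \Pr[X \le x] \ge \alpha \,\}, \qquad
\mathrm{CVaR}_\alpha[X] = \frac{1}{\alpha} \int_0^{\alpha} \mathrm{VaR}_u[X] \, du,
\]
\[
\mathrm{ERM}_\beta[X] = -\frac{1}{\beta} \log \mathbb{E}\!\left[ e^{-\beta X} \right], \qquad
\mathrm{EVaR}_\alpha[X] = \sup_{\beta > 0} \left( \mathrm{ERM}_\beta[X] + \frac{1}{\beta} \log \alpha \right).
\]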

Despite various attempts to incorporate risk into RL, many existing methods focus on dynamic risks or distributional Markov risks, which leads to algorithms that fail to optimize meaningful risk metrics of the return and limits their effectiveness in real-world scenarios. This thesis introduces novel algorithms that accurately optimize risk within the general RL framework of discounted Markov Decision Processes (MDPs). In particular, we prove dynamic programming principles specific to each risk measure and use them to develop reliable risk-sensitive RL algorithms.
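To make this distinction concrete: a static risk objective evaluates the risk of the entire discounted return, whereas dynamic (nested) formulations apply the risk measure recursively at every time step. A schematic comparison, with ρ a risk measure, γ ∈ (0, 1) the discount factor, r_t the reward at time t, and π a policy (illustrative notation, not necessarily that of the thesis):

\[
\max_{\pi} \; \rho^{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]
\quad \text{(static risk of the return)}
\]
\[
v(s) = \max_{a} \left( r(s, a) + \gamma \, \rho_{s' \sim p(\cdot \mid s, a)}\!\left[ v(s') \right] \right)
\quad \text{(nested, dynamic-risk recursion)}
\]

Because these two objectives generally differ, algorithms built on the nested recursion need not optimize the static VaR, CVaR, EVaR, or ERM of the return.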

This thesis makes three key contributions: 1) We disprove the efficacy of a widely adopted approach that directly applies coherent risk decompositions to MDP policy optimization. 2) We propose two novel fully polynomial-time approximation scheme (FPTAS) algorithms for optimizing MDPs under entropic risk measures. 3) We propose practical VaR RL algorithms with performance bounds and convergence analysis.

These contributions are based on three peer-reviewed conference publications. By addressing both theoretical and practical challenges in risk-sensitive reinforcement learning, this work aims to develop fundamentally sound models and algorithms to better measure and handle uncertainty in high-risk domains.

The methods presented in this thesis have potential applications in AI, healthcare, finance, and autonomous vehicles. Experimental results demonstrate that our algorithms accurately optimize the desired risk measures and consistently outperform existing methods on standard RL benchmarks, including inventory management, pest population control, gambler's ruin, cliff walking, machine replacement, and river swim environments.
