Title:
Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning
Link:
https://ieeexplore.ieee.org/document/9956245
Abstract:
Artificial Intelligence, particularly through recent advancements in deep learning (DL), has achieved exceptional performance in many tasks in fields such as natural language processing and computer vision. For certain high-stakes domains, in addition to desirable performance metrics, a high level of interpretability is often required in order for AI to be reliably utilized. Unfortunately, the black-box nature of DL models prevents researchers from providing explicative descriptions of a DL model’s reasoning process and decisions. In this work, we propose a novel framework utilizing Adversarial Inverse Reinforcement Learning that can provide global explanations for decisions made by a Reinforcement Learning model and capture intuitive tendencies that the model follows by summarizing the model’s decision-making process.
Citation:
Yuansheng Xie, Soroush Vosoughi, Saeed Hassanpour, “Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning”, International Conference on Pattern Recognition (ICPR), Montreal, Quebec, Canada, 2022.